Series on Integrated Circuits and Systems
Liesbet Van der Perre, Antoine Dejonghe, Jan Craninckx
Green Software Defined Radios
Enabling seamless connectivity while saving on hardware and energy
Liesbet Van der Perre IMEC VZW Kapeldreef 75 3001 Leuven Heverlee Belgium
[email protected]
Antoine Dejonghe IMEC VZW Kapeldreef 75 3001 Leuven Heverlee Belgium
[email protected]
Jan Craninckx IMEC VZW Kapeldreef 75 3001 Leuven Heverlee Belgium
Jan.Craninckx@imec.be
ISBN 978-1-4020-8210-8
e-ISBN 978-1-4020-8212-2
Library of Congress Control Number: 2008937503
© 2009 Springer Science+Business Media B.V.
All Rights Reserved. No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Printed on acid-free paper
9 8 7 6 5 4 3 2 1
springer.com
Preface
Green Software Defined Radios: the title of this book may have originated from a lack of inspiration, combined with hard work, jet lag, and drinking green tea. The message we want to convey, however, is that SDRs are a promising technology for the future, provided they are designed for efficient usage of scarce resources: energy and spectrum. In recent years, the R&D teams focusing on wireless communication (around the world and at IMEC specifically) have realized great breakthroughs. It is our honor, building on this knowledge, to bring a comprehensive overview of the essential technologies. We are grateful that Springer is willing to publish, in their collection on radio technologies, a book on green SDRs, a weird species still today, yet maybe the baseline for the day after tomorrow.
Dear reader, we hope that you find some interesting insights in the following pages, including the references, and that this book may live more or less up to your expectations (and hopefully more than less). This book’s closing states that the quest for Green SDRs has not ended: this is just the beginning. Concerning this book, however, we are happy that today the opposite is true.
We want to acknowledge our colleagues at IMEC for their great scientific contribution, and even more for the enjoyable cooperation. Aïssa, Amir, André, Andy, Andy, Bjorn, Bjorn, Bruno, Boris, Carolina, Charlotte, Claude, David, Dries, Eduardo, Erik, Filip, Frederik, Geert, Gert, Gregory, Hans, Jeroen, Jonathan, Joris, Julien, Lieven, Luc, Maciej, Mark, Martin, Michael, Michael, Michael, Miguel, Min, Mingxu, Noman, Osman, Peter, Pierluigi, Piet, Rodolfo, Roeland, Sébastien, Sofie, Stefaan, Steven, Thierry, Thomas, Tom, Tom, Tong, Valéry, Veerle, Vito: hartelijk dank!
Leuven, Belgium, July 2008
Liesbet Van der Perre Jan Craninckx Antoine Dejonghe
Contents
1 The Wireless and Technology Scene: Trends Asking for Green Software Defined Radio Solutions
1.1 Chronicle of an Innovative Encounter: When Wireless Communication and Micro-electronics Meet
1.1.1 The Pioneers’ Era
1.1.2 The Digital Revolution: When Wireless Communication and Microelectronics Meet
1.1.3 A Bright Future Ahead?
1.2 The Wireless Scene: Heterogeneity Desires Flexibility
1.2.1 Wireless Standards: The Variety is Large, and Growing
1.2.2 Wireless Terminals go Multi-mode: A Market Perspective
1.2.3 Multi-mode Handsets: Enabling Seamless Connectivity
1.3 The Technology Scene: Cost Imposes Reconfigurability
1.3.1 Scaling Pleads for Multi-purpose Devices
1.3.2 Multi-mode Terminals Ask for Hardware Reuse
1.4 Uniting Wireless Wishes with Technological Constraints: The Power and Spectral Challenge
1.4.1 Challenges
1.4.2 Towards Green Software Defined Radios
References

2 Software Defined Radios: Enabling Seamless Connectivity for Handheld Devices
2.1 Flexible Radios: Species and their Territorium
2.1.1 Flexibility in the Wireless World
2.1.2 Ancestors: Dedicated Radios
2.1.3 Software Radios: A Designer’s Ultimate Nightmare
2.1.4 Software Defined Radios: Addressing the Dilemma
2.1.5 A Debatable Overview
2.1.6 SW: Brains for SDRs
2.1.7 Adaptive Radios: How they (Do Not) Behave
2.1.8 Multi-modal/Multi-standard Terminals
2.1.9 From Flexible Radio to Seamless Services: Standardization Initiatives Paving the Way
2.2 Towards Green SDRs: A Holistic Approach
2.2.1 Low Power: A Philosophy
2.2.2 Wireless Communication Scenes: Dynamics are Everywhere
2.2.3 SDR Solutions: Scalability should be Everywhere
2.2.4 Exploit Dynamics and Scalability!
References

3 Software-Defined Radio Front-Ends: Scalable Waves in the Air
3.1 Introduction
3.2 System-Level Considerations
3.3 Wideband LO Synthesis
3.3.1 3–5 GHz Voltage-Controlled Oscillator
3.3.2 0.1–6 GHz Quadrature Generation
3.4 Receiver Building Blocks
3.4.1 MEMS-Enabled Dual-Band Low-Noise Amplifier
3.4.2 Wideband Low-Noise Amplifiers
3.4.3 Wideband Downconversion Mixer
3.4.4 Flexible Baseband Analog Circuits
3.4.5 Analog-to-Digital Conversion
3.5 Transmitter Building Blocks
3.6 Calibration Techniques
3.6.1 Quadrature Imbalance
3.6.2 DC-Offset
3.6.3 Impact of LPF Spectral Behavior
3.7 Full SDR Implementation
3.8 Conclusions
References

4 SDR Baseband Platforms: Opportunism to Combine Flexibility and Low Energy
4.1 SDR Baseband Platforms: Going Mobile
4.2 Approach to Combine Flexibility and Low Energy: Divide and Conquer
4.2.1 Opportunistic Partitioning
4.2.2 Low Power Operation: Sleeping, Waking, and Working on Minimal Energy
4.2.3 SDR Platform Implementation: Teaming up with Deep-Submicron Technology
4.3 Digital Front-End: Going Reactive and Cognitive
4.3.1 The Global DFE: Speaking and Listening Means for the Baseband Platform
4.3.2 Zooming in on the Power Detection and AGC Controller
4.3.3 Zooming in on the Synchronization Engine
4.4 Processors for SDR-Baseband: Working Horses in a Race for Speed and Power
4.4.1 The Quest for High Performance and Low Power: Introducing Different Styles
4.4.2 Tuning an ADRES Processor: A Suitable Case
4.5 Outer Modem Engine: Going with the Flexibility Stream
4.5.1 Problems with Dedicated Solutions Arising
4.5.2 Flexible Solutions in Sight
4.6 Conclusions
4.6.1 SDR Baseband Platforms: Going Mobile Today
4.6.2 The Future: Next Generations Desired
References

5 Software: Fuel for Green Radios: The Blessing and the Curse
5.1 The Blessing and the Curse
5.2 Structured SW Design: Going for Network and Platform Compatibility
5.3 Platform-Level SW: The Control Room for the SDR
5.3.1 The Strategic Plan: Design Flow
5.3.2 Meeting the Design Goals: Latency Requirements
5.4 Baseband Processor SW: The Working Horse for the SDR
5.4.1 The Strategic Plan: Design Flow
5.4.2 Meeting the Design Goals: Real-Time Requirements
5.5 System Level SW: Providing SDR Terminals with Social Skills
5.5.1 The Strategic Plan: A Simulation Framework for Network Centric SW Development and Validation
5.5.2 Meeting the Design Goals: The 802.11n Case
5.6 Future Challenges and Solutions
5.6.1 The Wireless Race for More: Trouble Ahead
5.6.2 Solutions to Boost Performance: More Parallelism in the SW
5.6.3 Solutions to Save Power: Architecture-Aware Scalable SW
5.7 Conclusions
References

6 Energy-Aware Cross-Layer Radio Management: Exploit Flexibility for Saving Energy
6.1 Introduction
6.2 SDR Design Step: Enable Flexibility and Energy Scalability
6.2.1 Reconfigurable Analog Front-End
6.2.2 SDR Digital Platform
6.3 SDR Control Step: Exploit Flexibility and Scalability for Saving Energy
6.3.1 State-of-the-Art Energy Management Techniques
6.3.2 Cross-Layer Performance-Energy Optimization
6.3.3 Instantiation in a Use Case
6.4 Conclusions
References

7 Towards Cognitive Radios: Getting the Best Out of the Radio and the Spectrum
7.1 Introduction
7.1.1 The Need for Reconfigurable Radio Platforms
7.1.2 The Need for Intelligent and Adaptive Radio
7.2 New Control Functionality
7.2.1 Cognitive Radio: Broad View
7.2.2 Cognitive Radio: Spectrum-centric View
7.3 New Sensing Functionality
7.4 New Radio Architectures
7.5 Conclusion
References

8 Close: This is not the End, it’s Just a Beginning
8.1 A Last Chapter
8.2 Major Conclusions
8.3 Challenges Ahead
8.3.1 Scaling to Next Generation Applications and Technologies
8.3.2 Focus on Multi-band Antenna Interface Challenge
8.4 Closing Remarks

Index
Chapter 1
The Wireless and Technology Scene: Trends Asking for Green Software Defined Radio Solutions
1.1 Chronicle of an Innovative Encounter: When Wireless Communication and Micro-electronics Meet
A radio is defined as ‘a communication system employing wireless transmission of information by means of electromagnetic waves propagated through space’. Radio became reality thanks to brilliant theorists and creative experimentalists. Yet it was the encounter with micro-electronics that brought the impressive boost we witness today, and this encounter plays a crucial role in the evolution towards Software Defined Radios. The next paragraphs outline a brief history rolling into the future.
1.1.1 The Pioneers’ Era
In 1864, James Clerk Maxwell wrote his paper ‘A Dynamical Theory of the Electromagnetic Field’ and derived the famous equations named after him. Little could he know what the future impact of his findings would be! In 1887, Heinrich Hertz confirmed Maxwell’s ideas with impressively successful experiments: the propagation of electromagnetic waves in free space was proven. This milestone launched the quest for ways to ‘stick’ information on radio waves, and thus ‘tele-communicate’ without the need to install wires and be restricted by them. Guglielmo Marconi was the first to obtain a patent in the field of wireless communication in 1896, the year often mentioned for the invention of radio, and he later also received a Nobel Prize for his work. The first transmission of voice and music via radio waves, by Reginald Fessenden on Christmas Eve 1906, gave birth to radio broadcasting. Importantly, this breakthrough gave the technology a human voice and face, raising interest in it among a very large public.
L. Van der Perre et al., Green Software Defined Radios, Series on Integrated Circuits and Systems, © Springer Science+Business Media B.V. 2009
1.1.2 The Digital Revolution: When Wireless Communication and Microelectronics Meet
Wireless communication has expanded and improved dramatically since then, leading to the current so-called ‘third generation’ (3G) [1] and WiFi [2] multimedia communication devices, which exchange voice, image and data at speeds ranging from a couple of megabits per second (Mb/s) for 3G to over 50 Mb/s for WiFi. This evolution was particularly accelerated in the second half of the twentieth century by the independent progress of two fields of science: information theory and microelectronics. The year 1948 is considered the major landmark in the development of digital technology, owing to achievements in both fields that would eventually be combined successfully. On the information theory side, Claude Shannon laid the founding stone of this science by defining one binary unit of information, the bit, as well as channel capacity. In the same year, William Shockley and his team at Bell Laboratories announced the invention of the transistor, which would later be used as the building element of circuits able to process and store bits.¹
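Shannon’s notion of channel capacity can be made concrete with a short calculation. The sketch below is not taken from the original text: it assumes the standard Shannon-Hartley form C = B log2(1 + SNR) for an additive white Gaussian noise channel, and the function name, the 20 MHz bandwidth, and the SNR values are illustrative assumptions only.

```python
import math


def shannon_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon-Hartley capacity of an AWGN channel, in bits per second.

    bandwidth_hz and snr_db are illustrative inputs, not figures from the book.
    """
    snr_linear = 10.0 ** (snr_db / 10.0)  # convert dB to a linear power ratio
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    # Hypothetical example: a 20 MHz channel at a few signal-to-noise ratios.
    for snr_db in (0.0, 10.0, 20.0):
        capacity = shannon_capacity_bps(20e6, snr_db)
        print(f"SNR {snr_db:4.1f} dB: {capacity / 1e6:6.1f} Mb/s")
```

The capacity grows linearly with bandwidth but only logarithmically with signal-to-noise ratio, which is one way to see why higher-rate standards tend to ask for more spectrum.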
1.1.3 A Bright Future Ahead?
The combination of mobility and connectivity has become a commodity and an essential comfort in today’s society. Remarkably, even 70-year-old people declare that they can no longer do without this technology, although they lived most of their lives without it. The variety of wireless standards, supporting a large range of services in different geographical environments and regions, is large and growing. Technology is becoming ever smaller and faster. These two evolutions, explained in the following sections, together ask for Software Defined Radio (SDR) solutions. The gigantic success of wireless communication is, however, becoming a threat to itself. An analogy can be made: the convenience and popularity of car transportation have led to enormous traffic jams hampering mobility, and to unsustainable energy consumption. Similarly, the spectrum is getting crowded, and increased service requirements are draining mobile devices’ batteries.
¹ This parallel between Shannon’s definition of a bit and the transistor was based on Berrou’s and Glavieux’s introduction to their IEEE Information Theory invited paper of 1998, on the occasion of the 50th birthday of the transistor and of information theory. The whole article can be found in http://www.itsoc.org/review/frrev.html.
1.2 The Wireless Scene: Heterogeneity Desires Flexibility
1.2.1 Wireless Standards: The Variety is Large, and Growing
Wireless communications are routinely used today for a large variety of applications including voice, data transfer, Internet access, and audio and video streaming, to name just a few. For one specific service, several systems are standardized, and each can become the preferred option in different regions of the world. Table 1.1 illustrates this for digital broadcasting systems. Supporting just this one service on a terminal already requires some degree of flexibility in the radios. Pushed by the insatiable demand for bandwidth and pulled by the steady improvement of semiconductor technology (Moore’s law), the performance offered by wireless standards is set to keep improving over the years, seemingly without bounds. This is illustrated in Fig. 1.1, which shows the observed and predicted evolution of the main classes of wireless access standards as thick arrows (from bottom to top): Wireless Personal Area Networks (WPAN), Wireless Local Area Networks (WLAN), Wireless Metropolitan Area Networks (WMAN), and Wireless Wide Area or cellular (WWAN) networks. Obviously, higher rates will be achieved more easily, or sooner, at low mobility than at high mobility.
Table 1.1 Various standards for digital video broadcasting
[Figure: wireless access standards plotted by mobility (low speed/stationary, medium speed, high speed) against data rate (10 kbps to 1 Gbps), with generations dated 1995 to 2010; entries range from Bluetooth, WLAN, and UWB/60 GHz WPAN through WiMAX/802.16e to cellular 1G, 2G, 3G, 3G+, and the 4G research target]
Fig. 1.1 Variety of wireless access standards
1.2.2 Wireless Terminals go Multi-mode: A Market Perspective
From a user perspective, it is very attractive to have a single handheld device that can support a large variety of wireless standards. In answer to this demand, mobile handsets have started supporting multiple modes over the past years. As a first step, this is achieved by integrating multiple radios into one handset. A nice example is shown in Fig. 1.2 [2]. Clearly, when the number of radios increases, the cost, size, and weight of the terminal are seriously affected by the multi-mode extension. A much more efficient solution is offered by reconfigurable radios (such as Software Defined Radios) that can access several wireless standards. This is further explained in Section 3.2. Market forecasts [1] predict that the share of ‘SDR enabled’ mobile handset shipments will grow considerably in the coming years, probably really ‘taking off’ in 2010 (see Fig. 1.3). Importantly, going from dedicated radios to reconfigurable radios brings about a real paradigm shift! Consequently, manufacturers only take the leap when a significant advantage is expected. Maybe they should even ‘feel the pain’ first. In North America, cellular standard adoption is more dispersed, with both CDMA and GSM systems being widely used. As a result, the proportion of SDR enabled mobile shipments is increasing much faster in this region, as shown in Fig. 1.4 [1]. For clarity and completeness, the global distribution of SDR enabled handset shipments in absolute numbers (optimistic case) is given in Fig. 1.5 [1]. Clearly, as for dedicated radios, the largest market in terms of volume is situated in the Asia Pacific region.
[Figure: handset with separate radios and antennas for RF ID, FM, UWB, WLAN, Bluetooth, 2G/3G cellular, DVB-H, diversity RX, and GPS]
Fig. 1.2 Multi-mode handset featuring separate radios (Nokia [2])
[Figure: SDR enabled handsets as % of total shipments (0 to 12%), 2007 to 2011, optimistic and pessimistic forecasts; “SDR enabled” = handset with programmable baseband processor]
Fig. 1.3 SDR enabled handset shipments 2007–2011 worldwide [1]
[Figure: SDR enabled handsets as % of total shipments (0 to 30%), 2007 to 2011, optimistic and pessimistic forecasts]
Fig. 1.4 SDR enabled handset shipment in North-America 2007–2011 ([1])
[Figure: regional shares (0% to 100%) of SDR enabled handset shipments for 2009, 2010, and 2011, optimistic case]
Fig. 1.5 Global breakdown of SDR enabled mobile shipment 2007–2011 [1]
1.2.3 Multi-mode Handsets: Enabling Seamless Connectivity
Ubiquitous and seamless connectivity can be achieved in a heterogeneous network environment, on the condition that both terminals and networks feature the necessary reconfiguration capabilities to support horizontal roaming (between access points adhering to one standard) and vertical roaming (between access points operating different standards). Recently, the need for reconfiguration support has been receiving attention in specific standardization initiatives, as illustrated in Fig. 1.6 (from IEEE SCC41 [4], formerly IEEE 1900). This confirms that technological answers are due to meet the user’s need for seamless connectivity. These technological answers need to include:
Multi-mode radios: User terminals should encompass radios that can generate the different waveforms specified in the various standards. One solution to generate these various waveforms is to conceive flexible radios. In Chapter 2 we will position Software Defined Radios as a solution to realize low power flexible radios.
Control solutions for reconfiguration: Control solutions for flexible radios should assure reliable connectivity, allocating the available resources in the most efficient way. In this book, we argue that this challenge should be considered and optimized across the different OSI layers, and we propose an approach to realize cross-layer (XL) control solutions.
Fig. 1.6 Need for reconfiguration support to enable ubiquitous and seamless connectivity
On top of seamless connectivity to one service, users want to enjoy a multitude of services on one terminal: streaming interactive connectivity (voice and video), high-speed data access, broadcasting reception, and short range connectivity to devices in close proximity. Predictions claim that handheld devices will need to support at least six different radios already in 2009.
In conclusion, from a functionality point of view, flexibility is desired! A closer inspection of the communication schemes (see Table 1.2 for a summary of the major specifications of key wireless standards) reveals that this concept is quite challenging. Indeed, the variety of bit rates, modulation formats, physical bandwidths and carrier frequencies is large. Fortunately, we see some common trends in broadband access schemes, which make it possible to optimize flexibility in the radios. For example, modulation schemes applying frequency domain processing are recurrently used for achieving high rates in fading environments. Also, the use of multiple-antenna processing, in its most advanced flavor ‘Multiple Input Multiple Output’ (MIMO), is becoming commonplace. The assessment of these trends, and their impact on the SDR front-end and baseband platform, is given in the relevant chapters later in this book.
Table 1.2 Major characteristics of wireless access standards
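The distinction between horizontal and vertical roaming made in Section 1.2.3 can be sketched in a few lines of code. This is only an illustration of the definitions given in the text: the `AccessPoint` type, its fields, and the example standard names are hypothetical and do not come from any standard or from the original chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPoint:
    name: str      # hypothetical identifier of the access point
    standard: str  # wireless standard it operates, e.g. "802.11g" or "UMTS"


def roaming_type(current: AccessPoint, target: AccessPoint) -> str:
    """Classify a move between access points as in Section 1.2.3.

    Returns "none" when the terminal stays on the same access point,
    "horizontal" when both access points adhere to one standard, and
    "vertical" when they operate different standards.
    """
    if current == target:
        return "none"
    if current.standard == target.standard:
        return "horizontal"
    return "vertical"


if __name__ == "__main__":
    office = AccessPoint("office-ap", "802.11g")
    lobby = AccessPoint("lobby-ap", "802.11g")
    cell = AccessPoint("macro-cell", "UMTS")
    print(roaming_type(office, lobby))  # both 802.11g
    print(roaming_type(office, cell))   # 802.11g to UMTS
```

In this simplified view, only the vertical case obliges the terminal to switch waveform, which is where a multi-mode or reconfigurable radio is needed.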
1.3 The Technology Scene: Cost Imposes Reconfigurability
1.3.1 Scaling Pleads for Multi-purpose Devices
In parallel with the increasing data rates and the need for functional flexibility in radio systems, CMOS technology has also gone through an impressive evolution over the last decades. Gordon E. Moore correctly predicted 40 years ago that the number of transistors on a chip would double about every 2 years. He added: “. . . (T)he first microprocessor only had 22 hundred transistors. We are looking at something a million times that complex in the next generations – a billion transistors. What that gives us in the way of flexibility to design products is phenomenal.” This evolution is updated and detailed in the International Technology Roadmap for Semiconductors (ITRS roadmap) [3]. Indeed, transistors can be made ever smaller and faster (see Fig. 1.7). Scaling has brought enormous processing capabilities in small areas, opening opportunities to implement flexible platforms at low cost and low power.
[Figure: fT versus technology node (100 nm, 45 nm, 18 nm), on a scale from 150 GHz to 1400 GHz; source: ITRS 2003]
Fig. 1.7 CMOS scaling makes chips ever faster
For the last decades, ‘happy scaling’ has offered us more functionality and at the same time lower costs. Yet for the newer technology nodes, the Non-Recurring Engineering (NRE) costs related to system-on-chip (SoC) design are rising exponentially. This is illustrated for the mask cost in Fig. 1.8. Next to the mask cost, the design cost has increased dramatically, and is expected to continue to do so. Not only is the complexity of the designs increasing; CMOS scaling has also arrived at the point where parasitic problems are becoming dominant: variability, reliability, and last but not least leakage. These effects can no longer be resolved at the technology (transistor) level only, and have to be taken care of in the design phase as far as possible. The ‘problem table’ in Table 1.3 indicates which parasitic effects are expected to impact analog and digital design for 45 nm and smaller technologies.
[Figure: mask cost versus technology node from 350 nm down to 45 nm, on a scale from $400k to $1M]
Fig. 1.8 Mask costs are increasing exponentially
Table 1.3 CMOS scaling parasitic problems for analog and digital design
Problems for 45 nm and beyond (the original table marks each problem per column, ‘For digital’ and ‘For analog RF’; the marks are not recoverable here):
Increase of gate leakage current
Increase of subthreshold leakage currents
Increase of variability
Decrease of voltage gain
Degradation 1/f noise by high-k gate dielectrics
Increase of switching noise
Increase of Miller capacitance
Increase of transistor series resistance
VDD decrease reliability
Inductors consume much expensive area
Today, the question is often asked whether, for cost reasons, scaling is still preferred. Until now, however, it seems that the rising NRE costs have been compensated by ever higher production volumes. For mass markets, such as mobile devices, scaling from technology node ‘x’ to the next one ‘x + 1’ is still a must to be competitive (see Fig. 1.9). Of course, this pleads for multi-purpose devices. Already today, in specific cases, cost trade-offs show an advantage in using a reconfigurable radio for single-mode devices as well: the extra area penalty is then not significant compared to the NRE. ‘Flexible radio extremists’ will even claim that in the (extreme?) longer term, no dedicated radios will survive.
[Figure: cost per chip ($/chip) versus volume, for node x and node x+1]
Fig. 1.9 Scaling brings cost advantage for high-volume products
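Two pieces of arithmetic underlie Section 1.3.1: the doubling rate quoted from Moore, and the volume at which a more expensive technology node pays off (Fig. 1.9). The sketch below works through both. It assumes a deliberately simple cost model, cost per chip = NRE / volume + unit cost, which is our simplification and not a model given in the text; all dollar figures are hypothetical placeholders, and the function names are ours.

```python
def transistor_growth(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor in transistor count if it doubles every doubling period."""
    return 2.0 ** (years / doubling_period_years)


def cost_per_chip(nre_usd: float, unit_cost_usd: float, volume: int) -> float:
    """Simplified model (an assumption): NRE spread over the volume, plus unit cost."""
    return nre_usd / volume + unit_cost_usd


def break_even_volume(nre_old: float, unit_old: float,
                      nre_new: float, unit_new: float) -> float:
    """Volume above which the newer node is cheaper per chip.

    Only meaningful when the newer node has higher NRE but lower unit cost.
    """
    if nre_new <= nre_old or unit_new >= unit_old:
        raise ValueError("expected higher NRE and lower unit cost for the newer node")
    return (nre_new - nre_old) / (unit_old - unit_new)


if __name__ == "__main__":
    # 40 years at one doubling per 2 years is 20 doublings, about a million-fold.
    factor = transistor_growth(40)
    print(f"growth factor: {factor:,.0f}")
    print(f"2,200 transistors become about {2200 * factor:,.0f}")

    # Hypothetical numbers: node x has NRE $10M and $5 per chip,
    # node x+1 has NRE $30M and $3 per chip.
    volume = break_even_volume(10e6, 5.0, 30e6, 3.0)
    print(f"node x+1 is cheaper above {volume:,.0f} chips")
```

The first part reproduces the ‘million times’ in Moore’s quote: twenty doublings take 2,200 transistors to a little over two billion. The second part shows why the advantage of scaling is reserved for high-volume products under this model: below the break-even volume, the higher NRE of the newer node dominates.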
1.3.2 Multi-mode Terminals Ask for Hardware Reuse
For multi-mode terminals, the possibility to reuse silicon, and thus significantly reduce overall area, clearly makes the argument for flexible platforms stronger still. Moreover, other cost factors also point towards hardware reuse:
Number of components and assembly cost: Some visions claim that ‘all the cost will be in the assembly’. While current terminal cost breakdowns do not show this yet, we can intuitively foresee that this share of the cost will grow in the future. Replacing multiple radios by a single flexible radio clearly reduces the number of components and consequently the assembly cost.
Form factor (size and weight): The chase for ever flatter and lighter mobile terminals is constant, while more and more features and radio interfaces need to be supported. Multi-purpose devices bring an important advantage, which in the future may even become indispensable.
Time to market: New standards, updates, and regional flavors seem to pop up every day, while designing a full new radio easily takes years. The time to market can be decreased impressively if redesigning can be traded for reconfiguring and reprogramming. The effort hidden in the reprogramming should, however, definitely not be underestimated (see ‘blessing and curse’ in Chapter 5).
Taking into consideration the scaling evolution and the constraints for multi-mode radios, we can conclude that cost imposes reconfigurability, and will drive manufacturers towards flexible radios for future wireless terminals.
1.4 Uniting Wireless Wishes with Technological Constraints: The Power and Spectral Challenge
1.4.1 Challenges
The combination of the increasing need for functional flexibility and the huge cost related to SoC design will make implementation of multi-mode radios on SDR platforms the only viable option in the future. Since mobile devices are battery-powered, the performance requirements are coupled with severe constraints on energy efficiency. This is becoming a key concern: there is a continuously growing gap between the available energy, resulting from battery technology evolution, and the steeply increasing energy requirements of emerging radio systems (see Fig. 1.10). A major challenge therefore is to enable low energy reconfigurable radio implementations, suited for low-cost handheld multimedia terminals. They should reach the battery lifetime of today’s fixed hardware implementations, and at the same time offer reliable connectivity.
Fig. 1.10 The energy gap between required and available energy is growing (curves: energy requirement and energy available in battery, versus time)
Besides the energy constraint, spectrum is also becoming a major resource bottleneck. The spectrum is a scarce and valuable resource, which has become over-allocated over the past decades. Figure 1.11 shows a sample of the allocation of the spectrum below 3 GHz, clearly illustrating the congestion. Due to the accelerated deployment of broadband personal communication and the continuously increasing demand for higher data rates, we are heading towards a red brick wall. New paradigms for efficiently exploiting the spectrum are obviously needed. A current trend is the evolution towards dynamic and open access to spectrum, motivated by the under-utilization of many licensed frequency bands. This has led to the concept of cognitive radio (CR, see Chapter 7 for a clarification of nomenclature). Cognitive radios will essentially need flexibility support in HW and SW and adequate control for reconfigurability.
Fig. 1.11 Spectrum allocation snapshot: no space left
1.4.2 Towards Green Software Defined Radios
In this book, we present a holistic approach towards ‘Green Software Defined Radios’, enabling seamless connectivity while saving on hardware and energy. In Chapter 2, the technical content of the book is introduced. Relevant taxonomy is given and Software Defined Radios are positioned in the broader sphere of flexible radios. A holistic system approach towards low cost, low energy SDRs is presented, leveraging the concept of providing and exploiting scalability for low energy.
Radio front-end building blocks need to be designed to offer flexibility in carrier frequency, channel bandwidth, noise performance, etc. without a significant power penalty. An overview of SDR front-end challenges and solutions is given in Chapter 3. We specifically also zoom in on a scalable ADC architecture that nicely couples scalability to low power consumption.
In Chapter 4 a survey of SDR baseband solutions is given. Special focus is on a heterogeneous Multi-Processor System-on-Chip (MPSoC) approach optimized for scalability and low energy. Particular attention is paid to domain-specific processors optimized for wireless communications functionality.
On top of the hardware aspects, software is obviously of paramount importance when designing Software Defined Radios. In Chapter 5 solutions and approaches are presented to enable efficient design both for platform-level and processor-specific SW components.
Chapter 6 addresses the topic of cross-layer optimization, resulting in control solutions which exploit flexibility for low energy. A generic approach to master optimization complexity will be given. Moreover, use cases will be presented showing the impressive gains that can be achieved.
The spectral challenge is the focus of Chapter 7. A preview is given of how SDRs are crucial to realize cognitive radios, and it is explained which specific features will need to be added.
Chapter 8 closes the book, summarizing the major contributions. Also, some open research questions are put forward. Specifically, evolutions in both the wireless and the technology scene are considered. Importantly, we remark that, given the aggravating energy, capacity, and flexibility requirements, the quest for green SDRs will not end in the foreseeable future.
References
1. Software Defined Radio in Mobile Phones, ARCchart, November 2007
2. Yrjö Neuvo, CTO, Nokia, at ISSCC 2004
3. www.itrs.net
4. www.scc41.org
Chapter 2
Software Defined Radios Enabling Seamless Connectivity for Handheld Devices
2.1 Flexible Radios: Species and their Territorium
A continuous trend towards ‘more flexibility’ in wireless systems is witnessed. Before digging into the topic of Software Defined Radios, we position them in the perspective of this trend. First, flexibility in the overall wireless context is introduced. Next, a (view on) taxonomy of flexible radios is given. Then, an analysis of the software in SDRs is given. Finally, some relevant standardization initiatives are listed.
2.1.1 Flexibility in the Wireless World
Flexibility is a property of radio systems and networks, and thus it is not easily categorized within the various layers of the familiar OSI communication model (or the related IP network model). Rather, it manifests itself across the layers, and its scientific inquiry calls for a multi-layered approach, one that explores synergies rather than separations. One can distinguish between platform (or equipment)-centric flexibility, network-centric flexibility and service/application-centric flexibility. The network-centric flexibility addresses radio-network-architecture features, where the flexibility (potentially including reconfigurability) aspects of the platform-centric portion can be abstracted. The service/application-centric flexibility is more and more present in wireless services and applications. This flexibility can be nicely coupled to the radio flexibility, and synergies can be exploited (see Chapter 6).
The flexible radios discussed in this book essentially generate flexible waveforms to be sent over the air. This is achieved through reconfigurability in the wireless modem(s). Consequently, they concern relevant functionalities of the HW and the SW, radio-frequency (RF) and intermediate-frequency (IF) front-ends, digital Base-Band (BB) platforms, and control strategies. For the latter, a cross-layer approach is proposed to achieve the best power/performance trade-offs. The focus is on handheld terminals, which essentially should live on (low weight) batteries, can feature high mobility, and can end up in a large variety of geographical environments.
In the coming sections we briefly introduce a taxonomy of flexible radios, in order to position the specific species of Software Defined Radios which form the subject of this book. This taxonomy has to some extent grown organically, and is therefore not perfectly unambiguous. The interpretation proposed in this book builds upon the classification proposed by communities of experts,1 and the authors’ personal preference.
2.1.2 Ancestors: Dedicated Radios
Traditionally, radios have featured little or no flexibility, in the sense that they have been designed to be compatible with one specific standard. We call this class dedicated radios. More recently, a certain degree of flexibility in the waveform has been introduced within one standard. A clear example is the recent IEEE 802.11n, where not only different constellation sizes and code rates should be supported, but also several operating frequencies and bandwidths have been defined [12]. The emerging 3GPP-LTE standard [13] is envisaged to open even more flexibility opportunities to fit in a wide variety of wireless communication scenes (see definition in Section 2.2). These standards clearly require the radio to handle different parameter settings. Yet, such radios have until now still mostly been built for one purpose (standard and/or service), and therefore receive the ‘dedicated’ label. Two flavors of non-flexible radios are distinguished:
A Hardware Radio (HR) is implemented using hardware components only and cannot be modified except through physical intervention. This radio is becoming an endangered species, as even extremely low power devices more and more ask for some scalability [6].
A Software Controlled Radio (SCR) implements only the control functions in software, so only a limited set of functions can be changed using software. This can for example be used to control parameters of a modem. An exemplary OFDM modem for WLAN is documented in [3].
1 Specifically, the definitions of the SDR forum (www.sdrforum.org) and the terminology used in the WWRF have been used as a reference. Private communication with Prof. Andreas Polydoros and discussions in the context of the NEWCOM project have also contributed significantly to shedding light on the matter.
2.1.3 Software Radios: A Designer’s Ultimate Nightmare
One of the most far-fetched ways to implement flexible radios is to apply the concept of Software Radio (SR) [2], which basically assumes that all the radio modules are implemented in software. These radios seem to offer the ultimate freedom many aspire to [9]. The architecture is quite simple, as illustrated in Fig. 2.1: place the Analog-to-Digital Converter (ADC) and Digital-to-Analog Converter (DAC) at the antenna, and perform all transmission and reception functionality digitally, preferably on a DSP. The clear advantage of this kind of radio is that it can easily be modified to adjust to new services and management strategies. Hence, such radios are every marketing manager’s dream. Yet, for the RF IC designer they are a nightmare come true. Considering the specifications of the ADC, one would need a sampling rate in the order of 10 GS/s, at a resolution of 16 bits. The power consumption of the ADC alone would be in the order of 100 W, assuming you could ever design it. This clearly does not fit in the power budget of a mobile device. As for the further processing of these samples, coming in at 10 GS/s and 16 bits each, on a DSP: this is also a job one would not like to see happening on a handheld device.
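The order of magnitude quoted above can be reproduced with a back-of-the-envelope estimate. The sketch below uses the commonly used Walden-style relation P ≈ FoM · 2^N · f_s; the figure-of-merit value (150 fJ per conversion step) is an illustrative assumption and not a number from this book, and the nominal 16-bit resolution is used in place of an effective number of bits.

```python
# Back-of-the-envelope estimate for the "ADC at the antenna" software radio.
# ASSUMED_FOM_J is an illustrative assumption, not a value from the book.


def adc_power_watts(sample_rate_hz: float, bits: int, fom_joule_per_step: float) -> float:
    """Walden-style estimate: P = FoM * 2**bits * f_s."""
    return fom_joule_per_step * (2 ** bits) * sample_rate_hz


def raw_data_rate_bps(sample_rate_hz: float, bits: int) -> float:
    """Raw bit stream that the DSP behind the ADC would have to absorb."""
    return sample_rate_hz * bits


SAMPLE_RATE_HZ = 10e9     # 10 GS/s, as quoted in the text
RESOLUTION_BITS = 16      # 16 bits, as quoted in the text
ASSUMED_FOM_J = 150e-15   # assumption: 150 fJ per conversion step

adc_power = adc_power_watts(SAMPLE_RATE_HZ, RESOLUTION_BITS, ASSUMED_FOM_J)
data_rate = raw_data_rate_bps(SAMPLE_RATE_HZ, RESOLUTION_BITS)

print(f"Estimated ADC power: {adc_power:.1f} W")
print(f"Raw sample stream:   {data_rate / 1e9:.0f} Gb/s")
```

Under these assumptions the ADC alone lands just below 100 W, and the DSP would have to absorb a raw stream of 160 Gb/s. A different assumed figure of merit scales the power estimate linearly.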
Fig. 2.1 Software radio concept (blocks: ADC, DSP, DAC)
2.1.4 Software Defined Radios: Addressing the Dilemma
A Software Defined Radio (SDR) can be defined [1, 4, 8] as a collection of hardware and software technologies that enable reconfigurable system architectures for wireless networks and user terminals. SDR provides an efficient and comparatively inexpensive solution to the problem of building multi-mode, multi-band, multi-functional wireless devices that can be enhanced using software upgrades. SDR-enabled devices (e.g., handhelds) and equipment (e.g., wireless network infrastructure) can be dynamically programmed in software to reconfigure the characteristics of equipment.
This demarcation could be considered as the definition of the broader class of reconfigurable radios. For SDRs specifically, we assume that at least a significant part of the functionality is actually implemented in software running on processors, which can be more or less tuned for the domain or sub-functionality (see Chapter 4). Reconfigurable radios purely composed of parametrizable HW blocks can be built to achieve lower power consumption, at the price of offering less flexibility.
SDRs thus make use of a common hardware platform, which typically consists of an analogue front-end and a digital baseband platform, incorporating processor(s) to perform part of the data transmission and reception functionality. In the partitioning, a trade-off between degree of flexibility and implementation complexity (and power) is made. Average power consumption and energy/bit are considered as the relevant metrics for which radios for handheld devices should be optimized. In this view, this book advocates SDRs as the best answer in the current state of the art to provide flexible radios for mobile devices.
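The two metrics named above are directly related: energy/bit is the average power divided by the average throughput. The following minimal sketch illustrates the relation; the operating states, power values and throughput are hypothetical numbers chosen only for illustration.

```python
# Average power and energy/bit for a radio that alternates between states.
# All numbers in PROFILE and AVG_THROUGHPUT_BPS are hypothetical.


def average_power_w(states):
    """states: iterable of (power_w, time_fraction); fractions must sum to 1."""
    states = list(states)
    total_fraction = sum(fraction for _, fraction in states)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(power * fraction for power, fraction in states)


def energy_per_bit_j(avg_power_w: float, avg_throughput_bps: float) -> float:
    """Energy spent per delivered bit, in joule."""
    if avg_throughput_bps <= 0:
        raise ValueError("throughput must be positive")
    return avg_power_w / avg_throughput_bps


PROFILE = [
    (0.500, 0.10),  # active: 500 mW during 10 % of the time
    (0.010, 0.90),  # idle:    10 mW during 90 % of the time
]
AVG_THROUGHPUT_BPS = 2e6  # 2 Mb/s delivered on average

avg_power = average_power_w(PROFILE)
epb = energy_per_bit_j(avg_power, AVG_THROUGHPUT_BPS)

print(f"Average power: {avg_power * 1e3:.1f} mW")
print(f"Energy/bit:    {epb * 1e9:.1f} nJ/bit")
```

The sketch shows why both metrics matter: a radio can lower its energy/bit either by delivering more bits at the same average power or by spending less time in its high-power state.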
2.1.5 A Debatable Overview
For the purpose of making the working assumptions of the authors of this book explicit, and at the risk of controversy, an overview of the different species of flexible radios is sketched in Fig. 2.2 below.
2.1.6 SW: Brains for SDRs
Reconfigurable HW platforms essentially form the muscles of SDRs. The actual wireless modem functionality is implemented by means of the SW running on the SDR platform and terminal. This SW needs to implement the functionality associated with the different layers in the OSI stack, is implemented on several levels of abstraction, and can run either on the SDR platform or on a host processor in the terminal. Figure 2.3 shows an exemplary build-up of the SW in an SDR system. The platform and processor dependent SW today typically includes all SW implementing the PHY layer, and can include drivers and SW modules which have been highly optimized towards specific processors.
Fig. 2.2 Species of flexible radios, classification assumed in this book (classes shown: Flexible Radios, Reconfigurable Radios, Software Defined Radios, Software Radios)
Fig. 2.3 Exemplary build-up of the SW in an SDR system (labels: Other protocol functions & management; SW on host processor; Embedded SW on SDR Platform; Can include common SW components*; Platform control & time critical MAC; Hardware Abstraction Layer; Platform and processor dependent SW; Processor 1 ... Processor n; Platform component i; ‘Operating environment’, ‘firmware’*; SDR Hardware. * Eventually downloadable)
Eventually, new SW could be downloaded onto terminals through the network. This definitely is envisioned for platform-independent SW protocol and management functionality. On a level much closer to the platform hardware, one could also foresee ‘downloadable waveforms’ in the future, and early attempts have been made on a simulator level [7]. This basically implies intervening in the firmware of the platform. Operators clearly would enjoy a generic waveform-downloading feature. In the medium term and for handheld devices, power constraints limit the practicality of this prospect. This book primarily focuses on the SDR platform itself, including the platform and processor-dependent SW (see Chapters 4 and 5 for more detail).
2.1.7 Adaptive Radios: How they (Do Not) Behave
In the previous sections, different classes of flexible radios have been defined, all featuring the common attribute that they can ‘output’ flexible waveforms. This flexibility relates to the physical possibilities of the radios. Complementarily, radios can be categorized by how they actually behave in operation, ‘at run-time’. In their behaviour, radios can either be adaptive or not. Figure 2.4 illustrates how the flexibility in the physical radio implementation and the intelligence of the radio (‘Intelligent Signal Processing’ [ISP]) can be considered as two different dimensions. Advances along both dimensions are essential to progress towards ultimate ‘Mitola’ [15] cognitive radios (see Chapter 7).
Fig. 2.4 Evolution from basic radio to ‘Mitola’ radio through increasing physical radio flexibility and intelligence (axes: flexibility, with levels basic radio, SCR, SDR, full SR limited, full SR no limits; and ISP, with levels logic, analysis, intuition; target: Mitola radio)
Non-adaptive radios receive a value for all their parameters before starting up a communication, and keep on transmitting and receiving accordingly afterwards. One could say that they ‘do as they’re told’, they ‘behave’. Adaptive radios will not keep the settings they received at initialisation. Instead, they may change some of their parameters based on external conditions, more specifically the dynamics in the wireless communication scene. For example, a WLAN modem that lowers its bit rate (e.g. by switching to a smaller constellation size) when the path loss it measures has increased above a certain threshold demonstrates adaptive behaviour.
There is no one-to-one relationship between flexibility and adaptability. Flexible radios can behave completely non-adaptively, in cases where they are configured before transmission/reception and subsequently stick to the same parameter set. Similarly, dedicated radios can be adaptive, as illustrated in the WLAN example above. Naturally, flexibility in the radio platform opens up wide-ranging adaptation opportunities, crossing boundaries of standards.
A specific subset of adaptive radios, those applying learning techniques, are called cognitive radios. The taxonomy for cognitive radios has also brought about some nice confusion. The relevant terminology is treated in Chapter 7.
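The difference between adaptive and non-adaptive behaviour in the WLAN example can be sketched in a few lines. The mode table and the path-loss thresholds below are hypothetical values chosen for illustration; they are not taken from this book or from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    constellation: str
    bit_rate_mbps: float
    max_path_loss_db: float  # hypothetical threshold for reliable operation


# Ordered from highest to lowest bit rate; thresholds are illustrative.
MODES = (
    Mode("64-QAM", 54.0, 80.0),
    Mode("16-QAM", 36.0, 88.0),
    Mode("QPSK", 18.0, 95.0),
    Mode("BPSK", 6.0, 102.0),
)


def select_mode(path_loss_db: float) -> Mode:
    """Pick the fastest mode whose threshold tolerates the measured path loss."""
    for mode in MODES:
        if path_loss_db <= mode.max_path_loss_db:
            return mode
    return MODES[-1]  # fall back to the most robust mode


class Radio:
    def __init__(self, initial_path_loss_db: float, adaptive: bool):
        self.adaptive = adaptive
        self.mode = select_mode(initial_path_loss_db)  # set at initialisation

    def on_measurement(self, path_loss_db: float) -> Mode:
        # A non-adaptive radio ignores the measurement and keeps its settings.
        if self.adaptive:
            self.mode = select_mode(path_loss_db)
        return self.mode


measurements_db = [75.0, 86.0, 97.0, 78.0]
adaptive_radio = Radio(measurements_db[0], adaptive=True)
fixed_radio = Radio(measurements_db[0], adaptive=False)

adaptive_trace = [adaptive_radio.on_measurement(pl).constellation for pl in measurements_db]
fixed_trace = [fixed_radio.on_measurement(pl).constellation for pl in measurements_db]

print("adaptive:    ", adaptive_trace)
print("non-adaptive:", fixed_trace)
```

Both radios in the sketch have the same physical capabilities; only the adaptive one makes use of them at run-time, which illustrates that flexibility and adaptability are separate properties.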
2.1.8 Multi-modal/Multi-standard Terminals
In the trend towards seamless connectivity in a heterogeneous network environment, multi-modal/multi-standard terminals are gaining momentum on the market [5].
Most of the products today comprise parallel chains of dedicated radios for different standards or modes. Recently, reconfigurable radios that support a set of modes and/or standards on a common hardware platform have started to enter these terminals. As elaborated on in Chapter 1, cost and size concerns clearly favor reconfigurable radios as the preferred option for multi-modal/multi-standard terminals.
2.1.9 From Flexible Radio to Seamless Services: Standardization Initiatives Paving the Way
Innovative research has recently made impressive progress towards flexible radios. In parallel, crucial questions arise on how these radios will be operated in a real network environment, where in the end users should be able to benefit from the interesting new features of these radios. Terminal manufacturers also worry about certification of these devices. Fortunately, standardization initiatives have started concentrating on the aspects of reconfigurability and interoperability.
2.1.9.1 SDRs and their Interoperability
One of the expectations towards SDRs is that they will enable the development of ‘common’ or ‘open’ software modules in the future, which could be implemented on various platforms, independent of the actual terminal equipment or manufacturer. In this context, several initiatives focus on standardizing SW modules and interfaces running on the operating environment of an SDR platform, assuming the actual radio platform could be abstracted (in the future).
One association active in the definition of essential interfaces in this context is the Software Defined Radio Forum (SDR forum) [8]. The SDR forum is a nonprofit organization dedicated to promoting the development, deployment and use of software defined radio technologies for advanced wireless systems. As one of the outputs, the technical committee of the forum generates technical reports providing a description of the concepts and basic architecture for an SDR along with a description of the internal interfaces for the radio, its software download process, interfaces between various modules, and basic message definitions needed for such a process.
In a security context, the Joint Tactical Radio System (JTRS) [7] is a program of the US and NATO to produce radios which provide flexible and interoperable communications. Examples of radio terminals which require support include handheld, vehicular, airborne and dismounted radios, as well as base-stations (fixed and maritime). This goal is achieved through the use of SDR systems based on an internationally endorsed open Software Communications Architecture (SCA).
The Wireless World Research Forum (WWRF) has established a specific Working Group on reconfigurability [10]. They also stress the importance of reference models for reconfigurable terminals and network architectures, and have published relevant white papers on the topic.
2.1.9.2 Towards Dynamic Spectrum Access Networks
Dynamic spectrum access can leverage SDRs, and opens up a new wireless order. While this new paradigm opens up access to untapped capacity, along comes a whole collection of new regulation and standardization issues. SCC41 [14] encompasses standards projects in the areas of dynamic spectrum access, cognitive radio, interference management, coordination of wireless systems, advanced spectrum management, and policy languages for next generation radio systems. SCC41 is particularly interested in ideas that could be implemented in commercial products in the near to medium term. The scope of this standardization covers a wide range of aspects related to concepts and technologies in the fields of spectrum management, policy defined radio, adaptive radio, software defined radio, reconfigurable radio and networks:
– WG IEEE 1900.1 aims to develop a standard which will facilitate the development of these technologies, by clarifying the terminology and how these technologies relate to each other.
– WG IEEE 1900.2 analyzes effects related to coexistence and interference.
– WG IEEE 1900.3 focuses on ‘Recommended Practice for Conformance Evaluation of Software Defined Radio (SDR) Software Modules’. The goal of this effort is to assure that SDR software can be deployed with high confidence that it will operate within prescribed regulatory and operational limits. The guideline will apply to wireless network operators and terminal equipment manufacturers to help them define test guidelines that conform to SDR technologies, to be licensed by regulatory authorities.
– WG IEEE 1900.4 discusses architectural building blocks. The standard defines the building blocks comprising network resource managers, device resource managers, and the information to be exchanged between the building blocks, for enabling coordinated network-device distributed decision making which will aid in the optimization of radio resource usage, including spectrum access control, in heterogeneous wireless access networks (see also Chapter 1).
The challenges to realize the necessary standards and regulations for SDRs and cognitive radios are considerable. Yet, relevant initiatives (as described above) are gaining momentum. Major industrial players are teaming up with academia to create the context needed to enable SDRs to be exploited to their full potential.
2.2 Towards Green SDRs: A Holistic Approach
2.2.1 Low Power: A Philosophy
Given the energy gap introduced in Chapter 1, a major challenge is to enable low energy reconfigurable radio implementations, suited for handheld multimedia terminals and competitive with fixed hardware implementations. Several major integrated device manufacturers have indicated in recent announcements that energy efficiency (i.e., MOPS/mW) is hardly improving any more through technology scaling. The attention for low power cannot be confined to one stage of the design: it needs to become a philosophy. A wise team said2: ‘Think about low power all the time, if it needs an all-time power record!’ Platform improvements and circuit design progress are essential but not sufficient for bridging the energy gap. A clear need for holistic system-level strategies exists, and disruptive solutions are needed.
2.2.2 Wireless Communication Scenes: Dynamics are Everywhere
A radio has been defined as a communication system employing wireless transmission of information by means of electromagnetic waves propagated through space. Importantly, the value of a radio increases tremendously when employed in a network to deliver services. We define a ‘wireless communication scene’ as the combination of the service, the propagation conditions, and the network situation. These three elements of the wireless communication scene all feature a high degree of dynamics.
1. The service to be delivered. Next to the traditional ‘voice’, advanced mobile terminals support multimedia services ranging from data to video, and in the future maybe even 3D multimedia (for example for mobile gaming applications). The nature of the service highly influences the amount of information to be transmitted, and the constraints (e.g. on errors and latency) to be met. Also within one service, the dynamics can be high. The actual rate of a video codec, for example, very much depends on the specific images and the correlation between frames.
2. The propagation conditions on the channel. The most prominent attribute is the path loss, which defines the average attenuation the signals undergo between transmitter and receiver. Next, the multi-path response encountered due to reflections of the waves is an important characteristic. It defines the fading the signals encounter [11]. Handheld terminals can end up in different geographical environments. Moreover, as they can be used in mobile conditions, both the path loss and the fading characteristics can vary in time during communication.
3. The network situation. The ether is a shared medium; consequently, a pool of multiple users can simultaneously send signals ‘in the air’. The multi-user traffic situation can be very dynamic as well, with users coming and going and their communication intensity varying.
The prominent dynamics in the wireless communication scene can and maybe should be adapted to, as will be explained in Section 2.4.
2 This statement, made by IMEC’s T@MPO team working on a low power Turbo codec [16], turns out to be even more crucial for SDR systems and newer technology nodes.
2.2.3 SDR Solutions: Scalability should be Everywhere
When building SDR solutions, functional flexibility is targeted in the first instance. However, introducing reconfigurability in the different components of the radio typically comes at the risk of a power penalty. The key to enabling both low power and flexibility is to target energy scalable SDRs. These radios feature a power consumption which scales down accordingly when they do not need to deliver their maximum performance. Energy scalability in SDRs can and should be introduced everywhere in the system [18]:
– Both in the analogue front-end and the digital baseband
– On different hierarchical levels in the design: platform, components, and circuits
– In the hardware as well as in the algorithmic solutions and the software
– Considered over different standards as well as within one standard
In the conception of green SDRs [17], functional flexibility and energy scalability are considered as firmly coupled design goals.
2.2.4 Exploit Dynamics and Scalability!
A major challenge is to enable low energy SDR solutions, suited for handheld multimedia terminals and competitive with fixed hardware implementations. To make low energy terminals a reality, a two-step approach is advocated (Fig. 2.5) [17].
Fig. 2.5 Reconfigurable and energy scalable radio solutions achieve low energy operation through cross-layer joint QoS and energy management (diagram labels: application requirements and communication conditions, both varying over time t; cross-layer optimized power/performance manager; energy-scalable SDR with baseband and front-end)
Traditional designs are still mostly tuned for the worst-case. By carefully scanning and following the exact (run-time) requirements without over-dimensioning the active part of the components, much energy can be saved. First, effective energy scalability is enabled in the design of the radio baseband and front-end. Secondly, the scalability is exploited to achieve low power operation by a cross-layer controller that follows at run-time the dynamics in the wireless communication scene. The approach illustrated above sketches the fundamental concepts of this book. Cognitive radios will extend the flexibility enabled by new reconfigurable radio architectures, by adding intelligent control solutions exploiting under-utilization of many licensed frequency bands. The design approach introduced above is clearly paving the way towards this challenging goal, as further explained in Chapter 7.
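The two-step approach can be illustrated with a minimal sketch: an energy-scalable radio exposes several configurations with different power/performance points, and a run-time controller picks the lowest-power configuration that still satisfies the current scene. The configuration table, the scene sequence and all numbers are hypothetical; the network situation is left out of the scene for brevity, and the actual control solutions (Chapter 6) are considerably more elaborate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Scene:
    required_mbps: float  # imposed by the service
    path_loss_db: float   # imposed by the propagation conditions


@dataclass(frozen=True)
class Config:
    name: str
    throughput_mbps: float
    max_path_loss_db: float
    power_mw: float


# Step 1: an energy-scalable radio offers several operating points
# (hypothetical values).
CONFIGS = (
    Config("6 Mb/s, low gain", 6.0, 85.0, 90.0),
    Config("6 Mb/s, high gain", 6.0, 100.0, 150.0),
    Config("54 Mb/s, low gain", 54.0, 85.0, 300.0),
    Config("54 Mb/s, high gain", 54.0, 100.0, 450.0),
)


# Step 2: a run-time controller follows the scene and exploits the scalability.
def choose_config(scene: Scene, configs: Sequence[Config] = CONFIGS) -> Optional[Config]:
    """Return the lowest-power configuration that satisfies the scene, if any."""
    feasible = [
        c for c in configs
        if c.throughput_mbps >= scene.required_mbps
        and c.max_path_loss_db >= scene.path_loss_db
    ]
    return min(feasible, key=lambda c: c.power_mw) if feasible else None


SCENES = [Scene(2.0, 80.0), Scene(20.0, 65.0), Scene(40.0, 95.0), Scene(1.0, 95.0)]

worst_case = max(CONFIGS, key=lambda c: c.power_mw)  # design tuned for the worst case
chosen = [choose_config(scene) for scene in SCENES]
managed_avg_mw = sum(c.power_mw for c in chosen) / len(chosen)

print(f"Worst-case design: {worst_case.power_mw:.1f} mW at all times")
print(f"Managed radio:     {managed_avg_mw:.1f} mW on average")
```

In this toy sequence the managed radio only needs its most power-hungry configuration in one of the four scenes, so its average power stays well below that of a design that permanently operates at the worst-case setting.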
References
1. M. Dillinger, K. Madani, and N. Alonistioti, Software Defined Radio: Architectures, Systems and Functions, Wiley, Chichester, 2003.
2. J. Reed, Software Radio – A Modern Approach to Radio Engineering, Prentice-Hall, Upper Saddle River, NJ, 2002.
3. W. Eberle et al., A digital 80 Mb/s OFDM transceiver IC for wireless LAN in 5 GHz band, IEEE International Solid-State Circuits Conference, San Francisco, CA, Feb. 2000.
4. F.K. Jondral, Software-Defined Radio – Basics and Evolution to Cognitive Radio, EURASIP Journal on Wireless Communications and Networking, Vol. 2003, No. 3, pp. 275–283, 2005.
5. G. Desoli and E. Filippi, An Outlook on the Evolution of Mobile Terminals, CAS Magazine, second quarter 2006.
6. A. Sinha, A. Wang, and A.P. Chandrakasan, Energy Scalable System Design, IEEE Transactions on VLSI Systems, Vol. 10, No. 2, pp. 135–145, April 2002.
7. sca.jpeojtrs.mil.
8. www.sdrforum.org.
9. REM, Radio free Europe, Murmur, 1983.
10. Wireless World Research Forum (WWRF), Working Group 6 (Reconfigurability), http://wg6.ww-rf.org/.
11. T.S. Rappaport, Wireless Communications: Principles and Practice, Prentice Hall, Upper Saddle River, NJ, 1996.
12. www.IEEE802.org.
13. www.3gpp.com.
14. www.scc41.org.
15. J. Mitola et al., Cognitive Radio: Making Software Radios More Personal, IEEE Personal Communication, Vol. 6, No. 4, pp. 13–18, Aug. 1999.
16. B. Bougard et al., A Scalable 8.7-nJ/bit 75.6-Mb/s Parallel Concatenated Convolutional (turbo-) Codec, IEEE International Solid-State Circuits Conference, San Francisco, CA, Feb. 2003.
17. A. Dejonghe, B. Bougard, S. Pollin, J. Craninckx, A. Bourdoux, L. Van der Perre, and F. Catthoor, Green Reconfigurable Radio Systems: Creating and Managing Flexibility to Overcome Battery and Spectrum Scarcity, Signal Processing Magazine, May 2007.
18. L. Van der Perre et al., Architectures and Circuits for Software Defined Radios: Scaling and Scalability for Low Cost and Low Energy, ISSCC 2007, San Francisco, CA, Feb. 2007.
Chapter 3
Software-Defined Radio Front-Ends Scalable Waves in the Air
3.1 Introduction
The ultimate dream of every Software-Defined Radio (SDR) front-end designer is to deliver an RF transceiver that can be reconfigured into every imaginable operating mode, in order to comply with the requirements of all existing and even upcoming communication standards. These include a large range of modes for cellular (2G–2.5G–3G and further), WLAN (802.11a/b/g/n), WPAN (Bluetooth, Zigbee, ...), broadcasting (DAB, DVB, DMB, ...) and positioning (GPS, Galileo) functionality. Obviously, all of them have a different center frequency, channel bandwidth, noise level, interference requirement, transmit spectral mask, etc. As a consequence, the performance of all building blocks in the transceiver must be reconfigurable over an extremely wide range, requiring ultimate creativity from the SDR designer.
Reconfigurability is a requirement for SDR functionality, but one often forgets that it can also be an enabler for low power consumption. Indeed, once the flexibility is built into the transceiver, it can be used to adapt the performance of the radio to the actual circumstances, instead of those implied by the worst-case situation of the standard. Since linearity, filtering, noise, bandwidth, etc. can be traded for power consumption in the SDR, a smart controller is able to adapt the radio at run-time to the actual performance required, and hence can reduce the average power consumption of the SDR.
In this chapter, several important innovations and concepts will be presented that bring this ultimate dream closer to reality. These include circuits for wideband LO synthesis, multifunctional receiver and transmitter blocks, novel ADC implementations, etc. The result of all this is integrated in the world’s first SDR transceiver covering the frequency range from 174 MHz to 6 GHz, implemented in a 1.2 V 0.13 μm CMOS technology.
Jan Craninckx IMEC, Leuven, Belgium, e-mail:
[email protected] L. Van der Perre et al., Green Software Defined Radios, Series on Integrated Circuits and Systems, c Springer Science+Business Media B.V. 2009
3.2 System-Level Considerations
A first choice to be made is the radio architecture to be used. In the past decades, many studies and examples have been presented on heterodyne, homodyne, low-IF, wideband-IF, etc. architectures, each having certain benefits and problems for a certain application. Which one to choose? In view of SDR, this question becomes perhaps a little easier to answer. Indeed, when the characteristics of all possible standards are taken into account, not a single intermediate frequency can be found that suits them all. And having multiple IFs increases the hardware cost of the SDR, which cannot be tolerated. So direct-conversion architectures are the right choice for the job. All of the well-known problems, such as DC offsets, I/Q mismatch, 1/f noise, PA pulling, etc., that have limited the proliferation of zero-IF CMOS radios into mainstream products have been solved in recent years, which enables the design of a low-cost front-end.
A schematic vision of what the final SDR will look like is represented in Fig. 3.1. For low cost in a large-volume consumer market, the active transceiver core is implemented in a plain CMOS technology. It includes a fully reconfigurable direct-conversion receiver, transmitter, and two synthesizers (for FDD operation). The functions that cannot be implemented in CMOS are included on the package substrate. These are primarily related to the interface between the active core and the antenna. They must provide high-Q bandpass filtering or even duplexing, impedance matching circuits, and power amplification. The remainder of this chapter will mainly focus on the transceiver implementation.
The hard work starts with determining the performance specifications of each block in the chain. The total budget for gain, noise, linearity, etc. must be divided over all
Fig. 3.1 Conceptual view of the SDR transceiver front-end (block labels: MCM substrate, CMOS IC, MEMS switches, tunable matching, tunable filtering, power amplifier, two Frac-N PLLs, VCO, DMQ, NoC controller)
blocks, ensuring that all possible test cases are covered, and this for every standard. Having very flexible building blocks of course helps a great deal, but making a smart system analysis at this point is crucial to obtain an optimal SDR solution. A custom Matlab® tool called NETLISP has been developed for this exercise; it ensures consistency during front-end design [14]. It takes in a netlist that describes all building blocks, with their performance characteristics and gain ranges, and simulates the complete chain on a behavioral level for a list of different test cases. All blocks can be modeled at varying levels of accuracy and complexity, but integrating them in a single framework avoids inconsistencies between different levels of abstraction. Figure 3.2 shows a screenshot. The performance under all circumstances can thus be evaluated, and the building block performance can be tuned in order to fulfill all requirements. Gain ranges and signal filtering must be set such that the signal levels are an optimal trade-off between noise and distortion. Typically, four phases can be distinguished in the design flow [14].
1. Exploration phase: Based on a generic behavioral model, plugged into end-to-end system simulations, the effect of front-end non-idealities such as noise, nonlinearity, quadrature mismatch, offsets, etc. is analyzed. The resulting implementation loss curves allow the system designer to derive specifications for the different effects based on a budget for the complete link.
Fig. 3.2 System-level analysis tool
2. Cascade analysis phase: The cascade analysis distributes the specifications for the different non-idealities derived in the exploration phase over the different FE blocks. The building block models are equation-based, and allow easy evaluation of different architectures and Automatic Gain Control (AGC) or power control algorithms.
3. Design phase: This phase includes both the design of the analog circuits and the development of algorithms for compensation of the front-end non-idealities. Consistency of the simulated analog performance with the cascade analysis description is ensured by validation against Verilog-A models that are automatically generated from the NETLISP description.
4. Verification phase: In this phase the designed FE is verified. Complex and accurate models can be derived from the designed circuits, and they can be inserted in the cascade analysis to check the complete link performance.
In a typical design scenario, there are some iterations over steps 2 to 4. Although it is a difficult exercise, the analysis can show that, with the built-in flexibility, a software-defined radio can also achieve state-of-the-art performance very close to that of dedicated single-mode solutions. In the next sections we will go deeper into the design of some crucial building blocks.
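The kind of equation-based budget used in the cascade analysis phase can be illustrated with the textbook Friis noise formula and the matching IIP3 cascade rule. The following is only a minimal Python sketch, not the NETLISP tool; the function names and the block values (gain, NF, IIP3) are invented for illustration.

```python
import math


def db_to_lin(db):
    """Convert a power quantity in dB (or dBm) to linear (or mW)."""
    return 10 ** (db / 10)


def lin_to_db(x):
    return 10 * math.log10(x)


def cascade(blocks):
    """Cascade a receive chain given as (gain_dB, NF_dB, IIP3_dBm) tuples,
    ordered from antenna to ADC.
    Returns (total gain dB, total NF dB, input-referred IIP3 dBm)."""
    gain = 1.0        # linear power gain ahead of the current block
    f_total = 1.0     # running noise factor (Friis)
    inv_iip3 = 0.0    # running sum of 1/IIP3 referred to the input, in 1/mW
    for gain_db, nf_db, iip3_dbm in blocks:
        f_total += (db_to_lin(nf_db) - 1.0) / gain
        inv_iip3 += gain / db_to_lin(iip3_dbm)
        gain *= db_to_lin(gain_db)
    return lin_to_db(gain), lin_to_db(f_total), -lin_to_db(inv_iip3)


# Hypothetical two-block chain: an LNA followed by a mixer.
chain = [(15.0, 2.0, -5.0), (10.0, 10.0, 5.0)]
g, nf, iip3 = cascade(chain)
print(f"gain = {g:.1f} dB, NF = {nf:.2f} dB, IIP3 = {iip3:.2f} dBm")
```

Sweeping the gain of the first block in such a model shows the trade-off mentioned in the text: more front-end gain lowers the cascaded noise figure but also lowers the input-referred IIP3.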
3.3 Wideband LO Synthesis
To generate all required LO signals in the range of 0.1–6 GHz, several frequency generation techniques have been proposed to relax the tuning range specifications of the voltage-controlled oscillator (VCO). They use division, mixing, multiplication or a combination of these [31]. However, to make these systems efficient in terms of phase noise and power consumption, the VCO tuning range still has to be maximized. The following section discusses the design of such a wideband VCO, whereas the architecture required to generate all LO signals will be discussed in Section 3.3.2.
3.3.1 3–5 GHz Voltage-Controlled Oscillator
To reach the stringent phase noise specifications of today's mobile communication systems, most RF transceiver ICs use LC-VCOs. Frequency tuning of LC-VCOs is often done by changing the capacitance value of the resonant tank using varactors. Switched or controlled inductor designs have been reported [28], but it remains difficult to cover the desired wide band continuously and to limit the deterioration of the phase noise performance caused by the insertion of these switches. Instead of using a single large varactor to tune the frequency, a mixed discrete/continuous tuning scheme is usually chosen [20]. A small varactor is used for fine continuous tuning whereas larger steps are realized by digitally switching
capacitors in and out of the resonant tank. This has two advantages: the VCO gain is lower, allowing easier phase-locked loop (PLL) design, and digitally switched varactors have a higher ratio between the capacitance in the on-state (CON) and the capacitance in the off-state (COFF). A higher CON/COFF ratio allows a larger VCO frequency tuning range. However, as the tuning range of a VCO is increased and exceeds the typical 20% range obtained in many designs, new problems and trade-offs appear that need a solution. In this design we have tackled the two main problems encountered in wideband LC-VCOs. First, the negative resistance required to maintain oscillation varies a lot over the frequency range, leading to significant overhead when a fixed active core is used. Secondly, the large variation of the VCO gain (KVCO) across the whole tuning range creates problems for optimal and stable PLL design. Solutions are proposed for both problems.
3.3.1.1 Tank Loss Variations
Required Negative Resistance
In the target frequency range (<5 GHz), the losses in the oscillator tank are usually dominated by the inductor. It can be modeled by an inductor series resistance RS, which we will consider in this simple example to be frequency-independent. This simplification is of course not completely valid, since extra losses due to e.g. the skin effect will increase the resistance at higher frequencies, but that does not change the general conclusion we will make. The required negative resistance needed to compensate the inductor losses is given by [8]
$$G_m = R_S \cdot (\omega C)^2 \qquad (3.1)$$
where C is the total tank capacitance and ω is the oscillation frequency, which is of course given by the simple formula $\omega = 1/\sqrt{L \cdot C}$, with L the inductance value. If we want the oscillation frequency to change for example by a factor 2, the total capacitance of the resonant tank has to be changed by a factor 4. From Equation (3.1), we see that the required negative resistance needed to maintain oscillation will also change by a factor 4. The required transconductance of the active core is four times higher at the lower end of the frequency tuning range than at the higher end. In a traditional design the active core will be designed for the toughest case, i.e. for the lowest frequency. For the highest frequency the active core is then largely overdimensioned, by a factor 4 in our example. As this is obviously a waste of power, the oscillator small-signal transconductance should be scaled over the frequency range. Recent phase noise theory based on the impulse sensitivity function (ISF), together with a linear-time-variant circuit analysis [1], has shown however that it is not the small-signal transconductance that must be considered for optimal phase noise. Instead, phase noise only depends on the large-signal oscillation amplitude, and that one is proportional to the bias current and the parallel tank
resistance. That would allow us to reduce only the bias current to maintain the same oscillation amplitude and hence also phase noise, without touching the size of the active core transistors. Two insights show us that the truth is somewhere in between, and that we will benefit from changing the size of the transistors. First, as already pointed out in [1], the phase noise analysis changes drastically when parasitics are taken into account. The bias current source in our design was not cascoded, because of the limited headroom in the low power supply (1.2 V) used, nor did we employ a series inductor resonating at the double frequency [17], because that technique is narrowband in nature. The total tank capacitance is also to a large extent not differential, but contains parasitics to ground in the inductor, the varactors, and certainly the transistors themselves. Because of these two parasitic effects, noise from the transistors can find a path to ground, and the phase noise degrades by several decibels from the ideal case [1], which indicates that removing parasitics by changing the active transistor sizes will improve phase noise. The second argument is even more important, because it impacts the achievable tuning range. The key to a wideband VCO is of course to have a tank capacitance that consists as much as possible of varactors and as little as possible of parasitics. The smaller the active transistors, the better. Here it is obvious that if we could eliminate some of the parasitics at high frequencies, the tuning range could be extended considerably.
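As a compact restatement of the scaling argument around Equation (3.1), the resonance condition can be substituted into the expression for the required transconductance. The short derivation below assumes, as the text does, a frequency-independent series resistance R_S and a fixed inductance L; ω_0 denotes the highest oscillation frequency and is a symbol introduced here only for illustration.

```latex
\begin{align}
\omega^2 = \frac{1}{LC} \;&\Rightarrow\; \omega C = \frac{1}{\omega L} \\
G_m = R_S \, (\omega C)^2 &= \frac{R_S}{(\omega L)^2} \;\propto\; \frac{1}{\omega^2} \\
\frac{G_m(\omega_0/2)}{G_m(\omega_0)} &= \frac{(\omega_0 L)^2}{(\omega_0 L/2)^2} = 4
\end{align}
```

Under these assumptions the required transconductance scales with the inverse square of the oscillation frequency, which is the factor 4 quoted for a factor 2 in frequency.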
Switched Core Architecture
The basic idea behind the solution presented here is thus to not only scale the biasing current of the active core, but to simultaneously change the size of the transistors as well in order to keep parasitics at a minimum, which is beneficial both for the phase noise performance and for the achievable tuning range. Therefore the active core will be constructed from an array of core units, which can be turned on or off when necessary. In each of these core units, switches must be added to turn the active transistors on or off. The position and size of those switches have to be considered carefully, to avoid degrading the oscillator phase noise performance as well as to ensure that additional parasitic capacitances are small. The circuit diagram is depicted in Fig. 3.3. As is clear from the analysis above, it is of utmost importance that together with the negative resistance, also the parasitic capacitance is removed from the oscillator tank to ensure a large tuning range. In the ‘on’ state the switch is closed, and the parasitic capacitance is then determined by the drain-gate and source-gate capacitance of the switch, plus the drain and gate capacitance of the active transistors. This is obviously larger than the parasitics of a simple negative resistance because of the added switch parasitics, but that is not an issue. Indeed, the core units are only activated when the oscillation frequency is lowered, and hence a larger capacitance is tolerated.
Fig. 3.3 Wideband VCO architecture (control inputs: Dtune, Vtune, Ckvco, Dunit)
To estimate the capacitance in the ‘off’ state, several effects have to be considered. When the switches open, the drain voltages will be pulled to ground, and this has a different effect on the three transistors.
• First, the active NMOS transistors (M1–M2) keep their gate connected to the LC tank, and they are now in the triode region since they have a positive gate-source voltage and zero drain-source voltage. Basically, their gate capacitance remains a parasitic of the LC tank.
• Second, the active PMOS transistors (M3–M4) are turned off because they also have a positive gate-source voltage. The gate capacitance therefore drops considerably, although special attention should be given to the fact that for large signals the transistor can enter the accumulation mode, where the capacitance is again high [2]. Simulations have shown here that the large-signal capacitance is hardly affected by the accumulation effect.
• Third, the switch transistors (SW1–SW2) also turn off and only the drain parasitics stay attached to the gate, which results in a large drop in capacitance.
To summarize, we see that the NMOS capacitance stays fixed, the PMOS capacitance reduces, and the switch capacitance disappears almost completely. This information is of course used in the sizing of the different transistors. The NMOS is made small (W = 2.6 μm per unit) and the PMOS is approximately three times bigger (W = 8.5 μm). The largest width is given to the transistor that has the largest ratio of CON/COFF, so the switch size is set to W = 18 μm. With this structure a CON/COFF ratio close to 3 is obtained, without any significant contribution of the switches' series resistance to the overall phase noise. So in fact we have been able to use the negative resistance core as
a varactor. For high oscillation frequency, the capacitance is low and there is no negative resistance. For lower frequencies, more and more core units are gradually activated, the total bias current increases to keep the oscillation amplitude steady and the parasitic capacitance increases, helping the ‘normal’ varactors in their goal to increase the total tank capacitance.
3.3.1.2 Sensitivity Variations
The second problem solved in the presented design is the variation of the VCO sensitivity for large tuning-range VCOs. A change in the control voltage VTUNE results in a change ΔC in the analog varactor capacitance CVAR. This causes a change in frequency Δf. The size of this frequency change depends on the relative importance of the analog capacitance change with respect to the total tank capacitance (that consists for a large part of digitally switched varactors).
$$f = \frac{1}{2\pi \cdot \sqrt{LC}} \;\rightarrow\; \frac{\Delta f}{\Delta C} = \frac{-1}{4\pi \cdot C \cdot \sqrt{LC}} \qquad (3.2)$$
If we go back to the example of the VCO with a frequency ratio of 2, we have seen that the tank capacitance has to change by a factor 4. As can be seen from Equation (3.2), the VCO frequency sensitivity will then change by a factor $4\sqrt{4} = 8$. In this example the non-linearity of the CV-curve of the varactor has been neglected, but typically the varactor is used in the middle of its tuning range, where this curve is rather linear. Such a large change in VCO gain gives serious problems for the surrounding PLL design. It makes it impossible to keep the PLL bandwidth constant, and hence endangers the loop stability and an optimal phase noise performance. The solution proposed here is to make the varactor size changeable. Instead of making one big analog varactor, a number of unit analog varactors are used. These varactors can be controlled in two ways. Some units are used for analog continuous tuning, and their control node is connected to the oscillator tuning voltage VTUNE. The other units are used for fine-grain discrete tuning, and their control node is connected either to the power supply or to ground. At the lowest frequency the sensitivity is low, so a large analog varactor is needed. Most of the unit varactors will be connected to the analog control voltage. At high frequencies the sensitivity is relatively high and only a small analog varactor is needed. The other units can then be used as discrete switched varactors, giving extra fine discrete tuning curves.
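The factor 8 and its consequence for the number of analog varactor units can be checked numerically. The Python sketch below evaluates Equation (3.2) for an ideal LC tank; the 0.75 nH inductance is the value quoted for this design, while the function names, the reference unit count and the assumption that every unit contributes the same capacitance change per volt are illustrative only.

```python
import math

L_TANK = 0.75e-9  # tank inductance in henry (value quoted for this design)


def tank_capacitance(f_hz, l_henry=L_TANK):
    """Total tank capacitance that resonates with l_henry at f_hz."""
    return 1.0 / ((2 * math.pi * f_hz) ** 2 * l_henry)


def freq_sensitivity(f_hz, l_henry=L_TANK):
    """df/dC from Equation (3.2), in Hz per farad (negative sign)."""
    c = tank_capacitance(f_hz, l_henry)
    return -1.0 / (4 * math.pi * c * math.sqrt(l_henry * c))


def analog_units(f_hz, f_ref_hz, units_ref):
    """Analog varactor units needed so that KVCO grows only linearly with f.
    Since |df/dC| scales as f**3, the unit count must scale as 1/f**2.
    Assumes identical units and a linear CV curve (both simplifications)."""
    return units_ref * (f_ref_hz / f_hz) ** 2


ratio = freq_sensitivity(5.0e9) / freq_sensitivity(2.5e9)
print(f"sensitivity ratio for a factor 2 in frequency: {ratio:.2f}")
print(f"units at 5 GHz if 12 are used at 2.5 GHz: {analog_units(5.0e9, 2.5e9, 12):.1f}")
```

A real design would round the unit count to an integer and pick it from the calibration measurements rather than from this ideal model.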
3.3.1.3 Circuit Implementation
Figure 3.3 shows a simplified view of the complete VCO architecture implemented. The inductor value was chosen small (0.75 nH) and is optimized for wide tuning range. It has a symmetrical octagonal shape [7] and is implemented in the top metal
layer which has a thickness of 2 μm. The next metal level is used for the underpass connections only. The typical series resistance of the inductor is about 1 Ω. The coarse frequency tuning is done with an array of 31 large varactors, controlled by the 5-bit control word DTUNE. In combination with those varactors, active core units (control word DUNIT) add the necessary negative resistance and also add some extra capacitance when the frequency is lowered. A total of 31 switched core units are employed, in parallel with a fixed negative resistance that has the size of about 10 units. This allows controlling the negative resistance generated by the VCO core over a factor 4, as was required for a factor 2 tuning of the oscillation frequency. Correspondingly, the total current of the active core will vary between 2.1 and 8.5 mA, whereas the bias circuit consumes 0.55 mA. The analog varactor consists of 15 small units and is controlled by CKVCO. That digital code actually consists of two control words. Four bits are used to set the number of varactors that must be connected to the analog control voltage VTUNE. Fifteen other bits are used to set the varactor control to power (1) or ground (0) in case it is not used for analog control. That creates a large set of extra fine tuning curves that cover the range between two adjacent coarse tuning settings.
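The relation between the core unit counts and the factor 4 can be made concrete with a small sketch. It combines the 10 fixed and 31 switched units quoted above with the inverse-square frequency dependence that follows from Equation (3.1). The function name and the assumption that the fixed core alone is just sufficient at the highest frequency are hypothetical; in the real design the setting comes from calibration.

```python
import math

FIXED_UNITS = 10     # fixed negative-resistance core, "about 10 units"
SWITCHED_UNITS = 31  # switched core units available through D_UNIT


def core_units_needed(f_hz, f_max_hz):
    """Switched core units to enable at f_hz.
    Assumes Gm must scale as 1/f**2 (Equation 3.1 with fixed L and R_S) and
    that the fixed core alone is exactly sufficient at f_max_hz."""
    total = FIXED_UNITS * (f_max_hz / f_hz) ** 2
    switched = math.ceil(total - FIXED_UNITS - 1e-9)  # tolerance for rounding
    return min(max(switched, 0), SWITCHED_UNITS)


for f in (5.0e9, 4.0e9, 3.0e9, 2.5e9):
    print(f"{f / 1e9:.1f} GHz -> D_UNIT = {core_units_needed(f, 5.0e9)}")
```

With all units enabled the core is (10 + 31)/10 = 4.1 times larger than the fixed part alone, which agrees with the quoted current range of 2.1 to 8.5 mA (a ratio of about 4).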
3.3.1.4 Calibration
As there are many control bits to properly set the frequency and gain of the VCO, and as the required settings of those bits are partially dependent on process, temperature and voltage variations, a calibration sequence is needed to identify the correct setting for each desired center frequency. At power-up time, before actual operation, both the frequency and the frequency sensitivity can be estimated by comparing the number of divided output cycles to the number of reference crystal periods for a restricted set of control settings. Practically, the VCO frequency is measured in free-running mode. For each coarse frequency band setting, a calibration routine can be summarized as follows:
1. Optionally: Determine the number of active core units needed to reliably start up and sustain oscillation (otherwise use default setting).
2. Determine the center frequency.
3. Determine the VCO gain associated with one small varactor by measuring the frequency difference between the on- and off-state. Based on that value, determine the number of small varactors needed to set the desired KVCO. The other small varactors will be used to create a set of fine tuning curves.
4. For each fine tuning curve:
(a) Determine the center frequency.
(b) Store this value in a look-up table.
The data stored in the look-up table can be used during normal operation to set the best VCO configuration for the desired PLL operating frequency, and the feedback action in the PLL will determine the voltage on the analog tuning node. Slow
changes in temperature or supply voltage can be corrected by monitoring the tuning voltage during burst-mode operation of the transceiver. Two comparators on the analog tuning node can detect if the voltage is below or above a threshold voltage (min 0.4 V, max 0.8 V in our design). The VCO can then be switched to the next tuning curve (as determined during the power-up calibration routine), and the data in the look-up table must be updated.
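The calibration steps and the comparator-based tracking can be sketched in a few lines of Python. Everything below is a toy illustration under stated assumptions: `ToyVco` is a hypothetical linear model replacing the real oscillator and frequency counter, the per-unit gain is crudely taken as the on/off frequency step divided by the full control swing, and a lower curve index is assumed to mean a higher frequency.

```python
from dataclasses import dataclass

N_SMALL = 15  # small analog varactor units, as stated in the text


@dataclass
class ToyVco:
    """Hypothetical free-running VCO model for one coarse band."""
    f_band_hz: float = 4.0e9  # frequency with all small varactors off
    step_hz: float = 4.0e6    # frequency drop per small varactor turned on

    def measure(self, units_on: float) -> float:
        # Stands in for counting divided VCO cycles against the crystal.
        return self.f_band_hz - units_on * self.step_hz


def calibrate_band(vco, kvco_target_hz_per_v, v_swing=1.2):
    """Steps 3 and 4 of the routine for a single coarse band."""
    # Step 3: gain of one small varactor from its on/off frequency step.
    step = vco.measure(0) - vco.measure(1)
    kvco_unit = step / v_swing  # crude estimate (assumption)
    n_analog = max(1, min(N_SMALL, round(kvco_target_hz_per_v / kvco_unit)))
    # Step 4: the remaining units become digital fine-tuning bits. Store the
    # centre frequency of each fine curve, analog units taken at mid-swing.
    lut = {}
    for n_on in range(N_SMALL - n_analog + 1):
        lut[n_on] = vco.measure(n_on + 0.5 * n_analog)
    return n_analog, lut


def next_curve(curve, v_tune, n_curves, v_min=0.4, v_max=0.8):
    """Comparator action during operation: leave the current fine curve when
    the tuning voltage drifts outside [v_min, v_max]."""
    if v_tune > v_max and curve > 0:
        return curve - 1  # assumed: lower index means higher frequency
    if v_tune < v_min and curve < n_curves - 1:
        return curve + 1
    return curve


n_analog, lut = calibrate_band(ToyVco(), kvco_target_hz_per_v=10e6)
print(n_analog, len(lut))  # 3 analog units, 13 fine tuning curves
print(next_curve(5, 0.9, len(lut)))  # tuning voltage too high: move to curve 4
```

The look-up table produced here maps a fine-curve index to its centre frequency; during operation the PLL would pick the entry closest to the wanted frequency and let the loop settle the analog tuning voltage.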
3.3.1.5 Implementation Results
The VCO has been implemented in the SDR prototype, together with a fractional-N PLL. It is a traditional fourth-order, type-2 charge-pump PLL with active loop filter and a MASH (cascaded 1-1-1) sigma-delta modulator. The external reference clock is 40 MHz. The active loop filter has a total capacitance of 190 pF, an amount that is low enough to be integrated on chip. In line with the concept of reconfigurable building blocks in a software-defined radio, the charge pump and the loop filter values can be programmed in order to modify the loop bandwidth and the PLL output phase noise. The divider is a modular architecture with a chain of divide-by-2/3 cells, each working at a lower frequency than the previous one [32]. The calibration routine on the VCO was run in order to keep the VCO gain proportional to the frequency over the whole frequency range. Indeed, for best PLL design it is not necessary to keep KVCO fixed. A linear variation with frequency keeps the PLL gain (and hence also the PLL bandwidth and phase margin) constant. This can be deduced from Equation (3.3), which gives the crossover frequency of a third-order, type-2 charge-pump PLL [8].
$$\omega_c = \frac{I_{QP} \cdot K_{VCO} \cdot R_Z}{2\pi \cdot N} \cdot \frac{C_Z}{C_Z + C_P} \qquad (3.3)$$
If we can keep KVCO proportional to the PLL division ratio N (and hence proportional to the PLL output frequency), this relaxes the requirements on the charge pump flexibility or the margin we have to take to keep the loop stable in all frequency conditions. Figure 3.4a shows a selected set of the measured tuning curves of the VCO. Only some of the 32 coarse frequency steps are shown for most of the frequency range, showing a total tuning range from 3.14 to 5.2 GHz, or 49%. In the upper frequency range, a detail of the fine tuning steps is also shown. At this high frequency, only 2 of the 15 small varactors are controlled by the analog tuning voltage. The other ones can be set digitally to 0 or 1, resulting in an extra set of 14 fine tuning curves. The plot shows that there is enough overlap between consecutive curves. In the lower frequency range (not shown), the coarse tuning curves are closer together because the total capacitance in the tank is higher. But also more analog varactor units are connected to the tuning voltage, leaving fewer analog units that are digitally controlled and hence fewer fine tuning curves. Eventually the whole frequency band can be continuously covered with the desired slope for the oscillator sensitivity.
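The benefit of a KVCO that scales with the division ratio can be verified directly from Equation (3.3). In the Python sketch below only the 40 MHz reference is taken from the text; the charge-pump current, loop filter values and the KVCO scaling constant are hypothetical numbers chosen for illustration.

```python
import math

F_REF_HZ = 40e6  # external reference clock quoted in the text


def crossover_rad_s(i_qp_a, kvco_rad_s_per_v, r_z_ohm, c_z_f, c_p_f, n_div):
    """Crossover frequency of Equation (3.3), in rad/s."""
    return (i_qp_a * kvco_rad_s_per_v * r_z_ohm / (2 * math.pi * n_div)
            * c_z_f / (c_z_f + c_p_f))


def kvco_proportional(f_out_hz, kvco_hz_per_v_at_4ghz=50e6):
    """Hypothetical VCO gain in rad/s/V that scales linearly with f_out."""
    return 2 * math.pi * kvco_hz_per_v_at_4ghz * f_out_hz / 4.0e9


for f_out in (3.2e9, 4.0e9, 4.8e9):
    n = f_out / F_REF_HZ
    wc = crossover_rad_s(100e-6, kvco_proportional(f_out),
                         10e3, 170e-12, 20e-12, n)
    print(f"{f_out / 1e9:.1f} GHz: N = {n:.0f}, "
          f"fc = {wc / (2 * math.pi) / 1e3:.1f} kHz")
```

All three lines report the same crossover frequency, because the growth of KVCO cancels the growth of N. With a fixed KVCO instead, the loop bandwidth would shrink as the output frequency rises.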
Fig. 3.4 VCO measurement results: a Selected set of coarse and fine tuning curves; b VCO gain settings (both panels: frequency [GHz] versus Vtune [V])
The flexibility of the VCO gain is shown in Fig. 3.4b. For a fixed coarse frequency setting, the number of analog varactor units is changed, giving different slopes of the frequency curves. The varactors that are not connected to the analog tuning voltage are biased at the power supply, hence all curves overlap at VTUNE = 1.2 V. Clearly visible in the graph is the limited linear range of the MOS varactors used in the design. In the PLL the VCO settings are controlled such that it is only used in the most linear range of the tuning voltage, between 0.4 and 0.8 V, where KVCO is almost constant. Measured phase noise at an offset of 1 MHz ranges from −115 to −119 dBc/Hz for the upper and lower frequency, respectively. This variation can be perfectly explained by the difference in $(f_0/\Delta f)^2$ [8], indicating indeed that the design is still limited by the Q of the inductor and that the use of the switched active core keeps the current consumption optimal over the whole frequency range. The closed-loop integrated phase noise of the complete PLL is typically −36 dBc. These measurements show that the VCO achieves a continuous coverage over a very wide frequency range, with a fully controllable KVCO, resulting in a stable and optimal PLL design for the whole tuning voltage range used.
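Both the quoted tuning range and the phase noise spread can be cross-checked from the measured band edges alone. The Python sketch below assumes only that phase noise at a fixed offset scales with the square of the carrier frequency, as the text states; the function names are illustrative.

```python
import math

F_LOW_HZ, F_HIGH_HZ = 3.14e9, 5.2e9  # measured band edges from the text


def tuning_range(f_low, f_high):
    """Tuning range relative to the centre of the band."""
    return 2 * (f_high - f_low) / (f_high + f_low)


def phase_noise_spread_db(f_low, f_high):
    """Expected phase noise difference at a fixed offset when the noise
    scales with (f0 / delta_f)**2."""
    return 20 * math.log10(f_high / f_low)


print(f"tuning range: {100 * tuning_range(F_LOW_HZ, F_HIGH_HZ):.1f} %")
print(f"phase noise spread: {phase_noise_spread_db(F_LOW_HZ, F_HIGH_HZ):.1f} dB")
```

The sketch gives about 49.4 % and about 4.4 dB, in line with the quoted 49 % tuning range and the 4 dB difference between −115 and −119 dBc/Hz.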
3.3.2 0.1–6 GHz Quadrature Generation
Thanks to the wideband VCO, LO carrier generation becomes feasible in a system that is not too complex and hence does not carry a large power penalty. The block diagram of the Divide/Multiply and Quadrature (DMQ) circuit is presented in Fig. 3.5. The DMQ contains several divide-by-two blocks. They generate I and Q phases down to a division factor 32. Each divider consists of two dynamic simplified flip-flops in feedback. The rail-to-rail operation of the latches ensures a minimal addition of phase noise, which is very important in cellular standards. The DMQ further exploits a single-sideband (SSB) mixer. For a VCO at its 4 GHz center frequency, the mixer combines 4 or 2 GHz with 1 GHz to obtain
Fig. 3.5 Block diagram of the DMQ circuit (block labels: BUF, chain of DIV2 blocks from 4G down to 125M, PPF 1, PPF 2, SSB mixer with 5G:3G and 4G:2G paths, Vdd SDR, Vdd DMQ)
5 or 3 GHz respectively. The 2 and 1 GHz components are obtained by division of the VCO frequency. The 4 GHz quadrature phases needed for the SSB mixer operation in 5 GHz mode are generated through a polyphase filter (PPF1). The latter is a three-stage polyphase filter, with notches at negative frequencies 3, 4 and 5 GHz. The circuit diagram of the SSB mixer is presented in Fig. 3.6. As both base frequencies used in the SSB mixing are square waves, they contain all odd harmonics. These are also combined in the mixer and generate unwanted frequency components. To limit these, the 1 GHz (F1) component is first linearized by filtering out the third harmonic of the 1 GHz square wave. This negative frequency is attenuated by 40 dB with a two-stage polyphase filter with notches at −2 and −4 GHz. The output of this polyphase filter is a current whose four quadrature phases are directly injected into the SSB mixer. With the VCO at its center frequency, the mixer’s switches are driven by either a 2 or 4 GHz rail-to-rail square wave (F2). Cascode transistors below the switches provide a low-impedance input for the linearized 1 GHz current. The bias current is provided by current sources at the bottom. Note that both current sources and cascode transistors are common to both the I and Q path. The output of the mixer (F3) is amplified up to full rail swing with a differential pair followed by a string of inverters, of which the first one is biased around its threshold voltage. The SDR’s LO frequency can be selected by a multiplexer integrated in the DMQ. This function is obtained by powering down the unused blocks and placing their outputs in a high-impedance state. In this way, no extra circuits are placed in the signal path. Except for the SSB mixer, the power consumption is mainly determined by charging and discharging (parasitic) capacitors between Vdd and Gnd. This makes the power consumption largely proportional to the operating frequency. For a given frequency,
Fig. 3.6 SSB mixer circuit diagram (signal labels: F1I, F1Q, F2I, F2Q, F3IP, F3IN, F3QP, F3QN, VC, PPF, Vdd, Gnd)
reducing the DMQ’s supply voltage will therefore limit the power consumption, by reducing the charge transfer. Depending on operating frequency and technological variations, a minimal supply voltage is required to obtain the required operating speed. To minimize the power consumption for all operating modes, a basic supply voltage regulator is implemented. It consists of a tunable current source loading a decoupling capacitor (inset in Fig. 3.5). The DMQ is fed from this capacitor and effectively unloads it. Eventually, equilibrium is reached at some internal supply voltage. Note that unused blocks are obviously turned off.
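The frequency plan and the supply-scaling argument of this section can be explored with a short sketch. The ratios below are only those named explicitly in the text (the divide-by-two chain down to 32, and the SSB sums that turn 4 GHz into 5 and 3 GHz); the VCO range is the measured 3.14 to 5.2 GHz. The function names and the first-order C·V²·f power model are assumptions for illustration, not a description of the actual DMQ control logic.

```python
from fractions import Fraction

VCO_MIN_HZ, VCO_MAX_HZ = 3.14e9, 5.2e9  # measured VCO range from the text

# LO/VCO ratios named in the text: 1, 1/2, ..., 1/32 from the dividers,
# 5/4 (f + f/4) and 3/4 (f/2 + f/4) from the SSB mixer.
RATIOS = [Fraction(1, 2 ** k) for k in range(6)] + [Fraction(5, 4), Fraction(3, 4)]


def lo_options(f_lo_hz):
    """Return (ratio, vco_frequency) pairs that can synthesize f_lo_hz."""
    options = []
    for ratio in sorted(RATIOS, reverse=True):
        f_vco = f_lo_hz / float(ratio)
        if VCO_MIN_HZ <= f_vco <= VCO_MAX_HZ:
            options.append((ratio, f_vco))
    return options


def dynamic_power_w(c_switched_f, vdd_v, f_hz):
    """First-order estimate of switching power: C * Vdd**2 * f."""
    return c_switched_f * vdd_v ** 2 * f_hz


for target in (5.5e9, 2.4e9, 0.1e9):
    plan = [(str(r), round(f / 1e9, 3)) for r, f in lo_options(target)]
    print(f"{target / 1e9:.1f} GHz ->", plan)

saving = 1 - dynamic_power_w(1e-12, 1.0, 1e9) / dynamic_power_w(1e-12, 1.2, 1e9)
print(f"power saved by lowering Vdd from 1.2 V to 1.0 V: {100 * saving:.0f} %")
```

With only these ratios some targets, for example 1.4 GHz, return an empty list; the actual DMQ presumably closes such gaps with additional paths that this sketch does not model. The last line illustrates why lowering the internal supply to the minimum that still meets the speed requirement pays off quadratically.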
3.4 Receiver Building Blocks
A key aspect for the receiver RF part is its interference robustness. The blocking requirements for simultaneous multi-mode operation imply the need for tunable narrow-band circuits at the antenna interface. Either this function can be provided by a multi-band filtering block [22], in which case the receiver’s input can be a wideband low-noise amplifier, or part of this burden can already be taken up in the LNA design, as will be shown in the following section.
3.4.1 MEMS-Enabled Dual-Band Low-Noise Amplifier
In this first example the option of using MEMS switches to build a low-loss reconfigurable antenna filter section on a thin-film substrate is explored. This is especially relevant when simultaneously considering the design of the LNA, whose active CMOS part must be co-designed with the MEMS switch and the passive off-chip matching. The circuit schematic of Fig. 3.7 shows how multi-band operation is achieved independently of the inductive emitter degeneration [19]. A Single-Pole Dual-Throw (SPDT) MEMS switch is used to connect the LNA to either its 1.8 GHz matching circuit and antenna filter, or to its 5–6 GHz section which uses just the bonding wire for input matching. To prove this concept with a commercial component, a packaged MEMS switch [27] was mounted on a PCB together with the CMOS die. Performance is only slightly affected since the loss of the switch including its package was measured to be only 0.2 dB. A mature technology that integrates MEMS switches in an MCM technology [25] will make it feasible to build more complex structures covering a broad range of frequency bands. Optimal implementation of the switchable narrow-band impedance matching at the LNA’s input has been obtained by designing the on-chip part of the LNA such that no dedicated on-board matching components are needed in the 5–6 GHz band except for a simple series dc-decoupling capacitor. The MEMS switch can approximately be regarded as a short 50 Ω transmission line, so putting the MEMS switch before the LNA chip will not change the chip’s input return loss drastically. For the
Fig. 3.7 CMOS/MEMS co-designed dual-band LNA circuit schematic and input matching measurement (inset) (schematic labels: MEMS SPDT, 5–6 GHz and 1.8 GHz paths, 50 Ω transmission lines, switchable matching network on board, L bond, Cbp, CX, input stage, gain control, single-to-differential converter on chip; inset: low-band and high-band curves)
1.8 GHz band, a simple matching network made up of one or two passive components can fulfill the matching requirement. The passband is quite narrow, making a simple passive matching network feasible. These components are less lossy at these low frequencies, and the receiver itself is also less noisy at lower frequencies, which assures a good overall NF. To maintain good matching conditions in the full implementation, the chip including pad parasitics and ESD devices has been designed in combination with the bondwire inductance, on-board components and board parasitics. The RF bondpad is modeled by a 65 fF parasitic capacitance, in series with a 50 Ω resistance. The ESD diodes are sized 2 × 0.6 × 12 μm each and have in total a 60 fF parasitic capacitance. Each bondwire is modeled by a 1.3 nH inductance. PCB traces are modeled as small transmission lines when needed. Surface-mounted components on the board are characterized carefully with dedicated separate de-embedding structures. An extra on-chip capacitor CX connecting the gate and source of the input transistor reduces the gate inductance needed for the input matching to a value of 1.3 nH at 5 GHz. Otherwise, this inductance can be unrealistically large and the Q-factor of the input resonance network of the LNA would be too high to cover a 1 GHz band. Thanks to the 300 fF capacitance, the gate inductance can be implemented with a single bonding wire. Internally, the LNA has two separate outputs to cover the required frequency range. A resistively loaded output is of course small in area and flexible in terms of wide bandwidth, but can in the 0.13 μm CMOS technology used only provide enough gain at frequencies up to 2.5 GHz. Therefore a second output is added for the 5–6 GHz band with an LC-tuned load, and the selection of either one of those is done by the proper biasing of the cascode transistors.
A resistor in parallel with the load inductor lowers its quality factor and hence widens the response enough to cover the 1 GHz bandwidth. Gain switching is achieved by current steering: when activated, a third common-gate transistor bypasses a certain fraction of the signal current to the power supply so as to reduce the gain. Finally, both outputs pass through a multiplexing single-to-differential converter. The input stage is biased at 5.8 mA, and another 3.6 mA is used in the second stage. As already indicated in Fig. 3.7, S11 input matching better than −10 dB is achieved in both bands. The simulated LNA NF is around 2 dB, while the IIP3 value is −5 dBm in the low band and 3 dBm in the high band.
3.4.2 Wideband Low-Noise Amplifiers
Another option to demonstrate the SDR concept is to rely completely on the passives in the antenna interface for RF interference and blocking filtering. This makes the realization of the concept easier, as commercially available (multi-band) filtering blocks can be used in the implementation. Wideband low-noise amplifiers must now
be used that cover as large an RF frequency range as possible for optimal flexibility, yet still achieve performance comparable to state-of-the-art narrowband LNAs. Covering the full 100 MHz to 6 GHz frequency range is challenging, since achieving a low NF at hundreds of MHz requires large transistors with low 1/f noise, while moving towards carrier frequencies of a few GHz requires fast transistors. Recently, several 90 nm wideband inductor-less feedback LNAs have been reported [33]. However, none of them achieves the performance targeted by a link budget analysis for an SDR-LNA below 500 MHz (for a 1.2 V power supply). LC-matched Common Source (CS) LNAs typically cover a bandwidth from 3 to 10 GHz [3]. Extending the bandwidth down to 100 MHz would require prohibitively large inductors and thus chip area. In the presented SDR front end, two LNAs are combined to cover the whole frequency range: an inductor-less feedback LNA with a small form factor (Fig. 3.9) covers frequencies from 100 MHz up to 2.5 GHz, and a CS LC-matched LNA (Fig. 3.10) covers frequencies from 2.5 GHz up to 6 GHz. Only one LNA is powered at a time to save power and provide filtering over half of the bandwidth. The low-band LNA relies on resistive feedback to achieve input matching. A resistive feedback design in general has lower gain and a higher noise figure, because the resistor in the shunt feedback path directly adds noise current to the input. Proper selection of the shunt feedback resistor, however, can minimize its noise contribution without affecting the overall gain and the matching at the input and output. One of the key advantages of resistive feedback LNAs is the tremendous area saving they offer, since there are no on-chip spiral inductors. The operating principle of the resistive feedback LNA is shown in Fig. 3.8.
It uses the noise-cancelling technique presented in [4] to achieve a low NF. Transistor Mn1 is the amplifying transistor, and resistor Rf implements the feedback. Input matching is realized by sizing both Mn1 and Rf. A noise cancelling stage, formed by transistors Mn2 and Mn3, is added to this matching stage. This last stage can cancel the noise of the input transistor when it is properly sized. For complete noise cancelling, the gain of the last stage must be equal to the gain of the overall LNA, i.e.
Fig. 3.8 LNA noise cancelling principle
Fig. 3.9 Schematic of the wideband resistive feedback LNA
$$A_V = \frac{g_{m,Mn2}}{g_{m,Mn3}} = 1 + \frac{R_f}{R_S} \qquad (3.4)$$
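As a quick numerical reading of Eq. (3.4), the following minimal sketch computes the transconductance ratio needed for complete noise cancelling. The function name and the example feedback resistances are illustrative assumptions; only the 50 Ω source resistance and the relation itself come from the text.

```python
def required_gm_ratio(r_f_ohm: float, r_s_ohm: float = 50.0) -> float:
    """Ratio g_m(Mn2) / g_m(Mn3) that satisfies Eq. (3.4): 1 + Rf / RS."""
    if r_f_ohm <= 0 or r_s_ohm <= 0:
        raise ValueError("resistances must be positive")
    return 1.0 + r_f_ohm / r_s_ohm


if __name__ == "__main__":
    # Hypothetical feedback resistors, chosen as 5x and 10x the source resistance.
    for factor in (5, 10):
        r_f = factor * 50.0
        ratio = required_gm_ratio(r_f)
        print(f"Rf = {r_f:.0f} ohm -> gm(Mn2)/gm(Mn3) = {ratio:.0f}")
```

Under these assumptions the exact ratio is one higher than Rf/RS, so it is of the same order as the resistor ratio itself.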
Since Rf is typically a factor 5–10 greater than RS (50 Ω), the transconductance of Mn2 has to be roughly 5–10 times larger than that of Mn3, leading to an operating point with a large gm/ID ratio, i.e. moderate to weak inversion. However, since that choice severely impacts the linearity of the LNA, a compromise between linearity and noise figure is made when sizing the noise cancelling stage. The final circuit topology of the low-band LNA is shown in Fig. 3.9. It provides good linearity and is power efficient thanks to the use of both NMOS and PMOS devices in the matching/amplifying stage. It employs resistive feedback for wideband matching and noise cancelling for a low NF over a wide band. The feedback resistance sets not only the matching but also the gain, which limits the freedom for input matching. The bandwidth of the LNA is limited by the parasitic capacitance at the input, the output and the internal node. Taking into account the required gain and linearity, the maximum bandwidth achievable at the moment in the 0.13 μm CMOS technology is therefore limited. More advanced technology nodes should overcome this issue in the future. The following aspects were taken into account during the design:
– A capacitor at the source of the PMOS input transistor ensures an AC ground. This capacitor also has to filter the noise of the bias transistor of the input stage down to frequencies below 100 MHz.
– A high-pass filter is inserted between the two stages, which allows both of them to be biased properly.
– The LNA has an ESD-protected input pad which employs two diodes. These are taken into account in the simulations.
– The LNA is connected to the outside world by bonding wires. Bonding wire inductances of the RF input and power pads are modeled by a 1.2 nH inductor and included in the simulations.
– The cascode transistor used in the circuit shown in Fig. 3.8 is not used in the final implementation. The required high gain (22 dB) results in a large swing at the LNA output. The cascode limits the available headroom and is therefore left out.
– Sufficient decoupling capacitance is foreseen on the power supply.
– A digitally controlled bank of resistors allows switching between high, intermediate and low gain modes. The choice of Rf impacts the gain, the noise figure and the linearity of the LNA. The higher the value of Rf, the larger the gain, the smaller the noise figure and the worse the linearity.
– The biasing is done with a programmable current source. This allows the gain to be varied in small steps around the different gain modes, and the LNA to be switched from the low to the high gain mode while increasing the power by only half.
At the maximum gain of 22 dB (including the single-ended to differential conversion stage), typical simulation results show a noise figure of 2 dB and a linearity of −10 dBm at a power consumption of 12 mW. At reduced gain (10 dB), the linearity improves to +3 dBm while the power consumption drops to 8 mW.
The broad input matching bandwidth of the high-band LNA is achieved by embedding the input impedance of an inductively degenerated common-source stage into an LC bandpass filter [3]. Input matching from 6 GHz down to 2.4 GHz can be done with inductive elements of reasonable values, but extending that frequency band to lower values is practically not feasible. The bonding wire and the ESD diodes are also taken into account in the matching network, as shown in Fig. 3.10. The noise performance of the proposed topology is determined by two main contributions: the LC input network and the amplifying transistor. The noise contribution of the input network is due to the limited quality factor of the integrated inductors.
The optimization of the transistor noise contribution relies instead on the choice of its width for a given bias current. In contrast to the spot (single-frequency) noise matching of the narrowband case, in the wideband case the noise matching is optimized over the entire frequency range.
Fig. 3.10 Schematic of the wideband LC-matched LNA
Also here, the following aspects were taken into account during the design:
– The ESD devices, the parasitic capacitance of the input pad and the bonding wire at the input of the LNA are taken into account in the input matching.
– An extra capacitance placed in parallel with the gate-source capacitance of the transistor adds a degree of freedom for the noise and impedance match. Its value is programmable to extend the input bandwidth.
– A 4-bit programmable capacitor bank is placed at the output of the LNA to allow frequency band switching. Pull-up resistors are added to lower the parasitics of the switches in the ‘off’ state, and hence improve the linearity performance.
– Gain switching is achieved with a bypass cascode transistor, which diverts part of the signal current to the power supply for lower gain. This way, the input matching is barely influenced by the gain switching, with the obvious drawback that no current is saved in the low gain mode compared to the high gain mode.
– Sufficient power supply decoupling is foreseen, as well as a programmable biasing that can vary the LNA operating point in small steps to overcome process and temperature variations.
Simulated values for NF and IIP3 are 2.4 dB and −10 dBm, respectively, at a maximum gain of 22 dB, with a power consumption of 12 mW.
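The frequency band switching offered by a 4-bit capacitor bank on an LC-tuned output can be illustrated with a small sketch. It assumes an ideal parallel LC tank; the inductance, fixed capacitance and unit capacitance are hypothetical values chosen only so that the 16 settings roughly span the high band, and they are not taken from the actual design.

```python
import math

# Hypothetical tank values (assumptions, not design data).
L_TANK_H = 1.5e-9      # load inductance
C_FIXED_F = 450e-15    # fixed capacitance at the output node
C_UNIT_F = 150e-15     # capacitance added per LSB of the 4-bit bank


def tank_frequency_hz(code: int) -> float:
    """Resonance frequency of an ideal LC tank for a 4-bit bank setting."""
    if not 0 <= code <= 15:
        raise ValueError("code must be a 4-bit value (0..15)")
    c_total = C_FIXED_F + code * C_UNIT_F
    return 1.0 / (2.0 * math.pi * math.sqrt(L_TANK_H * c_total))


if __name__ == "__main__":
    for code in (0, 5, 10, 15):
        print(f"code {code:2d}: f0 = {tank_frequency_hz(code) / 1e9:.2f} GHz")
```

Because the frequency scales with the inverse square root of the total capacitance, equal capacitance steps give unequal frequency steps, with the finest resolution at the low-frequency end.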
3.4.3 Wideband Downconversion Mixer
As far as the dependence on RF carrier frequency is concerned, the downconversion mixer poses no specific problems, as long as the correct LO frequency is applied. Because of the low power supply voltage (1.2 V), a folded topology must be used, which has some drawbacks. The extra folding transistors contribute a certain amount of thermal noise, deteriorating the overall receiver NF. Some RF signal current is also lost in the parasitics at the folding node, lowering the mixer's conversion gain. The mixer, as shown in Fig. 3.11, consists of a folded double-balanced Gilbert cell driving two current mirrors. The Gilbert cell is intrinsically wideband, as it has a capacitive input impedance and can be driven by a voltage source. Here it forms the core of the mixer and is used for wideband operation up to 6 GHz. The current mirrors have a digitally programmable gain B. An NMOS input pair is used as a transconductance, driving RF signal current into the folded PMOS transistors that form the Gilbert cell. The input pair must be designed carefully, as it determines both the noise and the linearity performance of the mixer. A rather large overdrive is needed for linearity, which in combination with the required low noise performance results in a considerable biasing current of 5 mA. The LO switching transistors in the Gilbert cell are folded PMOS transistors. The folding structure is essential to obtain sufficient headroom for all transistors. Using PMOS transistors as LO switches is better for reducing the mixer's flicker noise than using NMOS transistors. Nonetheless, for even higher frequencies, NMOS
Fig. 3.11 Schematic of the wideband downconverter
transistors may behave better because of their higher switching speeds and smaller parasitics. The noise contributions in a switching mixer are not easy to understand or analyze [11], but can generally be kept within limits by using large LO signals and a reduced DC current through the switching transistors. The switchable gain of the mixer is achieved by two flexible current mirrors whose output transistors, and hence current gain B, are digitally programmable. Consequently, the mixer is actually a voltage-input, current-output building block. The current-mode output is used to drive the subsequent low-pass channel filter. Too much voltage gain must still be avoided at this point to prevent clipping of the output due to strong interferers; it is only allowed after the first stage of the baseband low-pass filter.
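To give a feel for how the programmable mirror gain B scales the mixer output, the sketch below uses the textbook idealization of a double-balanced mixer whose switches are driven by a perfect square-wave LO, for which the conversion transconductance is (2/π)·gm. This idealization, the neglect of the current lost at the folding node, and the value of gm are all assumptions made for illustration only.

```python
import math


def conversion_transconductance(gm_s: float, mirror_gain_b: float) -> float:
    """Ideal i_out / v_rf for hard switching followed by a 1:B current mirror."""
    if gm_s <= 0 or mirror_gain_b <= 0:
        raise ValueError("gm and B must be positive")
    return (2.0 / math.pi) * gm_s * mirror_gain_b


def to_db(ratio: float) -> float:
    """Express a current (or voltage) ratio in dB."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    gm = 20e-3  # hypothetical input-pair transconductance in siemens
    reference = conversion_transconductance(gm, 1)
    for b in (1, 2, 4):
        g_c = conversion_transconductance(gm, b)
        print(f"B = {b}: Gc = {g_c * 1e3:.2f} mS "
              f"({to_db(g_c / reference):+.1f} dB relative to B = 1)")
```

In this simplified picture each doubling of B adds about 6 dB to the output current, independently of the RF carrier frequency.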
3.4.4 Flexible Baseband Analog Circuits
To accommodate the bandwidths from Bluetooth to WLAN 802.11n, the flexible baseband filter should cover a relatively wide range of frequencies, from a few hundred kHz to several MHz. Scalable selectivity would allow the filter order to be changed in case weaker interferers are detected or in case the selected standard requires less adjacent channel selectivity. The requirements in terms of noise and linearity for the analog baseband channel processing depend on the decisions taken in the link budget analysis for the whole receiver. In addition, an adaptive integrated noise level may lead to a further power saving in case a smaller signal-to-noise ratio (SNR) is required [13].
Flexible frequency discrimination and gain control can currently be achieved with better overall performance by employing suitably tunable analog circuits. A possible solution is the use of modular circuits made of a proper combination of basic units. This makes it possible to add the desired digital controls efficiently, while minimizing layout issues. The concept of component arrays fits perfectly with these needs. An array is defined here as the parallel connection of dynamic analog blocks dimensioned in a binary scaled fashion and activated whenever needed. A switchable opamp is shown in Fig. 3.12. It is the basic unit of the flexible opamp, which is made up of parallel connections of switchable opamps in a binary scaled array. Based on the standard Miller-compensated architecture, this opamp is switched on/off through a single bit. In particular, when the opamp is off, all the PMOS gates are at VDD and all the NMOS ones are grounded by means of MOS switches. Therefore, in the off mode, the opamp shows a very high output impedance and zero power consumption. The switches that carry signal must be carefully sized, trading off their finite conductance against their nonlinear characteristic. The two Miller capacitor arrays CC,array are connected at the nodes CC(1,2)(a,b), while the node Vcm is connected to the outputs OutP and OutM by means of two resistors Rcmfb. Under these conditions, the poles and the zero maintain their original positions as in a standard Miller opamp. This flexible opamp is the basic active component of the biquadratic sections of the channel-select filter. It provides a reconfigurable Gain-Bandwidth product (GBW), dynamic impedance scaling and power scalability. The sensitivity of the dominant poles to supply voltage, process and temperature variations can be minimized by employing a proper bias circuit [10].
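The array definition above, a parallel connection of binary scaled blocks enabled by a digital word, can be sketched in a few lines. The helper names, the unit capacitance and the unit bias current are hypothetical; the sketch only assumes that branch k contains 2^k identical units and that enabled branches add in parallel.

```python
from typing import List


def enabled_weights(code: int, n_bits: int) -> List[int]:
    """Binary weights (in unit blocks) of the branches switched on by `code`."""
    if not 0 <= code < (1 << n_bits):
        raise ValueError("code does not fit in n_bits")
    return [1 << k for k in range(n_bits) if (code >> k) & 1]


def parallel_units(code: int, n_bits: int) -> int:
    """Total number of unit blocks connected in parallel."""
    return sum(enabled_weights(code, n_bits))


def array_capacitance_f(code: int, n_bits: int, c_unit_f: float) -> float:
    """Parallel capacitors add, so the array value is units * C_unit."""
    return parallel_units(code, n_bits) * c_unit_f


def array_supply_current_a(code: int, n_bits: int, i_unit_a: float) -> float:
    """Switched-off opamp units draw no current, so current scales with units."""
    return parallel_units(code, n_bits) * i_unit_a


if __name__ == "__main__":
    # Hypothetical 3-bit example with 10 fF unit capacitors and 100 uA unit opamps.
    print(enabled_weights(5, 3))
    print(array_capacitance_f(5, 3, 10e-15))
    print(array_supply_current_a(5, 3, 100e-6))
```

The same bookkeeping applies to any of the arrays in the filter: the digital word selects how many unit blocks are active, and both the electrical value and the power follow from that count.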
All resistors and capacitors in the filter are also implemented as arrays, such that their values can be controlled digitally over a very wide range. The capacitor arrays are binary weighted connections of basic units of metal interconnect capacitors. For better linearity performance, the NMOS control switches are always on the virtual ground side, where the voltage swing is close to zero. The resistor arrays are built as binary weighted connections of polysilicon resistors. Their control switches are implemented as straight NMOS-PMOS transmission gates: this solution assures a lower ON resistance and a more linear behavior. Figure 3.13 shows the schematic of the flexible baseband LPF, based on an optimized cascade of Active-Gm-RC [10] and Rauch biquadratic sections. The combination of these two biquadratic cells is based on power/linearity considerations. Active-Gm-RC cells guarantee a very good dynamic range at a limited cost in power. However, linearity in this cell is limited by the “weak” virtual ground of its opamp. The Rauch cell is therefore used to reach the required linearity for the overall filter. Both biquad topologies include the analog component arrays. This solution provides this baseband block with all the required SDR programmability. The LPF provides the following features. Coarse frequency tuning with adaptive power consumption is achieved by digitally controlling the resistor values and the number of switchable opamp units. Fine frequency tuning, to compensate for RC process deviations, is achieved by controlling the capacitor value. One of the power/performance trade-offs we implemented is to reduce the stop-band attenuation performance in
Fig. 3.12 Switchable Miller opamp schematic
Fig. 3.13 The flexible low pass filter schematic (cascade of an Active-Gm-RC biquad, a Rauch biquad and a second Active-Gm-RC biquad, built around the flexible opamps FLOA1 to FLOA3, each with bypass transmission gates)
Fig. 3.14 Flexible lowpass filter measurements: a Cutoff frequency; b filter order; c noise/power trade-offs; d third order intercept point
(Panels a and b: amplitude [dB] versus frequency [Hz]. Panel c: current consumption [mA] versus cut-off frequency [Hz] for IRN = 85.37 μVrms, 125.3 μVrms and 163 μVrms. Panel d: output power [dBVp] versus input power [dBVp] for the fundamental and the 3rd order product.)
case where no large interferers or blocker signals are detected. This is accomplished by turning off and bypassing biquadratic sections. Low on-resistance bypass switches are implemented in each biquad. Furthermore, power consumption can be traded for increased kT/C noise by decreasing the capacitor sizes and the transconductance and increasing the resistor values at the same time. For proper interfacing with the downconversion mixer, the biquads can also be placed in a current-input mode. Figure 3.14 summarizes the measurements of the standalone flexible lowpass filter. For a sixth-order Butterworth selectivity and 85 μVrms of input-referred integrated in-band noise, the cut-off frequency can be moved from 0.55 to 17.6 MHz with a coarse step of 0.55 MHz, as shown in Fig. 3.14a. The power consumption decreases linearly with reduced cut-off frequency: for example, the filter consumes 13.2 mW for WLAN 802.11a (11 MHz) and 3.6 mW for UMTS 3.86 (2.11 MHz), showing lower power consumption than comparable but less flexible designs [10]. In addition to this coarse frequency tuning, the flexible LPF provides fine frequency tuning by configuring its 7-bit capacitor arrays. Taking this possibility into account as well, the effective frequency tuning range extends from 0.35 to 23.5 MHz. As shown in Fig. 3.14b, the transition band of the filter can be traded off for less power consumption in case no large interferers are detected or when less selectivity
is required. Second, fourth, and sixth order Butterworth-like selectivity are available by bypassing one or two biquadratic sections. Figure 3.14c shows how power consumption can be traded off for noise in case a different standard or a relaxed sensitivity requirement calls for a lower SNR. By reducing the total capacitor size by a factor of e.g. 2 or 4 (and simultaneously increasing the resistor values to keep the filter bandwidth fixed), the power consumption decreases at the cost of a square-root increase in total integrated noise level. Figure 3.14d shows the IIP3 measurement at the maximum cut-off frequency. The input tones are at 8 and 9 MHz, so that the intermodulation products are well in-band. The measured IIP3 of 10 dBVp confirms the expected result. This linearity performance is nearly constant for all the cut-off frequency settings, which makes the proposed design a very good alternative to the less linear Gm-C filters, even for flexible designs. I and Q mismatch measurements were also performed for every cut-off frequency: amplitude and phase mismatch are well below 0.25 dB and 2.8°, respectively. The Variable-Gain Amplifier (VGA), which increases the filtered signal level to an amplitude fit for the dynamic range of the ADC, is designed using the same philosophy. It is built from two cascaded inverting amplifiers using resistive feedback. The use of a flexible opamp makes it possible to save power by adapting the opamp bandwidth to the expected signal bandwidth. The gain of the VGA can be adapted in 3 dB steps from 0 to 39 dB. The gain switching time is constrained by the fast Automatic Gain Control (AGC) operation and should be lower than 100 ns. In addition, the VGA provides different noise levels by changing the resistor arrays by fixed factors.
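The square-root relation between capacitor scaling and integrated noise mentioned for Fig. 3.14c can be checked against the noise levels listed in that figure. The sketch below assumes that the three input-referred noise values of the figure (85.37, 125.3 and 163 μVrms) correspond to capacitor reduction factors of 1, 2 and 4; this mapping is an inference from the text, not something the source states explicitly.

```python
import math

# Noise levels read from the legend of Fig. 3.14c, in microvolts rms.
# The assignment to capacitor reduction factors 1, 2 and 4 is an assumption.
MEASURED_IRN_UV = {1: 85.37, 2: 125.3, 4: 163.0}


def scaled_noise_uv(reference_uv: float, cap_reduction: float) -> float:
    """kT/C-limited noise grows with the square root of the capacitor reduction."""
    if cap_reduction <= 0:
        raise ValueError("reduction factor must be positive")
    return reference_uv * math.sqrt(cap_reduction)


if __name__ == "__main__":
    reference = MEASURED_IRN_UV[1]
    for factor, measured in MEASURED_IRN_UV.items():
        predicted = scaled_noise_uv(reference, factor)
        deviation = 100.0 * (measured - predicted) / predicted
        print(f"C/{factor}: predicted {predicted:6.1f} uVrms, "
              f"listed {measured:6.1f} uVrms ({deviation:+.1f} %)")
```

Under this assumption the listed values stay within about 5% of the ideal square-root law, which is consistent with the trade-off described in the text.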
A DC-offset compensation loop is added that senses and removes the DC offset at power-up (in closed-loop operation) and holds the steady-state DC-offset compensation value during the received burst (open-loop operation). This internal loop can also be de-activated, in which case a mixed-signal DC-offset compensation loop is used that measures the DC level in the digital domain and uses a feedback DAC to compensate it at the input of the AGC.
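The two phases of such a loop, closed-loop acquisition followed by open-loop hold, can be illustrated with a purely behavioral sketch of the mixed-signal variant. The DAC resolution, LSB size, loop gain and the noiseless DC measurement are all hypothetical simplifications; the sketch does not describe the actual implementation.

```python
def acquire_dc_offset(offset_v: float, dac_lsb_v: float = 1e-3,
                      dac_bits: int = 8, loop_gain: float = 0.5,
                      max_iter: int = 64) -> int:
    """Closed-loop phase: integrate the measured DC error into a DAC code."""
    lo = -(1 << (dac_bits - 1))
    hi = (1 << (dac_bits - 1)) - 1
    code = 0
    for _ in range(max_iter):
        residual_v = offset_v - code * dac_lsb_v       # DC level after correction
        step = round(loop_gain * residual_v / dac_lsb_v)
        if step == 0:                                   # loop has settled
            break
        code = max(lo, min(hi, code + step))            # DAC code saturates
    return code


def residual_during_burst(offset_v: float, held_code: int,
                          dac_lsb_v: float = 1e-3) -> float:
    """Open-loop phase: the code is frozen, so the residual offset is static."""
    return offset_v - held_code * dac_lsb_v


if __name__ == "__main__":
    offset = 23.4e-3                                    # hypothetical 23.4 mV offset
    code = acquire_dc_offset(offset)
    print(code, residual_during_burst(offset, code))
```

With the loop gain of 0.5 assumed here, the acquisition stops once the residual is within one DAC LSB, after which the held code leaves the received burst undisturbed.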
3.4.5 Analog-to-Digital Conversion
The requirements posed on the ADC by the various standards differ widely, depending on the signal bandwidth, the SNR needed and, even more, on the amount of channel filtering and amplification in the preceding receive chain. In the link budget used here, typically 8 to 10 bits are needed with sampling speeds up to 40 MS/s (two ADCs should be interleaved to achieve 80 MS/s operation for 802.11n systems). Those specifications have for a long time been the territory of pipeline architectures, but in scaled CMOS they come well within range of SAR ADCs. Most SAR ADCs use an operating principle similar to the charge redistribution architecture [21]. This requires fast-settling opamps for both the input and the reference voltages, able to settle their output voltage in a very short time while driving large capacitive loads. A high-speed clock for the controller, running at 10× the sampling speed, must also be available.
Fig. 3.15 Charge-sharing SAR ADC: a basic architecture; b sample waveforms
A new SAR architecture is proposed that uses passive charge sharing (instead of active charge redistribution) both to sample the input signal and to perform the binary scaled feedback during the successive approximation [6]. The basic architecture depicted in Fig. 3.15a works completely in the charge domain. The input is sampled on a capacitor, and during the SAR algorithm charge is added or subtracted until the result converges to zero. No active circuits are used to add or subtract these charges; simple passive switches do this instead. The only active element is the comparator itself, which is the basic principle that allows the fundamental lower limit on power consumption to be approached. The operation of the ADC can be explained with the waveforms shown in Fig. 3.15b. Before the start of the conversion the sampling capacitors CS are reset to zero. The tracking switch ST is closed and the sampling switch SS is open, so the charge on the tracking capacitors CT follows the differential input signal. The binary scaled array of unit capacitors CU is pre-charged to the power supply. The charge on this capacitor array is used to provide the feedback DAC function in the SAR ADC, and thus plays the role of the reference voltage in a charge-redistribution SAR. Since this pre-charging happens before the actual A-to-D conversion, it does not depend on the input signal and thus no tough constraints are imposed on this reference. The conversion process starts with the sampling of the input signal. To this end, ST opens to hold the input, and when SS closes half of the charge on CT is transferred to CS in a simple passive charge-sharing action. The transfer time needed is very short, since it is proportional to the on-resistance of the switch. After sampling, SS opens and ST closes, so the input signal can be tracked again. The sampling process typically takes less than 2 ns, so abundant time is available for tracking and no settling problems occur.
To determine the MSB, the comparator is activated and after its decision a charge equal to half of the input range is added to or subtracted from the sampled charge. To this end, one of the signals cp[0] or cn[0] goes high while the other one stays low. This
is again a passive process that does not take any power and yet settles very fast in an advanced technology. The total charge stored now is equal to CS · VIN/2 ± CMSB · VDD and is distributed proportionally across the two capacitors. The next bits are determined in a similar operation: in each iteration the comparator determines the sign of the charge, and the binary scaled pre-charged capacitors are connected one by one to make the total charge converge to zero. The fundamental power limits of the original architecture have been removed by doing all the charge redistribution passively. The input is connected to the tracking capacitor for most of the conversion period, so no fast opamp is needed for input sampling. Secondly, settling problems in the reference voltage are avoided by pre-charging all capacitors to the same voltage before the conversion process, which makes this step signal-independent. This way the only remaining active elements in the ADC are the comparator and the digital controller. Another advantage is the completely digital implementation, requiring only MOS switches and MOM capacitors, which makes it portable to new CMOS technologies. Two constraints determine the size of the capacitors. For device mismatch, the total size of the capacitor array must be at least 2 pF. Much more stringent, however, is the size of the unit capacitor: it must be 2^(N−1) times smaller and thus turns out to be only 8 fF. As this is not practical, a larger unit capacitor is used, which is pre-charged to only a fraction of the full reference for the 3 LSBs. The comparator is based on [23] and includes a programmable capacitor array to calibrate its offset voltage. It does not consume any power when inactive, and thus gives the whole ADC the property that its power consumption scales linearly with the sampling frequency. Typical comparison time is about 0.3 ns.
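The charge-domain successive approximation described above can be captured in an idealized behavioral sketch. It treats the differential signal as a single signed charge, assumes CT = CS so that half of the tracked charge is sampled, sizes the MSB capacitor so that its charge equals half of the full-scale sampled charge, and ignores noise, mismatch and the fractional pre-charging of the LSBs. The capacitor and voltage values are hypothetical; the default of nine bits follows the nine comparator cycles mentioned in the text.

```python
def charge_sharing_sar(v_in: float, n_bits: int = 9, v_fs: float = 1.0,
                       vdd: float = 1.0, c_s: float = 1.0e-12) -> int:
    """Ideal charge-sharing SAR conversion; returns the output code (MSB first)."""
    if abs(v_in) > v_fs:
        raise ValueError("input exceeds full scale")
    q = c_s * v_in / 2.0              # charge on C_S after passive sampling
    q_fs = c_s * v_fs / 2.0           # sampled charge at full-scale input
    c_msb = q_fs / (2.0 * vdd)        # MSB capacitor, pre-charged to VDD
    c_total = c_s                     # capacitance on the comparator node
    code = 0
    for k in range(n_bits):
        # The comparator sees a voltage q / c_total; only its sign matters,
        # so the growing capacitance does not disturb the binary search.
        bit = 1 if q / c_total > 0.0 else 0
        code = (code << 1) | bit
        if k < n_bits - 1:
            c_k = c_msb / (1 << k)    # binary scaled pre-charged capacitor
            q += -c_k * vdd if bit else c_k * vdd
            c_total += c_k            # the connected capacitor stays attached
    return code


def code_to_voltage(code: int, n_bits: int = 9, v_fs: float = 1.0) -> float:
    """Mid-point of the input interval represented by an output code."""
    return -v_fs + (code + 0.5) * 2.0 * v_fs / (1 << n_bits)


if __name__ == "__main__":
    for v in (-0.9, -0.3, 0.0123, 0.3, 0.77):
        c = charge_sharing_sar(v)
        print(f"v_in = {v:+.4f} V -> code {c:3d} -> {code_to_voltage(c):+.4f} V")
```

The sketch makes the central point of the architecture visible: every feedback step is a conservation-of-charge operation on passive capacitors, and the comparator decision depends only on the sign of the remaining charge.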
To avoid the need for a high-speed clock (and its associated power consumption), an asynchronous controller was implemented. On the rising edge of the input clock, a pulse in the tracking/sampling signal is created, as already shown in Fig. 3.15b. The width of this pulse (2 ns) is proportional to the RC time constant of a MOS switch and a MOM capacitor, so that it tracks the time required for the charge-sharing sampling process over process variations. The conversion process then starts: the controller repetitively activates the comparator, waits for a valid output, closes one of the sharing switches, and waits for another delay (1 ns) while the charge sharing settles. When this has happened nine times, the output bits are toggled. The sampling capacitors are then reset and the capacitor array is pre-charged, waiting for the next input clock edge. The total conversion time is about 20 ns, so the maximum conversion rate of the ADC is 50 MS/s. The charge-sharing SAR ADC is implemented separately in a 90 nm 1P9M digital CMOS process. At a 50 MS/s conversion rate, the total current consumption is 0.7 mA from a 1 V power supply, divided over the different blocks as follows: digital 50%, comparator 35%, pre-charging 15%. Measured INL and DNL at 50 MS/s are below 0.6 LSB, as shown in Fig. 3.16a. Despite this good linearity, the SNDR for low-frequency input signals is only 49 dB (ENOB = 7.8), because it is limited by underestimated comparator noise. At frequencies above 10 MHz, the nonlinearity of the input tracking switch becomes dominant, because the low-VT transistors intended to be used were not processed correctly. This deteriorates the ENOB for
Fig. 3.16 SAR ADC measurements: a INL/DNL; b near-Nyquist FFT at 20 MS/s; c ENOB vs. input frequency at 50 MS/s
(Panel a: INL [LSB] and DNL [LSB] versus code. Panel b: spectrum [dBFS] versus frequency, Fs = 20 MS/s, P = 290 μW, ENOB = 7.4, SNR = 48.7 dB, THD = −50.5 dB. Panel c: ENOB versus frequency [Hz], Fs = 50 MS/s, P = 725 μW.)
a near-Nyquist input at 20 MS/s to 7.4, as shown in Fig. 3.16b. With only 290 μW power, the resulting FoM is only 65 fJ per conversion step. As none of the ADC building blocks consumes any static power, the FoM is maintained down to very low conversion rates, allowing this ADC to be used in a wide variety of applications.
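The figure of merit quoted above can be reproduced with the commonly used definition FoM = P / (2^ENOB · fs). The source does not state which definition or which ENOB value was used, so both are assumptions here; the sketch simply evaluates the formula for the power, sampling rate and ENOB values reported in the text and in Fig. 3.16.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR, using the standard 6.02 N + 1.76 rule."""
    return (sndr_db - 1.76) / 6.02


def fom_joule_per_step(power_w: float, enob: float, fs_hz: float) -> float:
    """Energy per conversion step: P / (2^ENOB * fs)."""
    if power_w <= 0 or fs_hz <= 0:
        raise ValueError("power and sampling rate must be positive")
    return power_w / (2.0 ** enob * fs_hz)


if __name__ == "__main__":
    cases = [
        ("20 MS/s, 290 uW, ENOB 7.8", 290e-6, 7.8, 20e6),
        ("50 MS/s, 725 uW, ENOB 7.8", 725e-6, 7.8, 50e6),
        ("20 MS/s, 290 uW, ENOB 7.4", 290e-6, 7.4, 20e6),
    ]
    for label, p, enob, fs in cases:
        print(f"{label}: {fom_joule_per_step(p, enob, fs) * 1e15:.1f} fJ/step")
    print(f"ENOB at 49 dB SNDR: {enob_from_sndr(49.0):.2f}")
```

With this definition, the 65 fJ figure is obtained for the low-frequency ENOB of 7.8 at either sampling rate, while the near-Nyquist ENOB of 7.4 would give roughly 86 fJ per conversion step; which of the two the quoted number refers to is an inference, not a statement of the source.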
3.5 Transmitter Building Blocks
In the presented architecture, a direct-conversion transmit architecture is used, as this offers the most potential for flexibility. The baseband section uses a programmable filter similar to the one discussed in Section 3.4.4 to remove the DAC aliases, followed by a wideband direct upconversion mixer. The Pre-Power Amplifier (PPA) is the final block in the SDR transmit path and is discussed here in more detail. The PPA includes extensive programmability of gain settings. The full circuit consists of four stages, and is shown in Fig. 3.17. The output stage is an inductively loaded common source amplifier with a programmable bias current for an optimal linearity versus power trade-off. Some care must be taken regarding the reliability and lifetime of the output transistor, as it is possible that the output signal swings above the power supply and hence violates the reliability ratings of the technology. The use of a thick-oxide transistor, available for 3.3 V I/O compatibility, would totally solve this problem, but the large transistor and its associated parasitics would prevent the circuit from operating at high frequencies. However, in this PPA circuit these problems are much less severe than in e.g. a full-power CMOS PA. The effective output power is not so high, and for many applications the average swing is much lower than the peak. Furthermore, a non-negligible voltage drop across the series resistance of the inductor sets the DC output voltage below the power supply. During the optimization of the circuit, a compromise is made between dimensions, power consumption and linearity. The boundary conditions are the required output
Fig. 3.17 Pre-Power amplifier block diagram
power and linearity, which lead to a maximum bias current of 45 mA. As the output stage is not always used at maximal output power and/or maximal linearity, the bias current has been made programmable to change the transistor operating point depending on the required performance, and hence trade off linearity versus power consumption. The input signal is injected capacitively onto the output transistor by the preceding stage in the PPA. This stage provides gain programmability. Its circuit diagram is presented in Fig. 3.18a. The PPA is a chain of amplifiers capable of amplifying the signal from the TX mixer to the required output power. This power amplification consists of voltage amplification and impedance reduction of the signal. The complete PPA was designed from the output to the input. The design considerations for this Stage-1 are therefore to be able to drive the output stage, while considerably reducing the load presented to the previous stage. Furthermore, sufficient gain programmability has to be integrated, and an adjustable bias must be provided to optimize power consumption in any given situation. The core of the amplifier is a common source stage with a PMOS resistive load. To control the gain, three extra PMOS transistors are placed in parallel with the main load. Their gates can be connected to Vdd, to turn them off and increase the gain, or to ground, to put them in the linear region and decrease the gain. However, changing the resistive load also has an impact on the DC voltage at the output of the amplifier and thus on its linearity. For a high gain, the resistive load is large. The DC output voltage drops, compressing the drive transistor. This effect is reduced by keeping part of the bias current out of the PMOS load. In the presented stage, this is implemented by reusing the resistive load PMOS transistors that are turned off in high gain.
Instead, the PMOS load transistors that do not contribute to the resistive load are used as current sources. Their gates are biased at a DC level, which is controlled by a DC-level feedback circuit (DCFB). In this way, the high gain linearity is enhanced without adding extra parasitic capacitances on the signal path. A slight disadvantage of using the PMOS transistors as current source, rather than turning them off, is that the gain is slightly reduced due to the limited output impedance of the current sources. Therefore, the possibility is left open to select whether the PMOS is turned off or used as a current source. Another point of attention is the reference level to which the DC output level is controlled. In high bias current conditions in
Fig. 3.18 Pre-power amplifier internal stages: a variable-gain stage-1; b stage-2 with cascode deviator
combination with high gain, it is not possible to have a high output DC voltage. The DC-level feedback will clip to ground and the bias PMOS transistors will act as resistors that reduce the gain rather than as current sources. To limit this effect, the target output DC level can be programmed to some extent. Finally, the total bias current through the amplifier can be controlled to optimize the power consumption for the required linearity. The stage preceding the previously described stage has a similar structure but with extended gain programmability. A cascode transistor has been added to the main amplifier's path (Fig. 3.18b). This is possible as the signal swing is still relatively small in this stage. In parallel to the main branch, several binary-weighted cascode transistors are used to divert part of the signal current to the supply, resulting in smaller gains. Although this approach may appear suboptimal from a power consumption point of view, the relative current that is lost for the complete PPA is minor, as the amplifier stages are scaled down from the output to the input.
The performance of this circuit of course varies widely with carrier frequency, required output power, bias and gain settings, etc. Simulation results indicate a total gain range of 50 dB and typical IM3 distortion levels of −35 dBc at 0 dBm output power.
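The current-steering gain control of the stage with the cascode deviator (Fig. 3.18b) can be illustrated with a first-order sketch. It assumes, as a simplification that is not taken from the design itself, that the signal current splits between the main cascode and the enabled deviator cascodes in proportion to their relative size. The function name, the number of control bits and the unit sizes are hypothetical.

```python
import math

def steering_gain_db(code, main_units=8, n_bits=3):
    """Relative gain in dB for a binary control word `code`.

    Assumption: the enabled deviator cascodes have a total size of `code`
    unit devices and the main cascode has `main_units` unit devices; the
    fraction of signal current reaching the output is main / (main + code).
    """
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range for the given number of bits")
    fraction = main_units / (main_units + code)
    return 20 * math.log10(fraction)

# Print the gain step table for the hypothetical 3-bit deviator.
for code in range(8):
    print(f"code {code}: {steering_gain_db(code):6.2f} dB")
```

In this simplified model the gain decreases monotonically with the control word, and the step size shrinks as more current is diverted, which is one reason why a real design may size the deviator devices differently.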
3.6 Calibration Techniques
In a multi-mode zero-IF transceiver, design specifications normally span a broad range of present and future standards. Evidently, this overloads the design requirements and hampers effective design. An alternative is to design according to realistic design requirements and to digitally calibrate the front end for its eventual imperfections. The goal of calibration is to improve or optimize the performance of a full IC transceiver. In this sense, calibration relaxes the design requirements for multi-mode systems and enables the use of a low-cost architecture while being compliant with a broad range of standards. In the context of multi-mode systems, calibration should be dynamic, efficient, automatic and implemented in the system. Calibration typically consists of two steps:
• Characterization: The system imperfections are estimated, often at discrete time instances, namely at system start-up, at mode handover and/or at system-defined time instances. Most of today's calibration techniques require the system's normal operation to be deactivated, which is sometimes not allowed. Fortunately, some alternative techniques exist that allow the system to remain operational during calibration.
• Compensation: Once the imperfections of the receiver are characterized, the obtained information is used to optimize its performance. Typically this is achieved by digital pre- and post-compensation of the baseband time-domain signal.
Calibration is one of the keys to enabling low-cost, high-performance SDR mobile terminals. However, the cost of the calibration itself must also be limited, and fully analog high-frequency calibration techniques are therefore rejected. The only relevant techniques in the SDR context are digital calibration techniques, which are briefly discussed below.
3.6.1 Quadrature Imbalance
In a homodyne receiver, the RF signal is directly down-converted to a complex-valued baseband signal, whose in-phase and quadrature components ideally have equal amplitude and a phase difference of 90°. In a practical implementation, slight mismatches in amplitude and phase result in a quadrature imbalance for both the transmitter and the receiver chain. This imbalance is generally characterized by its amplitude mismatch ε and phase mismatch Δφ, resulting in a negative frequency rejection of
$\mathrm{NFR} = 10 \log_{10}\left(\varepsilon^2 + \tan^2(\Delta\phi)\right)$.
Different quadrature imbalance characterization techniques exist at the receiver side, mainly based on adaptive filtering [26, 30]. As these techniques exhibit slow convergence, they are applicable in streaming modes only. An alternative technique is presented in [29], where quadrature imbalance characterization is performed based on one calibration measurement. This fast convergence builds on the realistic assumption of a smooth channel between the transmitter and receiver system. The main drawback of all the receiver characterization techniques mentioned is, however, the need for a transmitter signal that is free of quadrature imbalance, and thus for an ideal transmitter. A promising technique is presented in [12], where the quadrature imbalance of the transmitter and of the receiver is characterized separately based on a single calibration measurement. This technique might be well suited to the multi-mode context. Realistic values of transmitter and receiver quadrature imbalance range up to 5%, 6° and −20 dBc for ε, Δφ and NFR respectively. After calibration, the remaining transmitter and receiver quadrature imbalance should be lower than −35 dBc.
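As a worked check of the NFR expression of Section 3.6.1, the short sketch below evaluates it for the quoted worst-case mismatch values. The function name is an illustrative choice; the formula itself is the one given in the text, with ε as a fraction and Δφ in degrees.

```python
import math

def negative_frequency_rejection_db(eps, dphi_deg):
    """NFR = 10*log10(eps^2 + tan^2(dphi)), with dphi given in degrees."""
    dphi = math.radians(dphi_deg)
    return 10 * math.log10(eps ** 2 + math.tan(dphi) ** 2)

# Worst-case values quoted in the text: 5% amplitude and 6 degrees phase mismatch.
nfr = negative_frequency_rejection_db(0.05, 6.0)
print(f"NFR = {nfr:.1f} dBc")
```

For these inputs the expression evaluates to roughly −18.7 dBc, the same order as the −20 dBc quoted above. Note that the phase term dominates: at 6° the tan² contribution is more than four times the ε² contribution.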
3.6.2 DC-Offset
In direct-conversion receivers, parasitic coupling between the LO path and the RF path in both directions causes self-mixing and creates a DC-offset in the baseband signal [24]. This DC-offset decreases the effective dynamic signal swing, especially when the received signal power is low, and therefore reduces the gain and linearity performance of the receiver. Using a high-pass filter to remove the DC-offset from the baseband signal is not always an option: the targeted standards have a dense spectral occupation, and such filters cannot be implemented efficiently. Therefore, another calibration technique should be used. As the DC-offset in the receiver path is mainly generated in the mixer, it should be removed as close as possible to this mixer in the baseband path in order to limit its impact. An architecture that adds a digitally controlled complex compensation DC-offset directly after the mixer is suggested in [15]. Building on a similar architectural approach, [5, 12] present characterization algorithms that find the optimal complex compensation DC value while keeping the system operational. Both techniques provide very fast convergence and are thus applicable in multi-mode transceivers.
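The two calibration steps can be sketched for the DC-offset case as follows. This is a minimal illustration under stated assumptions and does not reproduce the algorithms of [5, 12]: the offset is characterized by averaging a block of complex baseband samples whose signal content is assumed to be zero-mean over the block, and the compensation value is then quantized to the step of a hypothetical compensation DAC. All numerical values (offset, DAC step, tone, noise level) are made up for the example.

```python
import cmath
import math
import random

def estimate_dc(samples):
    """Characterization: the block mean of the complex baseband samples."""
    return sum(samples) / len(samples)

def quantize(value, lsb):
    """Round I and Q separately to the step of the (hypothetical) compensation DAC."""
    return complex(round(value.real / lsb) * lsb, round(value.imag / lsb) * lsb)

random.seed(0)
N = 4096                      # block length (assumed)
TRUE_DC = 0.1234 - 0.0711j    # offset to be found (made-up value)
LSB = 0.005                   # compensation DAC step (made-up value)

# Received block: a tone with an integer number of periods, noise, and the offset.
tone = [0.5 * cmath.exp(2j * math.pi * 37 * n / N) for n in range(N)]
noise = [complex(random.gauss(0, 0.01), random.gauss(0, 0.01)) for _ in range(N)]
received = [t + w + TRUE_DC for t, w in zip(tone, noise)]

# Compensation: subtract the quantized estimate, as a DAC after the mixer would.
compensation = quantize(estimate_dc(received), LSB)
compensated = [x - compensation for x in received]
residual = TRUE_DC - compensation
print(f"compensation = {compensation:.3f}, residual magnitude = {abs(residual):.4f}")
```

Because the correction is a constant rather than a filter, it leaves the spectral content of the wanted signal untouched, which is the motivation given above for avoiding a high-pass filter.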
3.6.3 Impact of LPF Spectral Behavior
Accurate control of the bandwidth of the channel-select filters is required in order to guarantee e.g. sufficient suppression of adjacent channels. The spectral behavior calibration of the LPF can be performed completely at baseband: baseband switches are used to connect (part of) the analog baseband circuitry between the transceiver's baseband input and output. In a given configuration, the spectral behavior or transfer function can be characterized by comparing the digital signal before and after propagation through the analog circuit. When using a multi-tone signal with a relatively dense and sufficiently wide spectral content, the cutoff frequency, the in-band ripple and the spectral phase relation can be easily characterized.
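The multi-tone characterization can be sketched as below. The analog LPF is replaced here by a one-pole digital filter purely as a stand-in, and the record length, tone set and filter coefficient are hypothetical. The complex gain per tone is obtained by dividing the single-bin DFT of the output by that of the input, taken over the last period so that the start-up transient has settled.

```python
import cmath
import math

def multitone(bins, n):
    """One period (n samples) of a sum of unit cosines at the given DFT bins."""
    return [sum(math.cos(2 * math.pi * k * i / n) for k in bins) for i in range(n)]

def one_pole_lpf(x, a):
    """Stand-in for the analog LPF: y[i] = y[i-1] + a * (x[i] - y[i-1])."""
    y, state = [], 0.0
    for sample in x:
        state += a * (sample - state)
        y.append(state)
    return y

def dft_bin(x, k):
    """Single-bin DFT of x at bin k."""
    n = len(x)
    return sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(x))

def measure_response(bins, n, a, periods=4):
    """Complex gain per tone, from the last (settled) period of input and output."""
    x = multitone(bins, n) * periods
    y = one_pole_lpf(x, a)
    x_last, y_last = x[-n:], y[-n:]
    return {k: dft_bin(y_last, k) / dft_bin(x_last, k) for k in bins}

N, A = 256, 0.2                # hypothetical record length and filter coefficient
BINS = list(range(1, 33))      # 32 tones: relatively dense and sufficiently wide
response = measure_response(BINS, N, A)

# Cutoff: first tone more than 3 dB below the DC gain (unity for this stand-in).
cutoff_bin = next(k for k in BINS if abs(response[k]) < 1 / math.sqrt(2))
for k in (1, cutoff_bin, 32):
    h = response[k]
    print(f"bin {k:2d}: {20 * math.log10(abs(h)):6.2f} dB, "
          f"{math.degrees(cmath.phase(h)):7.2f} deg")
```

The same per-tone gains also give the in-band ripple (spread of the magnitudes below the cutoff) and the phase response, so a single measurement covers all three quantities named in the text.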
3.7 Full SDR Implementation
The complete SDR front-end (without ADC) has been implemented in a 1.2 V 0.13 μm CMOS technology [18]. This section describes the prototype with dual wideband low-noise amplifiers; the one with a MEMS-enabled dual-band LNA is reported in [6]. Figure 3.19 shows a microphotograph of the SDR front-end with highlights on the major circuits. The total die area is 3 × 3.8 mm², of which about 7.7 mm² is taken up by active circuits. To verify the receiver behavior over the complete input power range, a receive budget measurement is shown in Fig. 3.20 for three different channel bandwidths. The SNDR is limited by thermal noise for low input powers. The 1/f noise corner is around 200 kHz, which explains the higher NF for the low-BW mode. Realistic interferer and blocker levels are used, corresponding to a Bluetooth, UMTS, and WLAN scenario, respectively. This causes the front-end to reduce its LNA gain at higher input power levels, sometimes with a resulting small dip in the SNR. At high input powers, distortion is the limit. The measured receiver noise figure, gain and IIP3 as a function of the RF frequency are shown in Fig. 3.21 for a channel bandwidth of 20 MHz. The typical noise figure is around 5 dB up to 5 GHz carrier frequency, above which the performance degrades due to insufficient LO signal swing. The input IP3 varies from −8 to −4 dBm over the frequency range.
Fig. 3.19 SDR chip microphotograph
Fig. 3.20 RX chain radio budget measurement (SNDR [dB] versus input power [dBm] for BW = 500 kHz, 2.2 MHz and 10 MHz)
Fig. 3.21 SDR receiver performance (IIP3 [dBm], gain [dB] and NF [dB] versus frequency [GHz])
A measured 64QAM OFDM constellation and the corresponding output spectrum are presented in Fig. 3.22. They correspond to an EVM of −29.5 dB for an output power of −0.5 dBm at 2.45 GHz. Performance of course varies with carrier frequency and chosen operating point; power can be saved at the expense of reduced linearity and degraded EVM. Table 3.1 gives a summary of the power consumption and performance of the various SDR circuits for various operating modes.
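To relate the quoted EVM figure to a more tangible number, the sketch below converts an EVM in dB to a percentage and shows one common way of computing EVM from constellation points. The RMS-normalized definition used here is an assumption; some standards normalize to the peak constellation power instead, and the measurement setup behind Fig. 3.22 is not described in the text.

```python
import math

def evm_db_to_percent(evm_db):
    """Convert an EVM given in dB (amplitude ratio, 20*log10) to percent."""
    return 100 * 10 ** (evm_db / 20)

def evm_db(reference, received):
    """EVM in dB: error power divided by reference power (RMS normalization)."""
    error_power = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    reference_power = sum(abs(s) ** 2 for s in reference)
    return 10 * math.log10(error_power / reference_power)

print(f"-29.5 dB corresponds to {evm_db_to_percent(-29.5):.2f} %")

# Toy example: two ideal points, each received with a fixed error of 0.1.
ideal = [1 + 1j, -1 - 1j]
measured = [s + 0.1 for s in ideal]
print(f"toy EVM = {evm_db(ideal, measured):.2f} dB")
```

An EVM of −29.5 dB thus corresponds to an RMS error vector of about 3.35% of the RMS signal amplitude.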
Fig. 3.22 SDR TX constellation and output spectrum at 2.45 GHz (Pout = −0.5 dBm; EVM = −29.5 dB)

Table 3.1 SDR performance overview
Power supply = 1.2 V

Receiver current [mA] (min/typ/max)
  LNA                      8/12
  Mixer                    2x5/9/12
  LO buffer                2x3/4/7
  LPF                      2x0.3/10
  VGA                      2x1/6

Transmitter current [mA]
  LPF                      2x4
  Mixer                    2.6 .. 5.9
  PPA                      25 .. 51

DMQ current [mA]
  x5, /4 @ 4.9 GHz         24
  x3, /4 @ 3.0 GHz         16.5
  x3, /8 @ 1.5 GHz         16.5
  IQ @ 4 GHz               14.3
  /2 @ 2 GHz               6.8
  /4 @ 1 GHz               4.8
  /8 @ 500 MHz             4.5
  /16 @ 250 MHz            4.4
  /32 @ 125 MHz            4.4
  Spur. tones, x3/4 mode   < −30 dBc

Receiver performance
  NF° (high gain)          4.8 – 8.5 dB
  IIP3° (low gain)         −8.2 .. −3 dBm
  Gain                     10 .. 90 dB
  Freq. range              0.1 .. 6 GHz
  Cut-off freq.            0.35 .. 23 MHz
  Current                  27 .. 82 mA
  ° fRF 100 MHz .. 6 GHz, channel BW 20 MHz

Transmitter performance    2.45 GHz     4.9 GHz
  P1dB                     5.8 dBm      1 dBm
  OIP3                     15.5 dBm     12 dBm

  Pout (WLAN 64 QAM)       EVM          Ivdd
  −0.5 dBm @ 2.45 GHz      −29.5 dB     51
  −3.1 dBm @ 4.9 GHz       −29.2 dB     51
  −6.2 dBm @ 5.24 GHz      −30.0 dB     51
  −0.6 dBm @ 2.45 GHz      −26.5 dB     36
  −2.3 dBm @ 2.45 GHz      −30.6 dB     36

  Gain ctrl. @ 2.45 GHz    Range        Step
  Mixer                    20 dB        ~5 dB
  PPA                      43 dB        <2 dB
3.8 Conclusions
In this chapter, the basic architecture and implementation concepts of a Software-Defined Radio analog front-end have been presented. A direct-conversion transceiver with very flexible and reconfigurable building blocks has been analyzed at system level, in order to be able to cover the requirements imposed by a large set of communication standards in cellular, WLAN, WPAN, broadcasting and positioning applications. An important aspect of every SDR front-end is the LO synthesis. Since many RF frequency bands need to be covered, ranging from 174 MHz up to 6 GHz, generating all local oscillator signals is a very complex task. A very wideband and flexible VCO has been demonstrated, which in combination with division and multiplication in quadrature makes this feasible. Several innovations in the various building blocks of the receive and transmit chain are also needed to achieve full SDR functionality. Some examples include the use of a MEMS switch in the input stage of an LNA, wideband LNAs, an ultra-flexible baseband channel-select filter, an innovative charge-sharing successive-approximation ADC, and a programmable power amplifier driver. A versatile RX and TX path, with programmability to address the various functional requirements of many different standards, can at the same time offer optimal power consumption by trading off unnecessary performance at runtime. All these concepts are integrated in the world's first true SDR prototype, achieving good performance combined with extensive programmability. Although several other improvements will still be needed, the presented work has already taken an important step towards a true energy-efficient multi-mode Software-Defined Radio.
Acknowledgements
The author would like to thank all the members of the RF design team in the Wireless Research Group in IMEC who have made the SDR project possible. These are Boris Côme, Björn Debaillie, Vito Giannini, Michael Goffioul, Dries Hauspie, Mark Ingels, Michael Libois, Mingxu Liu, Pierluigi Nuzzo, Charlotte Soens, Geert Van der Plas, Gerd Vandersteen, Joris Van Driessche, and Piet Wambacq. The results on the baseband analog circuits have been obtained in cooperation with Prof. Andrea Baschirotto, Univ. of Salento, Italy. This research has been carried out in the context of IMEC's multimode multimedia program, which is partly sponsored by Samsung.
References
1. Andreani, P., Fard, A.: More on the 1/f² phase noise performance of CMOS differential-pair LC-tank oscillators. IEEE Journal of Solid-State Circuits 41(12), 2703–2712 (2006)
2. Andreani, P., Mattisson, S.: On the use of MOS varactors in RF VCO's. IEEE Journal of Solid-State Circuits 35(6), 905–910 (2000)
3. Bevilacqua, A., Niknejad, A.M.: An ultrawideband CMOS low-noise amplifier for 3.1–10.6 GHz wireless receivers. IEEE Journal of Solid-State Circuits 39(12), 2259–2268 (2004)
4. Bruccoleri, F., Klumperink, E., Nauta, B.: Wide-band CMOS low-noise amplifier exploiting thermal noise canceling. IEEE Journal of Solid-State Circuits 39(2), 275–282 (2004)
5. Côme, B., Hauspie, D., Albasini, G., Brebels, S., De Raedt, W., Diels, W., Eberle, W., Minami, H., Ryckaert, J., Tubbax, J., Donnay, S.: Single-package direct-conversion receiver for 802.11a wireless LAN enhanced with fast converging digital compensation techniques. In: IEEE MTT-S International Microwave Symposium Digest, pp. 555–558. Fort Worth, TX (2004)
6. Craninckx, J., Van der Plas, G.: A 65 fJ/conversion-step, 0–50 MS/s 0–0.7 mW 9 bit charge sharing SAR ADC in 90 nm digital CMOS. In: Technical Digest IEEE International Solid-State Circuits Conference, pp. 246–247. San Francisco, CA (2007)
7. Craninckx, J., Steyaert, M.: A 1.8 GHz low-phase noise CMOS VCO using optimized hollow inductors. IEEE Journal of Solid-State Circuits 32(4), 736–744 (1997)
8. Craninckx, J., Steyaert, M.: Wireless CMOS Frequency Synthesizer Design. Kluwer, Dordrecht, The Netherlands (1998)
9. Craninckx, J., et al.: A fully reconfigurable software-defined radio transceiver in 0.13 μm CMOS. In: Technical Digest IEEE International Solid-State Circuits Conference, pp. 346–347. San Francisco, CA (2007)
10. D'Amico, S., Giannini, V., Baschirotto, A.: A 4th-order Active-Gm-RC reconfigurable (UMTS/WLAN) filter. IEEE Journal of Solid-State Circuits 41(7), 1630–1637 (2006)
11. Darabi, H., Abidi, A.A.: Noise in RF-CMOS mixers: A simple physical model. IEEE Journal of Solid-State Circuits 35(1), 15–25 (2000)
12. Debaillie, B., Van Wesemael, P., Craninckx, J.: Calibration method for RF imperfections enabling low-cost SDR. In: IEEE International Conference on Communications, pp. 4899–4903. Beijing, China (2008)
13. Giannini, V., Craninckx, J., D'Amico, S., Baschirotto, A.: Flexible baseband analog circuits for software-defined radio front-ends. IEEE Journal of Solid-State Circuits 42(7), 1501–1512 (2007)
14. Goffioul, M., Vandersteen, G., Van Driessche, J., Debaillie, B., Côme, B.: Ensuring consistency during front-end design using an object-oriented interfacing tool called NETLISP. In: Design Automation Conference, pp. 889–892. San Francisco, CA (2006)
15. Haspeslagh, D., et al.: BBTRX: A baseband transceiver for a zero IF GSM hand portable station. In: IEEE Custom Integrated Circuits Conference. Boston, MA (1992)
16. Hauspie, D., Park, E.C., Craninckx, J.: Wideband VCO with simultaneous switching of frequency band, active core and varactor size. IEEE Journal of Solid-State Circuits 42(7), 1472–1480 (2007)
17. Hegazi, E., Sjöland, H., Abidi, A.: A filtering technique to lower LC oscillator phase noise. IEEE Journal of Solid-State Circuits 36(12), 1921–1930 (2001)
18. Ingels, M., Soens, C., Craninckx, J., Giannini, V., Kim, T., Debaillie, B., Libois, M., Goffioul, M., Van Driessche, J.: A CMOS 100 MHz to 6 GHz software defined radio analog front-end with integrated pre-power amplifier. In: Proceedings of the IEEE European Solid-State Circuits Conference, pp. 436–436. Munich, Germany (2007)
19. Liu, M., et al.: MEMS-enabled dual-band 1.8 & 5–6 GHz receiver RF front-end. In: IEEE Radio and Wireless Symposium, pp. 547–550. Long Beach, CA, USA (2007)
20. Manetakis, K., Jessie, D., Narathong, C.: A CMOS VCO with 48% tuning range for modern broadband systems. In: IEEE Custom Integrated Circuits Conference, pp. 265–268. Orlando, FL (2004)
21. McCreary, J., Gray, P.: All-MOS charge redistribution analog-to-digital conversion techniques – part I. IEEE Journal of Solid-State Circuits 10(6), 371–379 (1975)
22. Nguyen, C.: Integrated micromechanical circuits for RF front ends. In: Proceedings of the IEEE European Solid-State Circuits Conference, pp. 7–16. Montreux, Switzerland (2006)
23. Van der Plas, G., Decoutere, S., Donnay, S.: A 0.16 pJ/conversion-step 2.5 mW 1.25 GS/s 4b ADC in a 90 nm digital CMOS process. In: Technical Digest IEEE International Solid-State Circuits Conference, pp. 566–567. San Francisco, CA (2006)
24. Razavi, B.: RF Microelectronics. Prentice Hall, Upper Saddle River, NJ (1998)
25. Santos, H.D.L., et al.: RF MEMS for ubiquitous wireless connectivity: Part II – application. IEEE Microwave Magazine 5(4), 50–65 (2004)
26. Schuchert, A., Hasholzner, R., Antoine, P.: A novel IQ imbalance compensation scheme for the reception of OFDM signals. IEEE Transactions on Consumer Electronics 47, 313–318 (2001)
27. TeraVicta Technologies: TT712-68CSP SPDT 7 GHz RF MEMS Switch. [Online] http://www.teravicta.com
28. Tiebout, M.: A CMOS fully integrated 1 GHz and 2 GHz dual band VCO with voltage controlled inductor. In: European Solid-State Circuits Conference, pp. 799–802. Florence, Italy (2002)
29. Tubbax, J., Côme, B., Van der Perre, L., Donnay, S., Engels, M., De Man, H., Moonen, M.: Compensation of IQ imbalance and phase noise in OFDM systems. IEEE Transactions on Wireless Communications 4(3), 872–877 (2005)
30. Valkama, M., Renfors, M., Koivunen, V.: Advanced methods for I/Q imbalance compensation in communication receivers. IEEE Transactions on Signal Processing 49, 2335–2344 (2001)
31. Van Driessche, J., Craninckx, J., Côme, B.: Analysis and key specifications of a novel frequency synthesizer architecture for multi-standard transceivers. In: IEEE Radio and Wireless Symposium, pp. 481–484. San Diego, CA (2006)
32. Vaucher, C., Ferencic, I., Locher, M., Sedvallson, S., Voegeli, U., Wang, Z.: A family of low-power truly modular programmable dividers in standard 0.35 μm CMOS technology. IEEE Journal of Solid-State Circuits 35(7), 1039–1045 (2000)
33. Zhan, J.H., Taylor, S.: An inductor-less broadband LNA with gain step. In: Proceedings of the IEEE European Solid-State Circuits Conference, pp. 344–347. Montreux, Switzerland (2006)
Chapter 4
SDR Baseband Platforms Opportunism to Combine Flexibility and Low Energy
4.1 SDR Baseband Platforms: Going Mobile
New wireless standards and updates appear on a monthly rather than a yearly basis, while wireless network operators wish to keep the same infrastructure in place for decades rather than years. Hence, the motivation to enable 'on site' upgrades is high. In the past years, flexible baseband solutions (amongst others based on FPGAs) and lately also first-generation SDR platforms have turned up in base stations. Since cellular base stations are traditionally equipped with expensive power devices, the overhead that comes with reconfigurability in the digital processing was hardly noticed in the (cost and power) budget. With the increasing need for flexibility at the terminal side (both for functional and for cost reasons), it is a logical evolution to also introduce reconfigurable solutions at the terminal side [2,9,19,25,26]. For SDR baseband platforms to go mobile, however, important power bottlenecks have to be overcome. As a crucial strategy to combine flexibility and low energy, opportunism by applying a 'divide and conquer' partitioning on the platform is advised. This tactic is explained in Section 4.2, along with the low-power operation of such a platform and a suitable silicon implementation. As an exemplary case, a heterogeneous MPSoC platform based on SDR-optimized reconfigurable cores, for which a prototype design has been realized at IMEC, is presented. In the following sections, the key components of an SDR baseband platform featuring innovative reconfiguration possibilities are discussed in more detail: a digital front-end, a baseband processor, and a flexible Forward Error Correction (FEC) engine.
L. Van der Perre et al., Green Software Defined Radios, Series on Integrated Circuits and Systems, © Springer Science+Business Media B.V. 2009
4.2 Approach to Combine Flexibility and Low Energy: Divide and Conquer
4.2.1 Opportunistic Partitioning
When considering mobile reconfigurable digital baseband platforms, one has to carefully trade off flexibility and energy efficiency. To achieve an acceptable energy efficiency for mobile devices (100 MOPS/mW on average), flexibility should only be introduced in the platform where its impact on the total energy cost is marginal or where it enables control options that can be effectively exploited in the control step (targeted flexibility). Indeed, flexibility offered by fine-grained adaptive algorithms and implementations may, at system level, be more energy-efficient than fixed non-adaptive hardware solutions. This comes from the fact that these flexible solutions have the potential to continuously adapt to the environment and application dynamics in order to save energy. The functional analysis of the tasks performed in baseband digital signal processing has revealed a clear diversity of computational load and flexibility requirements (see Fig. 4.1). This heterogeneity of requirements advocates a 'divide and conquer' approach: a heterogeneous Multi-Processor System-on-Chip (MPSoC) implementation is best suited to combine flexibility with low energy constraints [27]. Following this approach, a high-performance SDR MPSoC platform was conceived [22] with the goal of supporting the recent broadband wireless standards, amongst others WLAN (IEEE 802.11 family), WMAN (WiMAX), 3GPP-LTE, and digital broadcasting (which comes in different regional variations: DMB, ISDB-T, DVB-H) [1]. The high-level goals are indicated in Fig. 4.2. The platform is built around a Central Processing Unit (CPU) to which heterogeneous slave Processing Entities are appended. The processing entities are designed
Fig. 4.1 Computational and flexibility requirements of different baseband processing functions (axes: need in flexibility versus need in energy efficiency; functions shown: modulation/demodulation (inner modem), synchronization, forward error correction (outer modem), FE steering, signal detection)
Fig. 4.2 SDR baseband platform case: high-level targets (802.11n, 802.16e, 3GPP-LTE and DVB on one flexible platform; up to 3 antennas; up to 200 Mbps; <500 mW; <10 mW in idle)
Fig. 4.3 Heterogeneous MPSoC approach for mobile SDR baseband platform: typical platform template
with the heterogeneous requirements as well as the most important characteristics of wireless baseband processing in mind. A typical platform template is depicted in Fig. 4.3. The different components and their function are summarized in the list below, and will be further elaborated in the following sections:
– A platform controller is responsible for the MAC functionality and the PHY macro-pipeline scheduling. This typically runs on a RISC processor (ARM [18] in the case represented in Fig. 4.3).
– Digital front-end (DFE) tiles achieve packet detection with a limited need for flexibility and code mapping productivity (<5% of the total application code) but very high energy efficiency requirements. They heavily exploit sub-word parallelism and techniques to reduce the instruction and data memory overhead (see Section 4.3).
– Baseband engines handle the burst processing based on the combination of the aforementioned architectures (CGA processor coupled with vector processing), which provides an appropriate trade-off between performance, energy efficiency and flexibility (see Section 4.4).
– Forward error correction (FEC) cores are typically implemented on reconfigurable hardware components. In first-generation mobile SDR baseband platforms this design decision was valid, based on the low need for flexibility combined with a huge complexity if implemented on a generic DSP (see Fig. 4.1). However, in recent and emerging wireless standards, an evolution towards more flexibility in the FEC as well is observed. As a consequence, an application-specific architecture for advanced FEC algorithms (turbo and LDPC codes) is desired for future SDR baseband platforms. FEC decoding algorithms are indeed hard to map to typical baseband processors (mainly because of the extensive interleaving requirements). They also require an effective solution for the severe data access bottleneck (see Section 4.5).
– On the considered platform case, the cores are connected by a segmented (AMBA) bus. This bus has been designed to achieve the required bandwidth while avoiding significant over-dimensioning, building on the HW-SW co-design approach (see Section 4.2.3 and Chapter 5). In the future, with increasing bandwidth and flexibility requirements, a Network-on-Chip (NoC) may better fit the needs.
4.2.2 Low Power Operation: Sleeping, Waking, and Working on Minimal Energy
Analysis of typical wireless modem use cases shows that the power consumption in idle time often dominates battery lifetime. In typical present cellular and WLAN scenarios, the wireless modem is actually transmitting and receiving less than 5% of the time, and consequently power consumption in idle mode plays a dominant role in battery lifetime. Taking this characteristic into account, overall low average power operation can be achieved by applying a hierarchical wake-up approach. The concept consists in gradually enabling more power-consuming parts as the chance of a valid signal reception increases (as presented in Fig. 4.4).
Fig. 4.4 Low power operation: hierarchical wake-up (axes: probability of signal detection, power consumption, % platform activated; stages: off, AGC, time sync, packet decoding)
Fig. 4.5 Duty cycling vs. flexibility and power consumption (axes: power consumption, duty cycle)
The concept of wake-up radio exploits the heterogeneous partitioning of the platform. For most of the time, only the AGC circuitry is active. The more flexible components on the platform, which are responsible for the power overhead, are only powered on when the preceding stage has confirmed the need. Consequently, the major power-hungry segments of the platform can be heavily duty cycled. The inverse relation between duty cycle on the one hand, and flexibility and power consumption on the other hand, is illustrated in Fig. 4.5. As a result, very low idle power consumption is achieved, and the overall energy needs and thus the battery time of the mobile device are preserved. Indeed, a power consumption of the same order as dedicated solutions, and considerably lower than the leakage power of the total platform (see Section 4.2.3), can be achieved. In order to clarify the typical operation of a heterogeneous SDR platform in its different working modes, a short explanation is given below. More details on the specifics of the different components follow in the next sections.
• In idle mode, only the DFE Rx of the platform is operational. Upon detection of a signal at its input by the AGC, the 'SyncPro' (see Section 4.3.2) will attempt initial synchronization, and in case a valid signal is detected, the more power-consuming rest of the platform is activated. The ARM will start controlling the communication on the platform, in order to receive data and communicate according to the MAC protocol.
• The ARM as a central controller sets up DMA transfers between the different components on the platform, receives interrupts when tasks are finalized and reacts upon them.
• In transmit mode, the following sequence is set up: data is received through the host interface. The data is encoded, subsequently modulated by the baseband engine, and finally transmitted to the DAC after up-sampling filtering in the DFE.
• In receive mode, the data is received from the ADC by the DFE and down-sampled; subsequently it is demodulated in the baseband engine and decoded in the outer modem, from which it is transferred to the host interface.
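The benefit of the hierarchical wake-up can be made concrete with a small expected-power calculation. The sketch below is an illustration only: the stage powers and the fractions of time in which a stage triggers the next one are invented numbers, not measurements of the platform, and the model ignores wake-up latency and leakage of the powered-down parts.

```python
def hierarchical_average_power(stages):
    """Average power of a wake-up chain.

    `stages` is a list of (name, power_mw, trigger_fraction) tuples, ordered
    from the always-on stage to the last one. A stage is active only for the
    fraction of time in which all preceding stages have triggered it.
    """
    active_fraction = 1.0
    total_mw = 0.0
    for _name, power_mw, trigger_fraction in stages:
        total_mw += active_fraction * power_mw
        active_fraction *= trigger_fraction
    return total_mw

# Hypothetical numbers for the three active stages of Fig. 4.4.
STAGES = [
    ("AGC", 1.0, 0.10),              # always on; hands over 10% of the time
    ("time sync", 20.0, 0.20),       # confirms a valid signal 20% of its active time
    ("packet decoding", 300.0, 0.0), # last stage, nothing to trigger
]

average_mw = hierarchical_average_power(STAGES)
always_on_mw = sum(power for _name, power, _frac in STAGES)
print(f"hierarchical wake-up: {average_mw:.1f} mW, everything on: {always_on_mw:.1f} mW")
```

With these assumed values the most power-hungry stage is active only 2% of the time, so it contributes far less to the average than its peak power suggests, which is the duty-cycling argument made in the text.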
In data transmission and reception mode, energy consumption is mainly determined by the activity of the baseband engine, which is by far the most power-hungry component on the platform. A thorough architecture-algorithm co-design for the baseband engine, exploring energy versus performance, is essential (see Section 4.4). In the end, however, the actual activity of the baseband engine is largely determined by the software running on it and the degree to which it has been optimized. We refer to Chapter 5, where challenges and solutions concerning efficient SW design and SW design efficiency are elaborated on.
4.2.3 SDR Platform Implementation: Teaming up with Deep-Submicron Technology
4.2.3.1 Advanced Scaling: What it (Does Not) Offer for SDR Baseband Platforms
Starting from a suitable heterogeneous MPSoC architecture, an attractive SDR platform implementation can be achieved thanks to the possibilities offered by silicon technology. Indeed, a small area (and thus low cost) can be achieved by reusing hardware both across different standards and within one standard, where the same processor is used for the different functions in the digital transmitter and receiver. The need to run this hardware at a higher clock speed than dedicated hardware is met by the high-speed transistors available in deep-submicron technology nodes. Until recently, technology scaling steadily offered 'more (functionality/complexity) for less (area/cost and power)'. CMOS scaling has now arrived at the point where parasitic problems are becoming dominant: variability, reliability, and last but not least leakage can no longer be resolved at the technology (transistor) or circuit level only, and will have an impact on system-level performance. Importantly for digital processing specifically, leakage may become the dominant component in power consumption, and has to be tackled thoroughly in order to provide energy-efficient solutions in the future. As this detrimental effect can no longer be handled at transistor level, appropriate measures at design and at system level are needed. At the system level, the hierarchical activation in the SDR baseband platform operation and the intelligent cross-layer control considering total energy serve this purpose. The design approach is explained in the following section.
4.2.3.2 Designing for High Performance and Low Power: A Challenge at Expert Level To design for high performance and low power, one should try to benefit from the great opportunities offered by the newest technologies, without paying for the shortcomings. This asks for great care for the substantial and aggravating parasitic
4.2 Approach to Combine Flexibility and Low Energy: Divide and Conquer
effects. In order to cope with these effects, new methodologies have to be followed, and specific techniques have to be applied. In digital design, these fall mainly into the following categories:
– Memory reliability and yield problems: Memory Test and Repair solutions can be put in place.
– Front-end-of-line variability (especially, variability on the threshold voltage): Statistical static timing analysis (SSTA) is usually considered to avoid the over-design resulting from the traditional design-corner approach for timing closure.
– Leakage: Fine-grained voltage island definition with, possibly, substrate biasing can be applied.
– Back-end-of-line and lithography variability (including voltage drop): A Design For Manufacturing (DFM) methodology is needed.
For the MPSoC for SDR baseband case, a prototype design was realized in a 90 nm General Purpose (GP) technology. The GP technology variety was selected over the Low Power (LP) version because of its high frequency possibilities, at the expense of a potentially considerable leakage power. An important measure to control leakage power is the insertion of power islands. These islands allow segments of the platform to be powered off completely when their operation is not needed. In the exemplary MPSoC case, four islands were defined, as illustrated in Fig. 4.6. Islands 1 and 2 cover the two baseband engines respectively, island 3 contains the DFE, and island 0 holds the remainder, including for example the ARM and the AMBA subsystem. Two temperature monitors were inserted in the layout, in order to experimentally verify the thermal variance of leakage power. At the hot spots of the chip, there exists a risk of thermal runaway, whereby increasing temperature increases the leakage, which in turn increases the temperature, and so on. This effect is difficult to predict
Fig. 4.6 Insertion of power islands in MPSoC in order to manage leakage power
4 SDR Baseband Platforms
pre-tape-out with current state-of-the-art tools. While fortunately no thermal problems occurred for this prototype, it is an important concern for future designs and still smaller technologies!
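The feedback loop behind thermal runaway can be illustrated with a small fixed-point iteration. This is only a sketch under assumed values: the exponential leakage model, the thermal resistance, the temperature limit and every number below are hypothetical and are not measurements of the prototype.

```python
def settle_temperature(p_dyn_w, r_th_c_per_w, t_amb_c=25.0,
                       p_leak_ref_w=0.025, t_ref_c=65.0,
                       doubling_c=20.0, max_iter=200, t_limit_c=150.0):
    """Iterate T -> T_amb + R_th * (P_dyn + P_leak(T)) until it settles.

    All parameters are hypothetical. Leakage is assumed to double every
    `doubling_c` degrees. Returns the steady-state temperature in deg C,
    or None when the temperature exceeds `t_limit_c` (taken here as a
    sign of thermal runaway).
    """
    t = t_amb_c
    for _ in range(max_iter):
        p_leak = p_leak_ref_w * 2.0 ** ((t - t_ref_c) / doubling_c)
        t_next = t_amb_c + r_th_c_per_w * (p_dyn_w + p_leak)
        if t_next > t_limit_c:
            return None          # leakage and temperature keep feeding each other
        if abs(t_next - t) < 1e-6:
            return t_next        # stable operating point found
        t = t_next
    return t


stable = settle_temperature(p_dyn_w=0.3, r_th_c_per_w=30.0)
runaway = settle_temperature(p_dyn_w=0.3, r_th_c_per_w=400.0)
```

With a low thermal resistance the loop settles at a stable operating point; with a very high one the same dynamic power pushes the model past the limit, which mirrors the runaway mechanism described above.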
4.2.3.3 A Prototype Proving the Concepts: The BEAR Chip
A prototype MPSoC chip for the proposed exemplary heterogeneous SDR baseband platform was realized at IMEC, in order to prove the concepts [33]. The chip was called ‘BEAR’, which stands for ‘Baseband Engine for Adaptive Radio’. A high-level schematic view of the chip’s architecture is given in Fig. 4.7. The DFE and baseband engine (ADRES) architectures and designs will be further explained in the following sections. Please note that the size of the different components in the schematic is not representative of the physical sizes (which can be roughly interpreted based on the indication of the voltage islands depicted in Fig. 4.6). The most important facts and figures of the chip, based on data from tape-out, are summarized in Table 4.1. Importantly, when compared to dedicated chips for wireless modems, we can conclude that the extra area cost is paid back already when considering two standards. Possibly more importantly, the large degree of flexibility of the platform enables updates and upgrades of the wireless functionality. This can bring considerable competitive advantages in design cost and in Time To Market (TTM).
Fig. 4.7 High-level schematic of BEAR chip (blocks shown include: DFE tiles with FIFOs, I/Q buffers, DCO compensation and DFE AHB interface; detect engine; two ADRES baseband engines with IMEM, DMEM, instruction cache, L1 banks bk0 to bk3 and ADRES AHB interface; outer modem with FEC, Viterbi decoder, convolutional encoder, scrambler/descrambler and CRC; ARM subsystem with ARM9 cell, ITCM, DTCM, interrupt controller, timers, GPIO, APB bridge, DMA1, DMA2 and boot RAM; 2-layer AHB; L2 RAM; host SPI/IO interface; front-end interface; clock and power management; test interface)
Table 4.1 Main facts and figures of BEAR chip
• TSMC 90G Multi-VT technology
• 32 mm² core die area, 20 mm² active area
• 6.7 Mbit memory (121 instances)
• 4 power domains, 8 clocks
• 270 I/O pins, 267 supply pins
• 2 ADRES processors, each with:
• 33 memory macros @ 400 MHz
• 32 KB instruction cache
• 128-entry config mem
• 64 KB data scratchpad
• 128 KB IMEM @ 200 MHz
• 400 MHz clock rate, worst case commercial operating conditions
• 25 GOps
Fig. 4.8 IMEC’s BEAR chip on test PCB
Researchers have integrated the BEAR on a PCB (see Fig. 4.8), and successfully proved that the silicon is functional only 2 weeks after the chips came back from the fab. Electrical tests were carried out, and first communication with the ARM and all other components was set up. The baseband code for the wireless LAN (IEEE 802.11a-like) air interface was downloaded to the chip. The output samples created by the TX processing running on one of the chip’s baseband processors were at first looped back into the chip towards the second baseband engine, which performed the RX processing. Meanwhile, the baseband setup has been connected to a reconfigurable front-end chip (SCALDIO, see Chapter 3), and a successful wireless WLAN transmission has been achieved. At the time of writing of this book, further extensions of the wireless prototype to MIMO WLAN and to other standards, as well as tests and measurements of the chip (among others of both active and leakage power in different operating conditions), are ongoing.
4.3 Digital Front-End: Going Reactive and Cognitive
Programmable solutions intrinsically suffer from higher power consumption when compared to dedicated solutions. In the considered platform, the digital front-end tiles, which implement signal detection and pre-synchronization functions, require both very high energy efficiency and sufficient programmability to implement detection of different standards on the same tile. The key to this combination lies in the hierarchical power-up of the DFE functionality. An architecture that enables such hierarchical wake-up [30] is explained in more detail in this section.
4.3.1 The Global DFE: Speaking and Listening Means for the Baseband Platform
The DFE consists of multiple ‘tiles’ (three in the considered case, as represented in Fig. 4.9). A single tile contains the digital receive and transmit logic to interface to a single antenna. The transmitter part of a DFE tile consists of a buffer and a VLSI interpolation filter. The interpolation filter is based on an optimized implementation of a 19-tap
Fig. 4.9 Digital front end architecture (blocks shown per DFE tile: power detector, AGC processor (PIC16F84 compatible), SyncPro pre-sync engine, FE control interface, DCO compensation, CFO compensation, RX I and Q buffers, TX I and Q buffers, AHB slave, configuration)
half-band filter (with Hamming window) for a fixed up-sampling by a factor of 2. A start command can be issued, allowing the samples to be clocked out towards the analog front-end through the filters. The transmit (TX) buffers have a programmable threshold that triggers an interrupt once the number of available samples falls below this threshold. This interrupt is handled by the platform controller. The receiver part of a DFE tile contains a chain made of the VLSI decimation filters, the buffers and compensation units for DC offset and carrier frequency offset (CFO). The decimation filter impulse response is derived from a 19-tap half-band filter with Hamming window, performing an energy-efficient down-sampling by a factor of 2. Next to the data path, two dedicated microprocessor cores are implemented. The first handles the front-end automatic gain control (AGC) and the DFE power management. The second core is optimized for time synchronization. The components of the DFE cooperate in the hierarchical wake-up approach according to the state machine depicted in Fig. 4.10. The figure also indicates the power consumption in the different states. Clearly, significant power saving is achieved by residing in state 1 when idle, and only switching the synchronization engine on upon power detection by the AGC. State 2 is an intermediate state, which either proceeds to packet decoding or, if no synchronization is found within a predefined time limit, returns to idle mode. The operation of the DFE can be interpreted as ‘reacting’ to incoming data and events. The resulting ‘reactive radio’ is an important step towards ‘cognitive radio’ (see Chapter 7).
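The 19-tap half-band filter with Hamming window mentioned above can be sketched as a windowed-sinc design. The exact coefficients and any fixed-point quantization of the actual DFE filters are not given in the text, so the design below is an assumption meant only to show why a half-band filter is cheap: every second tap (except the centre one) is zero and can be skipped.

```python
import math


def halfband_hamming(num_taps=19):
    """Windowed-sinc half-band low-pass filter (assumed design, not the DFE's exact taps)."""
    assert num_taps % 2 == 1
    mid = num_taps // 2
    taps = []
    for i in range(num_taps):
        n = i - mid
        if n == 0:
            ideal = 0.5                                   # centre tap
        elif n % 2 == 0:
            ideal = 0.0                                   # half-band property: even taps vanish
        else:
            ideal = math.sin(math.pi * n / 2) / (math.pi * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def decimate_by_2(samples, taps):
    """Filter and keep every second output sample; zero taps are skipped."""
    nonzero = [(j, h) for j, h in enumerate(taps) if h != 0.0]
    out = []
    for k in range(len(taps) - 1, len(samples), 2):
        out.append(sum(h * samples[k - j] for j, h in nonzero))
    return out


taps = halfband_hamming()
nonzero_taps = sum(1 for h in taps if h != 0.0)
flat = decimate_by_2([1.0] * 40, taps)
```

In this sketch only 11 of the 19 taps are non-zero, and the decimator computes only the outputs that are kept, which is where the energy saving of a factor-2 half-band stage comes from.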
Fig. 4.10 DFE state machine, indicating power consumption in different states (states 1 to 3; recoverable state labels: ‘AGC on, Filter off, FIFO off, Sync_pro off, Bus interface off’, ‘AGC on, Filter on, FIFO on, Sync_pro on, Bus interface off’ and ‘AGC on, Filter on, FIFO on, Sync_pro off, Bus interface off’; transitions AGC_done, Sync and Sync_timeout; power labels 1.1 mW and 22.9 mW)
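The hierarchical wake-up can be sketched as a small table-driven state machine. The state names, the `packet_done` event and the assignment of the 1.1 mW and 22.9 mW figures to states 1 and 2 are assumptions made for illustration; only the events AGC_done, Sync and Sync_timeout come from the figure.

```python
# Power figures taken from Fig. 4.10; mapping them to these two states
# is an assumption of this sketch.
POWER_MW = {"IDLE": 1.1, "SYNC_SEARCH": 22.9}

TRANSITIONS = {
    ("IDLE", "AGC_done"): "SYNC_SEARCH",      # state 1 -> state 2
    ("SYNC_SEARCH", "Sync"): "DECODE",        # state 2 -> state 3
    ("SYNC_SEARCH", "Sync_timeout"): "IDLE",  # state 2 -> state 1
    ("DECODE", "packet_done"): "IDLE",        # hypothetical event name
}


def step(state, event):
    """Return the next state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="IDLE"):
    """Apply a sequence of events and return the visited states."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace


def listening_power_mw(idle_fraction):
    """Average power while waiting for packets, ignoring the decode state."""
    busy_fraction = 1.0 - idle_fraction
    return idle_fraction * POWER_MW["IDLE"] + busy_fraction * POWER_MW["SYNC_SEARCH"]


trace = run(["AGC_done", "Sync_timeout", "AGC_done", "Sync"])
mostly_idle = listening_power_mw(0.99)
```

Under these assumptions, a receiver that spends 99% of its listening time in the idle state averages roughly 1.3 mW, far below the power of keeping the synchronization engine on permanently.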
4.3.2 Zooming in on the Power Detection and AGC Controller
A power detection unit with a variable delay line of 8 or 32 samples determines the received signal power and is capable of DC offset estimation. A dedicated microcontroller is used to implement the AGC algorithm, which removes the DC offset and optimizes the ADC range through the analog front-end control pins. The controller also determines which other parts of the DFE RX are activated after an AGC event is detected (gradual wake-up). The controller architecture is depicted in Fig. 4.11. It is clocked at the sample rate (40 MHz). It has an instruction memory of 512 14-bit words and a data scratchpad of 32 8-bit words, both implemented as register file macros. The instruction set and the architecture of the controller are compatible with the industry-standard Microchip PIC16F84 [32]. This allows for reuse of the available tools, including C compilers and debuggers.
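A sliding-window estimator gives a feel for what a power detection unit with an 8- or 32-sample delay line can compute. The algorithm of the actual unit is not described in the text; the class below is a plain software sketch with hypothetical names, assuming the mean of the window serves as DC offset estimate and the mean squared deviation as signal power.

```python
from collections import deque


class PowerDetector:
    """Sliding-window DC offset and power estimator (illustrative only)."""

    def __init__(self, window=8):
        assert window in (8, 32)          # the two delay-line lengths named in the text
        self.buf = deque(maxlen=window)   # oldest sample drops out automatically

    def push(self, i, q):
        self.buf.append((i, q))

    def dc_offset(self):
        n = len(self.buf)
        return (sum(s[0] for s in self.buf) / n,
                sum(s[1] for s in self.buf) / n)

    def power(self):
        dc_i, dc_q = self.dc_offset()
        n = len(self.buf)
        return sum((i - dc_i) ** 2 + (q - dc_q) ** 2 for i, q in self.buf) / n


det = PowerDetector(window=8)
for k in range(8):
    det.push(4.0 if k % 2 == 0 else 2.0, 0.0)   # I toggles around an offset of 3
offset = det.dc_offset()
level = det.power()
```

The short window reacts faster to the start of a packet, while the long window averages out more noise; an AGC routine could compare `level` against a threshold to raise the wake-up event.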
4.3.3 Zooming in on the Synchronization Engine
4.3.3.1 Defining an Optimized ASIP Architecture
A dedicated application-specific instruction-set processor (ASIP) was designed as ‘synchronization engine’ for the platform, to perform signal detection and coarse time synchronization [30, 31]. The prime target applications considered for the design of the ASIP are time synchronization for 802.11a and 802.16e. In addition, the engine is designed to be flexible enough to support future standards such as 3GPP LTE. Time synchronization usually consists of two parts:
1. First one correlates over the input signals to expose the periodic structure of a packet preamble.
Fig. 4.11 Power detection and AGC controller (blocks shown: I and Q inputs, power detection and DCO calculation, timer, PIC-compatible controller, power control IOs, DCO compensation on I and Q, front-end interface)
2. Then, one scans the correlation results for a peak above a certain threshold. The index of the peak indicates the start of the packet.
The code consists of computation-intensive and control-intensive parts. To implement this kind of algorithm, an ASIP (Fig. 4.12) with a specialized instruction set was developed, featuring four types of instructions: scalar instructions for control, vector instructions for computation-intensive tasks, and instructions to generate and to evaluate vectors. In the conception of the ASIP architecture, energy efficiency was considered a key design target. Therefore, special attention was paid to the selection of the instruction set, parallelization, storage elements (register files, memories) and interconnect. The resulting ASIP architecture is a clustered VLIW processor with scalar and vector slots, as shown in Fig. 4.12. It was modeled with LISATek 2005.2.1 [11, 16, 31]. The processor has five issue slots organized in four clusters. The two slots in the first cluster operate on 16-bit data and the three other clusters on 128-bit wide vectors. Each of the vector clusters contains a vector register file. The vector register files have three read ports and one write port. Two of the read ports are dedicated to the FUs in the particular slot and one is used for operand broadcasting. For operand broadcasting, we have implemented a very flexible software-controlled interconnect.
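The two synchronization steps listed above can be sketched in scalar form. This is a textbook delay-and-correlate detector, not the kernel that runs on the ASIP; the function names, the normalization by window energy, and the period of 16 samples used in the example are assumptions for illustration.

```python
def delay_correlate(x, period, length):
    """Step 1: correlate the signal with a copy delayed by one preamble period.

    metric[n] = |sum_k x[n+k] * conj(x[n+k+period])| / energy of the delayed window.
    """
    metric = []
    for n in range(len(x) - period - length + 1):
        acc = 0j
        energy = 0.0
        for k in range(length):
            a = x[n + k]
            b = x[n + k + period]
            acc += a * b.conjugate()
            energy += b.real * b.real + b.imag * b.imag
        metric.append(abs(acc) / energy if energy > 0.0 else 0.0)
    return metric


def find_packet_start(metric, threshold):
    """Step 2: locate the peak and accept it only above the threshold.

    Ties resolve to the earliest index, i.e. the start of a plateau.
    """
    peak = max(range(len(metric)), key=lambda n: metric[n])
    return peak if metric[peak] > threshold else None


# Hypothetical test signal: silence, four repetitions of a 16-sample pattern, silence.
pattern = [1 + 0j, 1j, -1 + 0j, -1j, 1 + 0j, 1 + 0j, -1j, 1j,
           -1 + 0j, 1j, 1 + 0j, -1j, -1 + 0j, -1 + 0j, 1j, 1 + 0j]
signal = [0j] * 20 + pattern * 4 + [0j] * 20
start = find_packet_start(delay_correlate(signal, 16, 16), threshold=0.9)
```

The inner loop over `k` is the computation-intensive part that maps naturally onto vector instructions, while the threshold scan is the control-intensive part suited to the scalar slots.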
Fig. 4.12 Synchronization ASIP
4.3.3.2 Realizing a Power Efficient Implementation
The proposed processor architecture has been implemented in a 90 nm CMOS technology. To this end, two major steps have been carried out. First, the processor was modeled in LISA (Language for Instruction-Set Architecture), capitalizing on the Processor Designer™ tool-suite from CoWare [16], which is a tool-suite for automated embedded processor design. Our motivation for using it is that it enables the generation of software development tools, such as assembler, linker and instruction-set simulator, very early in the design process. Moreover, the tools offer strong support for platform integration (by generating wrappers for SystemC-based virtual platform modeling) and good-quality automated RTL code generation. Then, RTL code is generated and used as a starting point for logic synthesis. Consequently, the processor micro-architecture can be co-optimized with the kernel software. An architecture-algorithm co-design is thus possible, whereby the actual functionality (code) is used to estimate the eventual energy that will be used. The flow is shown in Fig. 4.13. We start with describing the instruction-set architecture in LISA. The resulting model is then iteratively refined into a pipelined micro-architecture representation, from which RTL code can be generated. Software and hardware are developed in parallel. The tools offer profiling functions, enabling fast application- and architecture-directed feedback, based on information about cycle count, IPC, code coverage and resource utilization. As soon as the targets on those high-level figures are met, the exploration is extended to the architecture implementation level. Implementation cost assessment requires knowledge of silicon area, achievable clock rate and power consumption. Therefore, a VHDL description is generated from Processor Designer and synthesized using a logic synthesis tool-chain.
Compilation and optimization of the design are done with the Physical Compiler™ environment from Synopsys. For place & route and clock tree insertion, Cadence SOC Encounter™ was used. Using the described flow, the design was synthesized for a 90 nm general purpose process with a standard cell library for nominal Vt. We target a clock rate of 200 MHz under worst case operating conditions (VDD = 0.9 V, T = 125 °C). Post-layout timing is back-annotated, proving 200 MHz operation. The total processor area is 0.8 mm², including the 3 KB program memory (0.06 mm²) and 4 KB data
Fig. 4.13 Architecture-algorithm co-design to optimize energy efficiency
memory (0.07 mm²). Power estimation based on gate-level activity has been done with Synopsys PrimePower™. Post-layout back-annotation was only performed for the final design, to ensure timing closure. To get faster feedback on hardware cost during micro-architecture and implementation refinements, we use physical pre-layout estimates. To obtain feedback on the power consumption, the automatically generated netlist, after preliminary placement and physical synthesis, is simulated with stimuli from the IEEE 802.11a synchronization kernel. The optimization target is the reduction of the area under the power profile, i.e., the energy. To this end, we try to reduce the switching power by operand isolation. Another option is the reduction of the relatively high cell internal power, which can be tackled with clock gating. With the performed optimizations, the average power in the IEEE 802.11a test bench could be reduced by 64% (see Fig. 4.14). The energy spent for the processing of one input vector of four complex samples could be reduced from 3.97 to 1.3 nJ. First power simulations for the IEEE 802.16e synchronization show very similar results for Pkernel and Pwait. However, for the most demanding mode with 20 MHz input rate, we estimate a duty cycle of 85%. Under this assumption, the energy for the computations on one input vector is 3.17 nJ. While clock gating dramatically reduces the cell internal power, the effect of operand isolation on the switching power is negligible. The latter can be explained by the orthogonal instruction set and the fact that only 18% of the total power is consumed in the datapath (see Fig. 4.15). The biggest power consumers are the flip-flops (48%). It is particularly noted that the pipeline registers (23%) consume almost the same share of the total power as the four general purpose register files (25%). Indeed, the design contains a high number of frequently accessed very wide (128 bit)
Fig. 4.14 Synchronization engine power consumption optimization for 802.11a case (two power profiles, P [mW] versus t [ns], one for the initial and one for the optimized design and test bench, each with tkernel = 70 and titer = 200; levels Pkernel, Paverage and Pwait, split into switching power, internal power and leakage; power values marked on the axes: 29.41, 25.73, 23.46, 18.47, 14.69, 13.18, 7.17 and 0.5 mW; optimized design: Pwait = 1.09 mW)
Fig. 4.15 Power breakdown for the optimized design and test bench (register files 25%; pipeline registers 23%; decode, control, interconnect 23%; datapath 18%; memories 11%)
pipeline registers, to buffer source operands between the decode (DE) and execute (EX) pipeline stages. Moving the register file read into the execute stage could eliminate those registers. However, it would have a dramatic impact on the achievable clock rate. Figure 4.15 gives the power breakdown for the optimized design. In conclusion, the ASIP achieves the required performance and flexibility within very tight energy constraints. It can be calculated that the processor delivers a theoretical maximum performance of 5 GOPS (32 bit equivalent) at a peak power of 25 mW. Energy efficiency is hence 200 MOPS/mW (fully loaded).
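The energy figures quoted in this section can be cross-checked with a simple two-phase model: during each iteration the engine runs the kernel for a time tkernel at power Pkernel and waits for the rest of the iteration at power Pwait. The sketch below assumes that the values 29.41 and 14.69 mW (initial design) and 18.47 and 1.09 mW (optimized design) read from Fig. 4.14 are Pkernel and Pwait respectively; that assignment is an interpretation of the figure, not something the text states.

```python
def iteration_energy_nj(p_kernel_mw, p_wait_mw, t_kernel_ns, t_iter_ns):
    """Energy per iteration: kernel phase plus waiting phase (mW * ns = pJ)."""
    picojoule = p_kernel_mw * t_kernel_ns + p_wait_mw * (t_iter_ns - t_kernel_ns)
    return picojoule / 1000.0


# IEEE 802.11a test bench: tkernel = 70 ns, titer = 200 ns
initial = iteration_energy_nj(29.41, 14.69, 70, 200)     # about 3.97 nJ
optimized = iteration_energy_nj(18.47, 1.09, 70, 200)    # about 1.43 nJ
avg_power_optimized_mw = optimized * 1000.0 / 200        # about 7.17 mW
reduction = 1.0 - optimized / initial                    # about 0.64

# IEEE 802.16e, most demanding mode: 85% duty cycle
wimax = iteration_energy_nj(18.47, 1.09, 0.85 * 200, 200)  # about 3.17 nJ

# Fully loaded efficiency: 5 GOPS at 25 mW, expressed in MOPS/mW
efficiency = 5000.0 / 25.0
```

Under this reading, the model reproduces the 3.97 nJ starting point, the 64% reduction in average power and the 3.17 nJ estimate for IEEE 802.16e. The kernel phase alone (18.47 mW over 70 ns) amounts to about 1.29 nJ, which is close to the 1.3 nJ quoted for the optimized design.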
4.4 Processors for SDR-Baseband: Working Horses in a Race for Speed and Power
4.4.1 The Quest for High Performance and Low Power: Introducing Different Styles
Different processor architecture styles have been proposed for SDR modems. Most of them are designed keeping in mind the specific features of physical (PHY) layer signal processing algorithms in wireless standards. We briefly introduce classes of relevant architectures below. By combining these architecture styles, one can obtain a trade-off between energy efficiency, performance and mapping productivity [24]. In order to achieve the required high performance at reasonable energy budget, architecture parallelism must be increased. The resulting architecture executes multiple operations simultaneously, thereby reducing the energy required to perform a single operation. Nonetheless, parallelism can be provided in several ways:
1. Fine-grained reconfigurable arrays (FGA) [6, 13] and coarse-grained reconfigurable arrays (CGA) [10, 15–17, 21] are mainly motivated by the data flow dominance in baseband algorithms. The main bottleneck of the first class is the
high interconnect cost that hampers scalability in deeper submicron technology (65 nm and below) and yields significant energy overhead. The second class improves on this, proposing fewer but more complex functional units. Importantly, C-based compilers for CGA architectures recently became available [7].
2. Very long instruction word (VLIW) architectures and vector architectures, also referred to as single-instruction multiple-data (SIMD) architectures, exploit the high data-level parallelism of typical algorithms for wireless communications [3, 4]. The first class benefits from well-established compiler technology, giving it an edge in programmer productivity, while the second class improves on energy efficiency. Their combination [5, 20] achieves a trade-off between productivity and energy efficiency and is very well suited for SDR implementations.
IMEC’s ADRES processor template has been used to perform an architecture-algorithm exploration and co-design [23, 24], resulting in a suitable baseband engine for an SDR baseband platform. This case is further explained in the following section.
4.4.2 Tuning an ADRES Processor: A Suitable Case
4.4.2.1 The ADRES Framework
ADRES is an architecture template that combines a Very Long Instruction Word (VLIW) processor with a coarse-grained array [7]. The VLIW executes control-flow code and the array accelerates data-flow loops. This requires an exploration methodology that guarantees efficient instantiations of the template. The ADRES architecture template (Fig. 4.16) consists of a set of basic components, which includes computational, storage and routing resources. The computational resources are Function Units (FUs) capable of executing a set of word-level operations, including custom operations. Register Files (RFs) can be used to store intermediate data. The routing resources include wires, multiplexers and busses. The retargetable C compiler used for the application mapping, called DRESC [8], targets both the VLIW processor and the array. DRESC supports different architectures described in an XML architecture file. Application source code can therefore be compiled directly onto the coarse-grained reconfigurable processor. DRESC exploits loop-level parallelism to achieve high ILP by modulo scheduling, a widely used software pipelining technique [12]. Modulo scheduling executes multiple iterations of a loop in parallel. On the other hand, DLP extraction is not yet automated and needs to be expressed by the programmer through intrinsic C functions. This approach is similar to that of most state-of-the-art processors. The following section presents an energy-performance exploration of a CGA processor for SDR. We stress the importance of trading off different sources of parallelism, such as data and instruction level parallelism, to achieve the required performance at minimum energy cost.
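The effect of modulo scheduling can be shown with a toy model in which a new loop iteration is started every II cycles (the initiation interval) while earlier iterations are still in flight. This sketch is not the DRESC algorithm: it assumes one cycle per stage and ignores the resource and routing constraints that a real modulo scheduler has to satisfy; all names are hypothetical.

```python
def modulo_schedule(num_iterations, num_stages, ii=1):
    """Return {cycle: [(iteration, stage), ...]} for a software-pipelined loop."""
    schedule = {}
    for iteration in range(num_iterations):
        for stage in range(num_stages):
            cycle = iteration * ii + stage    # iteration i starts at cycle i * II
            schedule.setdefault(cycle, []).append((iteration, stage))
    return schedule


def pipelined_cycles(num_iterations, num_stages, ii=1):
    """Total cycles with overlapped iterations."""
    return (num_iterations - 1) * ii + num_stages


def sequential_cycles(num_iterations, num_stages):
    """Total cycles when each iteration waits for the previous one."""
    return num_iterations * num_stages


sched = modulo_schedule(num_iterations=4, num_stages=3, ii=1)
overlapped = pipelined_cycles(4, 3, ii=1)
serial = sequential_cycles(4, 3)
```

In steady state (cycle 2 in this example) three different iterations occupy the three stages at once, which is the parallelism that the array exploits.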
Fig. 4.16 ADRES is a tight coupling of VLIW processor and coarse-grained array, exemplary instance
4.4.2.2 Exploration Towards an Optimized SDR-Baseband Processor Architecture
ADRES is a template with many parameters to instantiate. For the exploration presented here we only vary the size of the array (ILP exploration) and the width of the SIMD datapath (DLP exploration), as these parameters have the biggest impact on the performance and energy consumption of the architecture. Further tuning of the architecture includes, for example, the specification of the interconnect topology. The central kernels from a MIMO-OFDM system for WLAN have been considered as a case for the exploration and the optimization of the processor [24] (Table 4.2). In a CGA, the only way to modify the achievable ILP is by changing the size of the array. The DRESC compiler extracts the ILP from the algorithm and maps it on the architecture using modulo scheduling. Figure 4.17 shows the evolution of the averaged scheduling density (over the considered kernels) while increasing the size of the CGA. One can observe that from the 4 × 4 array onwards, the scheduling
Table 4.2 CGA performance
Benchmark       Time in accelerator mode   IPC
11n 64QAM TX    64%                        10.75
11n 64QAM RX    56%                        9.99
11g 64QAM TX    53%                        11.05
11g 64QAM RX    56%                        10.37
LTE TX          99%                        8.76
Fig. 4.17 ILP exploration (averaged scheduling density, on a scale from 0 to 100, versus CGA sizes 3 × 3, 4 × 4, 6 × 6 and 8 × 8, with one curve each for SIMD4, SIMD2 and SISD)
density starts to decay drastically. This is because by increasing the size, the scheduling problem becomes more complex and the risk of finding a suboptimal schedule increases. Moreover, for a given algorithm, the number of instructions that can be scheduled in parallel is limited. We also observe that for the SISD (Single Instruction, Single Data) case, the decay of the scheduling density is softer than for the other curves. This is because the compiler can extract ILP out of the DLP present in the algorithm. As soon as the DLP is exploited (SIMD2 or SIMD4), the number of independent operations that can be scheduled in parallel is further reduced and hence the scheduling density experiences a more pronounced decay. The DLP extraction fully relies on the programmer, who models the vector operations through intrinsic C functions. We select 16 bits as subword length since this provides enough precision to accommodate the targeted processing. For the DLP exploration, we instantiate three different degrees of data parallelism:
• SISD: Single instruction operates on a single subword. This results in a 16 bit architecture.
• SIMD2: Single instruction operates on two subwords, the real and imaginary part of a complex word. This results in a 32 bit architecture.
• SIMD4: Single instruction operates on four subwords, the real and imaginary parts of two complex words. This results in a 64 bit architecture.
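The three datapath widths listed above can be mimicked in software by packing 16-bit subwords into one integer. The lane ordering (lane 0 in the low bits) and the wrap-around behaviour are assumptions of this sketch; the text does not specify how the ADRES datapath lays out or saturates its subwords.

```python
MASK16 = 0xFFFF


def pack(subwords):
    """Pack signed 16-bit values into one integer, lane 0 in the low bits."""
    word = 0
    for lane, value in enumerate(subwords):
        word |= (value & MASK16) << (16 * lane)
    return word


def unpack(word, lanes):
    """Split a packed word back into signed 16-bit lane values."""
    out = []
    for lane in range(lanes):
        raw = (word >> (16 * lane)) & MASK16
        out.append(raw - 0x10000 if raw & 0x8000 else raw)
    return out


def simd_add(a, b, lanes):
    """Lane-wise 16-bit addition with wrap-around and no carry between lanes."""
    return pack([x + y for x, y in zip(unpack(a, lanes), unpack(b, lanes))])


# SIMD4: two complex samples (re0, im0, re1, im1) in one 64-bit word
a = pack([1, -2, 3, -4])
b = pack([10, 20, -30, 40])
total = unpack(simd_add(a, b, lanes=4), lanes=4)

# SIMD2: one complex sample in a 32-bit word, showing 16-bit wrap-around
wrapped = unpack(simd_add(pack([32767, 0]), pack([1, 0]), lanes=2), lanes=2)
```

A single `simd_add` on a SIMD4 word replaces four scalar additions, which is why fewer independent operations remain for the modulo scheduler once the DLP is exploited.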
Most of the baseband processing operates on complex data (I and Q data). Thus we upgrade the Instruction Set Architecture (ISA) of the SIMD2 and SIMD4 architectures to support complex arithmetic. This simplifies the application dataflow, reducing the required number of instructions and therefore simplifying the scheduling problem. In order to choose the most suitable architecture, a trade-off between performance and cost is needed. In the exploration at hand, the performance is expressed as the absolute time required for the processing of an OFDM symbol. The latter should be, on average, below 4 µs to guarantee real-time demodulation. The cost axis corresponds to the energy consumed in the processing of a single OFDM symbol. In the space defined by these two axes, the points of the considered CGA instances have been plotted (Fig. 4.18). Two representations for every architecture instance are present, one considering the processing of a single OFDM symbol (solid line) and a second one considering the processing of four symbols in parallel (dashed line). At run-time, when the conditions allow long latency, the processor can drastically improve its energy efficiency by moving from the solid working point to the dashed one. We represent with an identical marker the points that belong to the same array size and vary only in the width of the datapath (from left to right: SIMD4, SIMD2 and SISD). The 3 × 3 and 4 × 4 SIMD4 instances are Pareto-optimal points, since they offer different energy-performance trade-offs. The 3 × 3 SIMD4 instance consumes less energy than the 4 × 4 SIMD4 instance, but it does not meet the performance requirements. For the considered case, the 4 × 4 SIMD4 instance is selected as the optimal architecture because it also offers real-time processing when demodulating a single OFDM symbol (hard latency constraints). It can execute the symbol-based baseband processing of a 108 Mbps MIMO OFDM receiver on one ADRES processor, implemented in 90 nm technology.
Fig. 4.18 ADRES architecture exploration plotted in an “energy–absolute time” space (energy per symbol [J], from 0 to 1.E-09, versus symbol processing time [s], from 0 to 8.E-06; series for the 3x3, 4x4, 6x6 and 8x8 arrays, each for 1 symbol and for 4 symbols)
Fig. 4.19 Processor top level architecture (blocks shown: CORE, L1 scratch pad, memory access arbitration, AHB slave interface, config mem, special regs, I$, IMEM interface, debug interface; signals: rst, ext_stall, resume, exception)
The top level of the resulting optimized architecture is depicted in Fig. 4.19. The main CPU in this processor is a three-issue VLIW. The processor has an asynchronous reset, a single external system clock and a half-speed (AMBA) bus clock. Instruction and data flow are separated (Harvard architecture). A direct-mapped instruction cache is implemented for the VLIW CPU with a dedicated 128-bit wide instruction memory interface. The level-1 data memory contains 4 K × 32-bit memory per bank. These banks can also be accessed externally through an AMBA2-compatible slave bus interface (which connects to the memory interface just like an additional load/store unit in the processor). The CGA configuration memories (128 contexts) and special registers are also mapped to the AMBA bus interface via a 32-bit internal bus. This way, the CGA configuration memories and the level-1 scratch-pad data memories can be accessed via DMA transfers.
4.4.2.3 Mission Accomplished: Design Results
The architecture described above is implemented to reach a 400 MHz clock rate in worst-case conditions in 90 nm technology. Hence it delivers up to 16 units × 4-way SIMD × 400 MHz = 25.6 GOPS (16-bit), which is foreseen to be sufficient to implement 2 × 2 20 MHz MIMO-OFDM at 100+ Mbps. To achieve such a clock frequency, a general purpose process option has been selected. Although it is leakier, this process has a better power-delay product than the low power process usually considered for embedded applications. Leakage in operation mode can be tackled with a multi-VT design and, in standby, with a third-party substrate-biased standard cell library and memory macros.
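The peak-throughput figure follows directly from the array dimensions and the clock rate. The short sketch below repeats that arithmetic and, as an additional illustration, converts the 4 µs OFDM symbol budget mentioned earlier into a cycle budget at 400 MHz; the function names are hypothetical.

```python
def peak_gops(units, simd_lanes, clock_mhz):
    """Peak 16-bit operations per second, in GOPS."""
    return units * simd_lanes * clock_mhz / 1000.0


def clock_period_ns(clock_mhz):
    """Clock period that the critical path has to fit into."""
    return 1000.0 / clock_mhz


def cycles_per_symbol(symbol_time_us, clock_mhz):
    """Number of clock cycles available to process one OFDM symbol."""
    return symbol_time_us * clock_mhz


peak = peak_gops(units=16, simd_lanes=4, clock_mhz=400)   # 4 x 4 array, SIMD4
period = clock_period_ns(400)
budget = cycles_per_symbol(4.0, 400)
```

At 400 MHz the processor therefore has 1600 cycles per 4 µs symbol, and each cycle can retire at most 64 subword operations.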
Fig. 4.20 Prototype processor area breakdown (pie chart labels: DMEM 24%, CPU 35%, FU/RF 33%, CMEM 8%, Other; the 41% label matches the accelerator share quoted in the text)
The RTL descriptions of the functional units and multi-ported register files have been written in such a way that automated fine-grained clock gating was enabled during synthesis. 95% of the flip-flops turn out to be clock-gated with the appropriate activation signal. Furthermore, operand isolation was manually implemented in the functional units to avoid bit toggling in unused operators. Finally, scan test support and memory BIST logic were inserted. The resulting netlist was used as input for physical design with Cadence SOC Encounter™. Standard cell placement was then optimized, followed by clock tree synthesis and final place & route. After parasitic extraction from the resulting layout, timing was checked with Synopsys PrimeTime™ and power was estimated with Synopsys PrimePower™. The final layout achieves a timing of <2.5 ns in worst case conditions, which enables the operation of the processor at 400 MHz. The critical path is located in the execution stage of the functional units implementing the pipelined multiplier. The prototype die area reaches 5.79 mm² including level-1 data, instruction and configuration memories. A more detailed area breakdown is given in Fig. 4.20. The accelerator occupies 41% of the area (2.37 mm²); 33% being dedicated to FU logic and register files.
4.4.2.4 Mission Accomplished: Performance Results
The processor was dimensioned to enable the execution of next generation broadband cellular communication and wireless LAN standards. To evaluate its performance and power consumption, a set of benchmarks was selected corresponding to transmitter and receiver baseband processing in IEEE 802.11n and 3GPP-LTE. For each benchmark, Table 4.2 presents the portion of the execution time spent in accelerated mode as well as the Instructions per Cycle (IPC) ratio obtained in that mode.
The average IPC over the different benchmarks is 10.18. This number includes regular and SIMD operations, all counted as one operation. On average about 45% of the operations are SIMD operations. The resulting utilization of the accelerator’s FUs is 10.18/16 = 63.65%. This figure is particularly high when considering that it is obtained with compiled ANSI C-code.
Furthermore, the prototype power consumption was estimated based on gate-level simulation. The netlist resulting from the physical design, back-annotated with capacitance and parasitics information extracted from the final layout, was simulated with Mentor ModelSim™. From such simulation, accurate gate-level activity profiles were generated. Synopsys PrimePower™ was used to evaluate the CPU and accelerator power consumption based on the activity profile and the layout information. Results are summarized in Table 4.3. Peak CPU and CGA powers are given for the typical design corner (V = 1 V, nominal process, T = 25 °C). Leakage is extrapolated to the typical leakage corner (V = 1 V, nominal process, T = 65 °C). The data memory hierarchy power consumption is added in each case. Table 4.4 presents the average estimated power consumption for executing the different benchmarks in real time. Figure 4.21 presents a further breakdown of the accelerator active power. A major fraction of 37% goes to the interconnect sub-system, which includes buffers, multiplexers and pipeline registers between the FUs. The FUs themselves, the configuration memories (CMEM) and the data memory (DMEM) consume 29%, 14% and 11% respectively. The shared and distributed register files consume 7% and 2%.
In conclusion, the exploration of a hybrid CGA-SIMD ADRES processor has led to a suitable architecture and implementation. The prototype design has been shown to enable the real-time execution of several transmitter and receiver benchmarks out of next generation broadband cellular and wireless LAN communication standards with an average power consumption of 250 mW.
Table 4.3 Prototype processor power consumption
                        CPU + DMEM    CGA + DMEM
Active (typical)        75 mW         310 mW
Leakage (typical)       12.5 mW       12.5 mW
Leakage (T = 65 °C)     25 mW         25 mW
Table 4.4 Average power consumption of ADRES processor for different benchmarks
Benchmark (shorthand)   Average power
11n 64QAM TX            250 mW
11n 64QAM RX            232 mW
11g 64QAM TX            225 mW
11g 64QAM RX            232 mW
LTE TX                  333 mW
Fig. 4.21 Power breakdown in CGA mode (interconnection 37%, FU 29%, CMEM 14%, DMEM 11%, shared register file 7%, distributed register file 2%)
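As a quick cross-check of the figures quoted above, the following sketch recomputes the FU utilization and converts the percentage breakdown into absolute power. It assumes, for illustration only, that the breakdown applies to the 310 mW active power listed for CGA + DMEM in Table 4.3; the variable and function names are hypothetical and not taken from the original design.

```python
# Figures quoted in the text; all names are illustrative only.
AVERAGE_IPC = 10.18
NUM_FUS = 16
CGA_ACTIVE_POWER_MW = 310.0  # assumption: the breakdown applies to this value

BREAKDOWN_PERCENT = {
    "interconnect": 37,
    "functional units": 29,
    "configuration memories": 14,
    "data memory": 11,
    "shared register file": 7,
    "distributed register files": 2,
}


def fu_utilization(ipc, num_fus):
    """Fraction of functional units kept busy on average."""
    return ipc / num_fus


def absolute_power_mw(breakdown, total_mw):
    """Convert a percentage breakdown into milliwatts per component."""
    return {name: total_mw * pct / 100.0 for name, pct in breakdown.items()}


if __name__ == "__main__":
    print(f"FU utilization: {fu_utilization(AVERAGE_IPC, NUM_FUS):.1%}")
    for name, mw in absolute_power_mw(BREAKDOWN_PERCENT, CGA_ACTIVE_POWER_MW).items():
        print(f"{name:28s} {mw:6.1f} mW")
```

Under this assumption the interconnect alone would account for a little over 110 mW, which illustrates why it is the first candidate for further optimization.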
4.5 Outer Modem Engine: Going with the Flexibility Stream
4.5.1 Problems with Dedicated Solutions Arising
As they deliver decoding performance very close to the Shannon limit, advanced forward error correction (FEC) schemes such as turbo codes and LDPC codes are getting more and more popular both in mobile communication (3GPP UMTS/HSxPA/LTE) and connectivity (IEEE 802.11n, IEEE 802.16[e]) systems. However, because of their high complexity, their implementations take a significant share of the area and power budgets dedicated to baseband processing. Therefore, so far, advanced FEC decoders have been exclusively implemented as dedicated hardware blocks, focusing on minimum power consumption and area while sacrificing flexibility. Recently, the flexibility requirements for FEC have increased in wireless standards (see Fig. 4.22). In combination with the growing demand for multi-mode, multi-standard chipsets for mobile communication, the dedicated block approach would lead to a multiplication of specific accelerators, which is very expensive in silicon area. As an example, a mobile baseband platform that would support UMTS/HSxPA/LTE next to advanced WiFi (IEEE 802.11n) and (Mobile) WiMAX (IEEE 802.16[e]) with their advanced FEC options would require up to four advanced FEC accelerators: one turbo decoder for 3GPP data bearers, another turbo decoder for WiMAX, an LDPC decoder for WiFi and another LDPC decoder for WiMAX. The holy grail would be a unified accelerator architecture that supports all these standards, sharing a maximum of their memory and datapath while still allowing multiple concurrent decoding threads.
Fig. 4.22 Increasing flexibility requirements on FEC in wireless standards (baseband functions FE steering, signal detection, synchronization, modulation/demodulation (inner modem) and forward error correction (outer modem) positioned by need in flexibility versus need in energy efficiency; FEC options listed for IEEE 802.11n, IEEE 802.16e, LTE and future standards include K=7 convolutional, K=7 convolutional + RS, convolutional turbo, block turbo, layered LDPC and non-binary CTC/LDPC)
4.5.2 Flexible Solutions in Sight
Given the increased flexibility requirements (arising from functional needs as well as costs), a flexible FEC engine has become a desirable core. FEC decoding typically involves a computational load an order of magnitude higher than the inner modem processing, and faces specific bottlenecks in memory accesses. Consequently, the processors optimized for inner modem functionality (as presented in Section 4.4) are not fit for this job. Instead, an application-specific instruction set processor (ASIP) implementation approach can guarantee maximum flexibility at minimum area, with limited power consumption overhead. Moreover, this overhead is easily compensated by the leakage reduction that results from the area reduction. An exemplary solution [29] targets an application-specific instruction set programmable architecture addressing in a unified way the emerging turbo and LDPC coding requirements of 3GPP-LTE, IEEE 802.11n, IEEE 802.16(e) and DVB-S2/T2. This solution is further explained below, focusing on the architecture and implementation results. For an introduction to the turbo and LDPC coding algorithms considered in this implementation, we refer to [29].
4.5.2.1 Concept
The proposed unified ASIP architecture combines a wide SIMD datapath (implementing dedicated instructions to support both LDPC and turbo decoding) with a distributed memory architecture. The background memory is distributed over as many single-ported banks as there are SIMD slots. These are connected to the foreground memory with a bi-directional shuffler. The key idea is to hide the scrambling needed by the LDPC decoding and the interleaving required by the turbo decoding using reconfigurable address generation units (rAGUs). The decoding algorithms are implemented as parallel log-MAP or LDPC check-node processing. The corresponding programs, written in assembly, assume a virtual address space where collisions never happen. This virtual address space is mapped to physical background memory addresses and read-in/write-out crossbar control using the rAGUs. The rAGUs are configured with look-up tables (LUTs).
4.5.2.2 Decoder Architecture
The high degree of parallelism possible in the turbo and LDPC decoding algorithms paves the way for a wide SIMD architecture. The decoder architecture, consisting of K processors each having an N-slot SIMD datapath, is depicted in Fig. 4.23. Besides the SIMD pipeline, the processor instances also contain a control unit with allocated program memory and register file, a vector register file and a vector cache. The SIMD processing uses virtual linear addressing, made possible through the shuffler and rotation engine blocks, which can perform permutations on the data stored within the background memory banks. Selection of the words within the memory banks and shuffle control are handled by the rAGU. For LDPC modes requiring a high degree of parallelism (as in DVB), a rotation engine connecting all processor cores is present; it allows a cyclic rotation of data from all background memory banks towards the aligned scratchpads. In all other cases the shuffler is used to permute the data within each N-slot processor.
Fig. 4.23 Decoder architecture of a K-processor N-slot SIMD decoding engine (each of the K processors comprises two AGUs, an M×N banked background memory, a shuffler, a foreground memory, a control unit (CU) with program memory (PMEM), an N-way SIMD pipeline and a LIFO; a top-level interconnect links the processors to the control interface and the input/output interface)
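The addressing idea described above can be sketched in a few lines. The following is a minimal behavioral model, not the design of [29]: it assumes a hypothetical LUT entry that holds one (bank, row) pair per SIMD slot, and it uses a cyclic rotation (as needed for LDPC) as the example permutation. All class and function names are invented for illustration.

```python
from typing import List, Tuple

# Hypothetical LUT entry: one (bank, row) pair per SIMD slot.
LutEntry = List[Tuple[int, int]]


class BankedMemory:
    """Background memory modeled as n_banks single-ported banks."""

    def __init__(self, n_banks: int, depth: int) -> None:
        self.banks = [[0] * depth for _ in range(n_banks)]


def rotation_lut(n_slots: int, row: int, shift: int) -> LutEntry:
    """LUT entry for a cyclic rotation: slot j reads bank (j + shift) mod N."""
    return [((slot + shift) % n_slots, row) for slot in range(n_slots)]


def read_vector(mem: BankedMemory, entry: LutEntry) -> List[int]:
    """Fetch one word per slot; every bank may be accessed at most once."""
    banks = [bank for bank, _ in entry]
    if len(set(banks)) != len(banks):
        raise ValueError("bank collision: a single-ported bank serves one access per cycle")
    return [mem.banks[bank][row] for bank, row in entry]


if __name__ == "__main__":
    mem = BankedMemory(n_banks=4, depth=2)
    for bank in range(4):
        for row in range(2):
            mem.banks[bank][row] = row * 4 + bank  # word index stored as data
    print(read_vector(mem, rotation_lut(4, row=1, shift=1)))
```

The decoding program only asks for "vector 1, rotated by 1"; the bank and row selection stays hidden in the table, which mirrors the collision-free virtual address space assumed by the assembly programs.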
4.5.2.3 Realizations: Performance and Design Cost
From the decoder architecture detailed in Section 4.5.2.2, specific instances can be derived and realized. In the following, the trade-off between throughput and area is examined for a varying number of slots N and number of processors K. In order to optimally support the parallelism present within the DVB LDPC mode, a four-processor instance with 96-slot SIMD is required. For all other decoding modes, where the required parallelism is smaller than or equal to the number of SIMD slots N, the processors could be used to handle independent decoding processes. Due to the SIMD implementation, the decoding loops need to be identical, which makes mapping simultaneous LDPC and turbo decoding on a single processor instance impossible.
As a case study, the clock frequency is set to 333 MHz and the implementation is derived for a 45 nm CMOS low-K, standard-Vt technology. A behavioral SystemC model mimicking the described decoder engine is used as the basis for verifying the algorithmic decoding process as well as for determining the throughput. Figures 4.24 and 4.25 show the throughput evolution for LDPC and turbo decoding respectively, for different numbers of slots and processors. The number of iterations is chosen equal to 25 for LDPC and 6 for turbo decoding, representing an average operation mode of the error decoding schemes. We can observe that the maximum throughput scales linearly with increasing number of processors and SIMD slots.
Fig. 4.24 LDPC throughput vs. number of slots and processors (throughput in Mbps vs. #slots N = 24, 48, 72, 96, for #processors K = 1 to 4)
Fig. 4.25 Convolutional Turbo Code (CTC) throughput vs. number of slots and number of processors (throughput in Mbps vs. #slots N = 24, 48, 72, 96, for #processors K = 1 to 4)
A FEC processor is a heavily memory-dominated design. As explained in Section 4.5.2.2, there are seven types of memory blocks: the input and output FIFOs, the background memory, the LIFO, the register file, the rAGU LUT and the program memory. The sizes of these memories depend on the exact instance of the FEC processor chosen. The different memories of the design were generated using TSMC's memory compiler ver. 2007.11 for the 45 nm CMOS low-K, standard-Vt technology. The dominant logic part (crossbar) was synthesized with a target clock frequency of 333 MHz in the worst-case corner. Figure 4.26 shows the area evolution for different numbers of processors and SIMD slots. Reducing the number of slots of the processor reduces the width of the rAGU LUT, background memory and LIFO memories. As the memories are rather large, changing the width has an almost linear effect on their area. Therefore, increasing the number of slots has a linear impact on the area requirement. Since increasing the number of processors implies just an extra instance of the processor, we also see a linear trend when we increase the number of processors.
Fig. 4.26 Area vs. number of slots and processors (area in mm² vs. #slots N = 24, 48, 72, 96, for #processors K = 1 to 4)
According to first estimates, the resulting engine also achieves favorable energy consumption [29]. At the time of writing of this book, a prototype chip design is in progress.
4.6 Conclusions
4.6.1 SDR Baseband Platforms: Going Mobile Today
The motives to make baseband platforms flexible are becoming more distinct every day. Fortunately, technological solutions are becoming available to realize SDR baseband platforms, even suitable to make these platforms go mobile. In this chapter, a 'divide and conquer' approach was advocated and presented, resulting in a heterogeneous MPSoC solution combining flexibility with low power. For the different cores on the platform, the nature and the complexity of the required processing are quite different. By pursuing algorithm-architecture co-design, optimized engines for DFE, inner modem, and FEC are achieved. A full prototype design was realized, proving the validity of the concepts.
For the DFE for burst-based communication, signal detection functions have a high duty cycle and hence need an ultra low-power implementation. In addition, programmability must be preserved to support the implementation of multiple modes. A Digital Front-End (DFE) featuring an ultra low-power pre-synchronization ASIP was presented in Section 4.3. It was designed targeting broadband burst-based wireless standards, and achieves power in idle time and for synchronization in the same order of magnitude as dedicated solutions.
For the inner modem functionality, we presented the design of a hybrid CGA-SIMD processor dedicated to software-defined radio baseband processing in Section 4.4. The accelerator is shown to enable the real-time execution of several transmitter and receiver benchmarks from next generation broadband cellular and wireless LAN communication standards with an average power consumption of 250 mW.
For the FEC, a flexible engine is proposed in Section 4.5. The scalable architecture covers advanced turbo and LDPC codes, and its area scales linearly with the number of SIMD slots and processors.
4.6.2 The Future: Next Generations Desired
The race for more speed and capacity in wireless communication systems continues. Higher speeds, up to Gbit/s, will need to be accommodated even in mobile devices. Moreover, in order to cope with the aggravating spectrum scarcity, cognitive behavior will need to be enabled (see Chapter 7).
For SDR baseband platforms, evolutions in wireless applications will require a next generation, where specifically the following features should be added and/or upgraded:
• The evolution of the Digital Front-End (DFE) tile from a sample "relay" with limited programmability/intelligence for valid data detection toward a full spectrum sensing engine, enabling cognitive radio.
• The interconnect on the platform will need to provide a higher bandwidth and flexibility. A Network on Chip (NoC) seems the most appropriate solution for the foreseeable future.
• Multiple asynchronous parallel streams will need to be accommodated on the platform.
• Much higher computing capacity than state-of-the-art number crunching fabrics will be needed (at least one order of magnitude). Next to evolving to higher clock frequencies, more processors and bigger processors on the platform will need to be provided. An important challenge will be in mapping the software on multi-core platforms (see Chapter 5, where we introduce parallel processing for multi-processor and multi-threading solutions).
• The core-level energy efficiency will need to be improved drastically (a factor of 5 or more), in order to keep on offering 'more for less'. Technology scaling alone will definitely not be sufficient to achieve the required enhancement.
Even if the wish list for upgrades for SDR baseband platforms brings important technological challenges, none of them seems impossible today. The combined creative inspiration on different levels (technology, design, architecture, and system) opens the opportunity to realize an even more attractive generation of Green SDR baseband platforms in and for the future!
References
1. L. Van der Perre, B. Bougard, A. Bourdoux, H. Cappelle, V. Derudder, F. Horlin, M. Glassee and P. Vanbekbergen, Broadband WLANs: Setting the Limits for SDR Platforms, WWRF 15, WG5, Paris, France, Dec. 2005.
2. J. Glossner et al., A Software Defined Communications Baseband Design, IEEE Communications Magazine, Vol. 41, No. 1, pp. 120–128, Jan. 2004.
3. C. Van Berkel et al., Vector Processing as an Enabler for Software Defined Radio in Handsets from 3G+ WLAN Onwards, Proceedings of the Software Defined Radio Technical Conference, Vol. B, pp. 125–130, Nov. 2004.
4. Y. Lin et al., SODA: A Low-Power Architecture for Software Radio, Proceedings of the 33rd International Symposium on Computer Architecture (ISCA'06), Boston, MA, pp. 89–100, 2006.
5. J. Glossner et al., The Sandbridge Sandblaster Communication Processor, Proceedings of the 3rd Workshop on Application Specific Processors, Stockholm, Sweden, pp. 53–58, Sept. 2004.
6. R. Baine et al., Software Defined Baseband Processing for 3G Basestation, Proceedings of the 4th International Conference on 3G Mobile Communication Technologies, London, UK, June 2003.
7. B. Mei, S. Vernalde, D. Verkest, H. De Man and R. Lauwereins, ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse Grained Reconfigurable Matrix, Proceedings of FPL, Lisbon, Portugal, 2003.
8. B. Mei, S. Vernalde, D. Verkest, H. De Man and R. Lauwereins, DRESC: A Retargetable Compiler for Coarse-Grained Reconfigurable Architectures, Proceedings of Field Programmable Technology, Hong Kong, China, pp. 166–174, 2002.
9. A. Parssinen, Keynote Talk: System Design for Multi-standard Radios, Proceedings of ISSCC, San Francisco, CA, Feb. 2006.
10. A. Lambrechts, P. Raghavan, M. Jayapala, F. Catthoor and D. Verkest, Energy-Aware Interconnect Exploration of Coarse-Grained Reconfigurable Processors, Proceedings of Workshop on Application Specific Processors, NY, USA, Sept. 2005.
11. P. Raghavan, A. Lambrechts, M. Jayapala, F. Catthoor and D. Verkest, Empirical Power Model for Register Files, Proceedings of Workshop on Media and Streaming Processors, Barcelona, Spain, Nov. 2005.
12. R.B. Rau, Iterative Modulo Scheduling, HP Lab, Technical Report: HPL-94-115, 1995.
13. A. Lodi et al., XiSystem: A XiRISC SoC with Reconfigurable IO Module, IEEE Journal of Solid-State Circuits, Vol. 41, No. 1, pp. 85–96, Jan. 2006.
14. I. Chen et al., Overview of Intel's Reconfigurable Communication Architecture, Proceedings of the 3rd Workshop on Application Specific Processors, pp. 95–102, Stockholm, Sweden, Sept. 2004.
15. http://www.qstech.com.
16. www.coware.com.
17. N. Bagherzadeh et al., MorphoSys: A Parallel Reconfigurable System, Proceedings of EuroPar 99, France, Sept. 1999.
18. http://www.arm.com.
19. F.K. Jondral, Software-Defined Radio – Basics and Evolution to Cognitive Radio, EURASIP Journal on Wireless Communications and Networking, Vol. 5, No. 3, pp. 275–283, 2005.
20. M. Jayapala et al., Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors, to appear in IEEE Transactions on Computers.
21. D. Novo et al., Mapping a Multiple Antenna SDM-OFDM Receiver on the ADRES Coarse-Grained Reconfigurable Processor, Proceedings of the IEEE Workshop on Signal Processing Systems, Athens, Nov. 2005.
22. B. Bougard et al., A Scalable Programmable Baseband Platform for Energy-Efficient Reactive Software-Defined Radio, Proceedings of Crowncom, Mykonos, Greece, 2006.
23. T. Schuster, D. Novo, B. Bougard, V. Derudder, A. Hoffmann and L. Van der Perre, Subword-Parallel VLIW Architecture Exploration for Multimode Software Defined Radio, IEEE Workshop on Signal Processing Systems, Banff, CA, Oct. 2006.
24. D. Novo, B. Bougard, P. Raghavan, T. Schuster, H.-S. Kim, H. Yang and L. Van der Perre, Energy-Performance Exploration of a CGA-Based SDR Processor, SDR Forum, Orlando, FL, 2006.
25. M. Dillinger, K. Madani and N. Alonistioti, Software Defined Radio: Architectures, Systems and Functions, Wiley, Chichester, 2003.
26. G. Desoli and E. Filippi, An Outlook on the Evolution of Mobile Terminals, CAS Magazine, second quarter 2006.
27. L. Van der Perre et al., Architectures and Circuits for Software Defined Radios: Scaling and Scalability for Low Cost and Low Energy, ISSCC 2007, San Francisco, CA, Feb. 2007.
28. B. Bougard et al., A Coarse-Grained Array Accelerator for Software Defined Radio Baseband Processing, IEEE Micro, Vol. 28, No. 4, pp. 41–50, July/August 2008.
29. F. Naessens et al., A Unified Instruction Set Programmable Architecture for Multi-standards Advanced Forward Error Correction, SIPS 2008, Washington, DC, Oct. 2008.
30. B. Bougard et al., A Low Power Signal Detection and Pre-synchronization Engine for Energy-Aware Software Defined Radio, SDR Forum 2006, Orlando, FL, Nov. 2006.
31. T. Schuster et al., Design of a Low Power Pre-synchronization ASIP for Multimode SDR Terminals, SAMOS VI, Samos, Greece, July 2006.
32. Microchip PIC16F84A datasheet, http://www.microchip.com.
33. Paul Simon, Proof, Rhythm of the Saints, 1990.
Chapter 5
Software: Fuel for Green Radios The Blessing and the Curse
5.1 The Blessing and the Curse
Hardware (design) reuse has been advocated as one of the main advantages of going for SDRs. Indeed, this evolution not only creates flexibility, but also enables major cost savings. The possibility to implement, update, and upgrade communication modes in software is a clear blessing. However, it is at the same time a curse, as designing the software is a complex and critical challenge. 'Efficiency' is the key target here! Indeed, the SW efficiency largely impacts the energy SDRs will drain from the wireless terminal's battery [1]. Moreover, SW design efficiency has been shown to become more and more the bottleneck in the total design effort and time for flexible radios [13]. Therefore, a systematic design flow is proposed as the strategic plan. The results of the implementation on wireless access schemes are presented. In the future, new challenges in SW design for SDRs will pop up, yet opportunities to improve SW and design efficiency also come into sight.
This chapter is further organized as follows. In Section 5.2, a representative SW build-up is proposed, and an approach to ensure both design efficiency and consistency is presented. The platform level and baseband SW flows and results are described in Sections 5.3 and 5.4 respectively. For MAC layer SW design on SDR platforms, a framework for development and validation is proposed in Section 5.5. For each of the three classes (levels) of SW described in these sections, first a 'strategic plan' is introduced, presenting a method (flow/framework). Subsequently, the results of applying the proposed approach are given, under the title 'meeting the design goals'. For the latter, the IMEC platforms and designs (HW and SW) are used as case studies. Future challenges and opportunities in SW design for next generation wireless standards on next generation SDR platforms are introduced in Section 5.6. Finally, in Section 5.7, the main conclusions of this chapter are summarized.
L. Van der Perre et al., Green Software Defined Radios, Series on Integrated Circuits and Systems, © Springer Science+Business Media B.V. 2009
5.2 Structured SW Design: Going for Network and Platform Compatibility
Due to the evolution from hardware to software processing and control, the software design of radio systems moves to a new level of complexity. On the one hand, the hardware radio has been replaced by several general purpose or application specific processors running optimized kernels that each fulfill one task in the data processing chain and can be changed at runtime to support different wireless standards. On the other hand, the system software that used to delegate the data between the host system and the hardware radio, adding Medium Access Control (MAC), has moved to a lower level. Figure 5.1 illustrates the different components and the build-up of the SW in an SDR system.
Fig. 5.1 SDR SW components and build-up (layers from top to bottom: SW on host processor with other protocol functions and management; embedded SW on the SDR platform with platform control and time-critical MAC, the Hardware Abstraction Layer, and platform- and processor-dependent SW ('operating environment', 'firmware'); SDR hardware with processors 1 to n and platform components; parts of the embedded SW can include common SW components and are marked as eventually downloadable)
For the overall SW design for SDRs, we identify two important targets:
1. Compatibility with and efficient operation on the hardware is of crucial importance. Indeed, the SW should eventually end up on the platform, and reliability (functional correctness) and performance (e.g. meeting real-time and latency requirements) targets should both be achieved.
2. Compatibility with and efficient operation in a (standardized) protocol needs to be ensured. To this end, proper behavior of terminals, and their interaction, on the (shared) ether should be guaranteed.
While the first target asks for a platform-centric SW development methodology, the second objective pleads for a network-centric approach. Figure 5.2 presents an integrated structure, enabling platform-compatible and network-compatible SW at the same time. Thanks to the definition and development of appropriate Hardware Abstraction Layers (HALs) and programming interfaces, the SW can be simulated in different environments, going for both speed and correctness. Both design efficiency and consistency are thus realized.
Fig. 5.2 System software layers and their relation with the different simulation levels (system software, consisting of time-critical MAC software and PHY system software, is simulation-platform-independent code; below it, the HAL is simulation-platform-dependent code, implemented either on the simulation framework for system/network-centric simulation, or on the BEAGLE BEAR programming layer (BEAGLE) for platform/component-centric simulation on TLM and VHDL models, where test code and a basic MAC are also used)
In the coming sections, the platform-centric SW development environment, flow, and results are presented first. Then, the network-centric SW development and validation framework is explained. As a case study, we consider IMEC's SDR-baseband platform [2], which was presented in Chapter 4 and, for reference, is schematically shown in Fig. 5.3.
Fig. 5.3 Scheme of considered SDR-baseband platform case
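The role of the HAL in this structure can be illustrated with a small sketch: the system software is written once against an abstract interface, and the implementation behind it is swapped depending on the simulation level. The class and function names below are hypothetical and only stand in for the real HAL API, which is not reproduced in this chapter.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class DmaHal(ABC):
    """Hypothetical HAL interface as seen by the system software."""

    @abstractmethod
    def start_transfer(self, num_bytes: int) -> int:
        """Start a transfer and return the number of bytes accepted."""


class SimulatorDmaHal(DmaHal):
    """HAL backed by a fast simulation framework (network-centric view)."""

    def __init__(self) -> None:
        self.trace: List[Tuple[str, int]] = []

    def start_transfer(self, num_bytes: int) -> int:
        self.trace.append(("simulated", num_bytes))
        return num_bytes


class PlatformDmaHal(DmaHal):
    """Stand-in for a HAL built on the platform programming layer.

    A real implementation would drive the TLM/VHDL model or the hardware;
    here it only records the request.
    """

    def __init__(self) -> None:
        self.trace: List[Tuple[str, int]] = []

    def start_transfer(self, num_bytes: int) -> int:
        self.trace.append(("platform", num_bytes))
        return num_bytes


def send_acknowledge(hal: DmaHal, frame: bytes) -> int:
    """System software: identical code, whichever HAL is plugged in."""
    return hal.start_transfer(len(frame))


if __name__ == "__main__":
    for hal in (SimulatorDmaHal(), PlatformDmaHal()):
        print(type(hal).__name__, send_acknowledge(hal, b"ACK"))
```

Because `send_acknowledge` never sees which implementation it talks to, the same MAC and PHY system code can be exercised quickly in a network-level simulation and accurately on the platform models.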
5.3 Platform-Level SW: The Control Room for the SDR
5.3.1 The Strategic Plan: Design Flow
Designing an SDR MPSoC under stringent time-to-market (cost), energy and real-time processing constraints requires advanced Electronic System-Level (ESL) design methodologies. ESL is a design approach that raises the level of abstraction from RTL to system level. The aim is to introduce HW/SW verification and architectural exploration steps early in the design flow. An exemplary HW/SW co-design flow for SDR baseband platform design, which has proven successful in the case study performed at IMEC, is shown below (Fig. 5.4). We built a virtual platform based on the CoWare ConvergenSC toolset [9], further referred to as CSC. CSC essentially enables quick generation of a SystemC-based platform simulator with components at different levels of abstraction.
Fig. 5.4 HW/SW co-design for SDR baseband platform (flow elements: functional MATLAB, requirements, optimized MATLAB, IP, platform definition and platform info, ANSI-C DSP model, TLM model, C code on TLM, ANSI-C optimized kernels, RTL model, chip)
5.3.2 Meeting the Design Goals: Latency Requirements
The HW-SW co-design environment can be used to develop platform-level SW to meet the actual performance requirements. Importantly, latency specifications of wireless standards are stringent, and hard to meet on MPSoC platforms. The HW-SW co-design approach allows designing and testing the SDR solutions for meeting the specification structurally, so that performance can be met without including unnecessary margins and over-design. For the case of a WLAN system, the IEEE 802.11a standard prescribes that an acknowledgment needs to be sent within a latency budget of 16 μs after reception of a burst. After optimizing the SW to this end, the test results shown in Fig. 5.5 indeed confirm that this requirement is met.
Fig. 5.5 Latency requirement validated on VHDL (a) for packet processing SW implementation (b) (panel a: air interface with Rx burst and acknowledge, annotated with 13.9 μs; panel b: time line of DFE, DMA, BB and OMOD activity for preamble, signal and data, with markers at 0 μs, 16 μs and 24 μs)
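The latency check itself is simple arithmetic. The sketch below assumes that the 13.9 μs annotation in Fig. 5.5 is the measured turnaround between the end of the received burst and the start of the acknowledgment; that reading of the figure, like the function names, is an assumption made for illustration.

```python
ACK_BUDGET_US = 16.0     # latency budget quoted in the text for IEEE 802.11a
MEASURED_ACK_US = 13.9   # assumption: value annotated in Fig. 5.5


def latency_margin_us(measured_us: float, budget_us: float) -> float:
    """Remaining slack; a negative value means the budget is violated."""
    return budget_us - measured_us


def meets_budget(measured_us: float, budget_us: float) -> bool:
    """True when the measured latency fits within the budget."""
    return measured_us <= budget_us


if __name__ == "__main__":
    margin = latency_margin_us(MEASURED_ACK_US, ACK_BUDGET_US)
    print(f"budget met: {meets_budget(MEASURED_ACK_US, ACK_BUDGET_US)}, "
          f"margin: {margin:.1f} us")
```

Under that assumption the remaining slack is about 2 μs, i.e. the requirement is met with little over-design.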
5.4 Baseband Processor SW: The Working Horse for the SDR
5.4.1 The Strategic Plan: Design Flow
For the baseband processor SW, a short deployment time combined with excellent performance/power is achieved thanks to the innovative C compiler that comes with the ADRES processor [8]. A systematic, partly automated Matlab-to-C design flow was set up. Starting from the Matlab code of a wireless application (e.g. DVB-H receiver [4], 3GPP LTE receiver, WiMAX receiver, etc.), several steps of Software Defined Radio (SDR) baseband design are followed to obtain C code that can run on the ADRES architecture. Two main phases can be identified in this design flow:
1. From functionally correct Matlab to quantized 'Agility C' code
2. From Agility C code to wireless application package
Agility is a commercial tool [6] used to help quantize the Matlab code and produce a first version of the C code.
Fig. 5.6 From functional MATLAB to Agility C (flow: functionally correct MATLAB code → isolation of MATLAB code to be mapped → optimization of isolated MATLAB code → quantization of isolated MATLAB code → conversion to C using Agility tools → Agility C code)
The starting point of the first phase is Matlab code written for a specific wireless application that runs correctly in Matlab. The optimization and transformation to Agility C code consists of four steps, as shown in Fig. 5.6. A crucial step is the optimization of the Matlab code. This includes operations such as reducing the latency of the processing (e.g. by pipelining), reducing the computational load (by reducing the number of operations like additions, multiplications, etc.), and reducing the memory requirements. Experience has shown that applying these optimizations at the Matlab level in an early design phase greatly improves the efficiency of the SW as well as the design time. Once the code has been optimized, it needs to be quantized. Before the quantization process itself, a (Matlab) test bench (TB) needs to be created and the Matlab code has to be transformed taking the elementary ADRES operations into account.
The C code output by Agility for the target wireless application is far from optimal for mapping on the ADRES architecture. It needs to be optimized before the wireless application package, featuring low power consumption, is finally obtained. This optimization is a manual process (i.e. not automatically carried out by a tool) consisting of the three steps shown in Fig. 5.7. The identification of kernels to be optimized is based on analyzing and profiling the full code. Building up a library of kernels speeds up the design process for access schemes considered in the future.
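To make the quantization step more concrete, the sketch below shows a generic signed fixed-point quantizer with saturation. It does not reproduce the Agility tool or the ADRES data types; the 16-bit word length, the 12 fractional bits and the function name are assumptions chosen for illustration.

```python
def quantize(x: float, word_bits: int = 16, frac_bits: int = 12) -> float:
    """Quantize x to signed fixed point and return the represented value.

    word_bits and frac_bits are illustrative assumptions, not ADRES settings.
    Rounding is to nearest (ties to even, as done by Python's round);
    out-of-range values saturate instead of wrapping around.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    code = max(lowest, min(highest, round(x * scale)))
    return code / scale


if __name__ == "__main__":
    for value in (0.5, 0.1, 100.0, -100.0):
        q = quantize(value)
        print(f"{value:8.3f} -> {q:.6f} (error {q - value:+.6f})")
```

Running floating point and quantized versions of the same test bench side by side, as the flow prescribes, exposes where rounding error or saturation starts to affect the receiver performance.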
Fig. 5.7 From Agility C to wireless application package (flow: Agility C code → design of kernels → integration of kernels into wireless library → design of wireless application package)
5.4.2 Meeting the Design Goals: Real-Time Requirements
The baseband processor mapping flow, as introduced in Section 5.4.1, was applied successfully to implement several wireless standards in real time. The results achieved on a 'deframing' function illustrate the huge cycle and instruction count (and thus power) savings that are achieved by the systematic optimization of the SW (Fig. 5.8). The applied transformations included functional simplification (removing conditional statements and the associated array), loop flattening, loop merging, indexing optimization, data access and pointer optimization, and code rearranging. The original design (code) featured the following characteristics:
• Optimized Matlab function: three lines of code
• Reference implementation in C: eight lines of code
• Total number of cycles when mapped on ADRES: 72,068
• Total number of instructions when mapped on ADRES: 345,543
Finally, the following results were achieved:
• Total number of cycles: 5,015 (≈14 times improvement)
• Total number of instructions: 33,387 (≈10 times improvement)
• 56 KB smaller memory footprint
The fitness of the IMEC SDR solutions was thus shown for three different wireless standards: Wireless LAN (802.11a, g, and n), WiMAX (802.16e), and 3GPP-LTE transmission. This involved mapping the inner modem functionality on the optimized ADRES core. Thanks to the dedicated instructions, including efficient SIMD operations, and thorough optimizations, even the 20 MHz WLAN MIMO reception has been fitted on one core, running at 400 MHz, with an average power consumption of 250 mW. For transmitting and single antenna modes, the core typically can run at less than half of the speed, thus enabling significant power savings.
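The improvement factors quoted for the deframing function can be reproduced directly from the cycle and instruction counts. The sketch below does only that arithmetic; the dictionary keys and the derived instructions-per-cycle figure are additions for illustration and do not appear in the original results.

```python
# Counts quoted in the text for the deframing function mapped on ADRES.
BEFORE = {"cycles": 72_068, "instructions": 345_543}
AFTER = {"cycles": 5_015, "instructions": 33_387}


def improvement(before: int, after: int) -> float:
    """Reduction factor: how many times smaller the optimized count is."""
    return before / after


def instructions_per_cycle(counts: dict) -> float:
    """Average number of instructions issued per cycle (derived figure)."""
    return counts["instructions"] / counts["cycles"]


if __name__ == "__main__":
    for key in ("cycles", "instructions"):
        print(f"{key:12s} improved {improvement(BEFORE[key], AFTER[key]):.1f}x")
    print(f"IPC before: {instructions_per_cycle(BEFORE):.2f}, "
          f"after: {instructions_per_cycle(AFTER):.2f}")
```

The cycle count shrinks faster than the instruction count, which suggests that the optimized code also keeps more functional units busy per cycle.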
Fig. 5.8 Savings achieved in (a) cycle and (b) instruction counts by subsequent SW optimization stages (x-axis: optimization stage 1 to 10; y-axes: cycle count, 0 to 80,000, and instruction count, 0 to 400,000)
Importantly, the performance degradation due to the quantization and mapping on the platform is negligible. This is illustrated in Fig. 5.9, which shows that the quantization of the code brings some performance degradation in the higher Signal-to-Noise Ratio (SNR) region. However, the mapping on ADRES itself does not bring any implementation loss.
Fig. 5.9 Bit error rate (BER) performance for floating point and quantized Matlab code, and C code mapped on baseband (ADRES) engine (curves: Matlab floating point, Matlab quantized code, C code mapped on ADRES)
5.5 System Level SW: Providing SDR Terminals with Social Skills

5.5.1 The Strategic Plan: A Simulation Framework for Network Centric SW Development and Validation

Simulating the platform on TLM and VHDL models is very effective for checking the platform in an instruction-accurate or even gate-level-accurate way. All data processing algorithms can be verified down to bit level. However, there are several drawbacks when moving to the design and simulation of system software. First, the simulation focuses on a single SDR terminal, so there is no means to simulate interaction between several network nodes. Second, the simulation speed is far too slow to simulate MAC and PHY system software. This mainly comes from the fact that the data processing algorithms are completely simulated, i.e. data is really modulated, demodulated, interleaved, etc. From a platform simulation point of view this is needed; for simulating the system software, however, it is an unnecessary overhead.

These two drawbacks led to the development of a new platform simulator. The level of abstraction for this simulator is illustrated on the left side of Fig. 5.2. The idea is to mimic the behavior of the HAL API towards the system software,
meaning that from the system software point of view, the HAL should behave the same as it does when simulating the system software on the TLM or VHDL model. The simulator is network transparent. It runs on a host as a service and accepts socket-based connections. Several simulated systems can then connect to this simulator (referred to as the simulation server) and use the three services it provides: a time model service, an ether bandwidth service and a logging service (see Fig. 5.10).

Fig. 5.10 Simulating several terminals using calls to the system simulation server's API

1. Time service: The first service is the time model that the simulation server provides. The server offers a uniform way to register events that take up a specific amount of time, called the time model (Fig. 5.10). When implementing the HAL for the DMA controller, for example, it is known that a transfer of X bytes will take an amount of time ‘DT’. The implementation of the “Dma start()” function, part of the DMA HAL API mentioned before, therefore registers with the simulation server an event that takes DT. When registering this event, the model also provides a callback function. Once DT has elapsed, the simulation server calls this callback and the DMA model signals to the higher software layer (the PHY) that the transfer is done. This way the DMA HAL API shows the same behavior to the higher software as on the real platform, i.e. a DMA transfer takes the same amount of time. For all components (BB, DFE, OMD, etc.) the corresponding HAL implementation makes use of this event registering to model their behavior towards the system software layers. The timings used when registering events are obtained from the platform simulations in ConvergenSC (see Section 5.3.1). This way, all the timings on the system simulated platform closely match the timings of the “real platform”. Note finally that this timing API of the simulation server is used by all the simulated terminal/node instances to register their events. That way several terminals/nodes can be simulated in a synchronized way.

2. Ether bandwidth service: With the time model of the simulation server it is possible to synchronize several simulated terminals/nodes. However, as long as they do not share a medium and exchange data, the PHY/MAC system software cannot be properly simulated. This medium connection is needed in the HAL implementation of the DFE, where data is normally sent to the analogue front-end and then over the air. To simulate this, the simulation server offers the ‘ether bandwidth’ service. When the HAL DFE model sends data to this service, the other simulated nodes/terminals receive this data. The service also includes features for carrier sensing, instantiating several antennas, transmitting on several channels, and collisions.

3. Logging service: When running system-level software on the platform with the CoWare ConvergenSC tools, the main visualization and debugging tool is the ARM debugger console. To facilitate development/validation, the simulation server provides a logging interface. Log messages from all simulated nodes/terminals are collected in the log reflector of the simulation server. Logging applications can then connect to the server, collect all the log messages and process them for display. The supported log messages cover on/off status as well as more detailed information on the status of components, events, and power consumption (see Section 5.5.2).
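The event registration described for the time service can be sketched as a small discrete-event loop. This is a minimal illustration under stated assumptions, not the IMEC implementation: the class names, the `register_event` and `dma_start` methods and the per-byte timings are hypothetical, and the socket-based connection to the server is replaced by direct method calls.

```python
import heapq
import itertools


class SimulationServer:
    """Toy time model: events are (duration, callback) pairs on one shared clock."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._order = itertools.count()  # tie-breaker for equal deadlines

    def register_event(self, duration, callback):
        # The event completes `duration` seconds after the current time.
        heapq.heappush(self._queue, (self.now + duration, next(self._order), callback))

    def run(self):
        # Advance the shared clock from deadline to deadline.
        while self._queue:
            deadline, _, callback = heapq.heappop(self._queue)
            self.now = deadline
            callback()


class DmaModel:
    """HAL-level DMA model: no data is moved, only the transfer time is modelled."""

    def __init__(self, server, seconds_per_byte, on_done):
        self.server = server
        self.seconds_per_byte = seconds_per_byte  # assumed; would come from platform simulation
        self.on_done = on_done                    # notifies the higher layer (the PHY)

    def dma_start(self, n_bytes):
        dt = n_bytes * self.seconds_per_byte
        self.server.register_event(dt, lambda: self.on_done(n_bytes, self.server.now))


server = SimulationServer()
log = []
# Two terminals share one server, so their events are ordered on one time axis.
dma_a = DmaModel(server, 1e-9, lambda n, t: log.append(("A", n, t)))
dma_b = DmaModel(server, 2e-9, lambda n, t: log.append(("B", n, t)))
dma_a.dma_start(4000)  # 4000 bytes at 1 ns per byte
dma_b.dma_start(1000)  # 1000 bytes at 2 ns per byte
server.run()
```

Because both DMA models register their events with the same server, the shorter transfer of terminal B completes before the longer one of terminal A, which is the synchronization property the time service is meant to provide.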
5.5.2 Meeting the Design Goals: The 802.11n Case

5.5.2.1 Implementing SW

Based on the described framework, the first layer of system level software was developed, considering the 802.11n standard as a case study. This software controls the reception/transmission of a data packet from the antenna interface to the MAC layer and vice versa. This data path is shown in Fig. 5.11 for the transmission of a packet (for the purpose of clarity, the single antenna transmission case is shown).
5.5.2.2 Energy Profiling

In the implementation of the HAL on the system simulator, the code sends log messages about the energy it has consumed to a Java application. This graphical logger then uses the energy information to generate plots of the power consumption of the system and its components. A typical example is shown in Fig. 5.12.
Fig. 5.11 The data path for the transmission of an 802.11n data packet

Fig. 5.12 Energy profiling example of transmitting modem
As the power consumption of an IMEC SDR platform running a real protocol is very dynamic and depends on the settings of the different components, this allows viewing the power for different scenarios, e.g. a receiving terminal, a transmitting terminal, a busy terminal, or an idle terminal. Moreover, the insight gained into the power consumption and the ‘ether occupation’ can help in conceiving measures that realize ‘green behavior’. The simulator framework can be used to verify and validate improvements.
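The step from energy log messages to a power plot can be sketched as follows. The message format (timestamp, component name, energy in joules), the component names and the numbers are assumptions made for illustration; the text does not specify what the HAL actually logs.

```python
from collections import defaultdict


def power_profile(log_messages, window):
    """Average power per component and per time window.

    Each message is (timestamp_s, component, energy_joule): the energy the
    component reports having consumed at that time (hypothetical format).
    """
    energy = defaultdict(lambda: defaultdict(float))
    for timestamp, component, joules in log_messages:
        slot = int(timestamp // window)
        energy[component][slot] += joules
    # Average power in a window is the energy in that window divided by its length.
    return {
        component: {slot: e / window for slot, e in sorted(slots.items())}
        for component, slots in energy.items()
    }


messages = [
    (0.0002, "BBE", 40e-6),
    (0.0007, "BBE", 60e-6),
    (0.0007, "DFE", 10e-6),
    (0.0013, "BBE", 20e-6),
]
profile = power_profile(messages, window=0.001)  # 1 ms windows
```

Each inner dictionary maps a window index to an average power in watts, which is the quantity a logger would plot over time for each component.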
5.6 Future Challenges and Solutions

5.6.1 The Wireless Race for More: Trouble Ahead

We may expect the demand for wireless connectivity to keep growing in the next 5–10 years: users want ever faster wireless internet access, HDTV will become the standard, and Gbyte memories will be pervasive and will need to be connected wirelessly. Rates up to Gbit/s and multiple parallel streams will need to be supported in order to offer various simultaneous services. Moreover, users will want to experience seamless connectivity in a heterogeneous network environment. Cognitive behavior of radios will be needed to satisfy the bandwidth demands, as the spectrum will get more and more crowded. The impact of this evolution of wireless applications on the SW will be huge: both the control/platform level SW and the baseband processing SW will see a dramatic complexity increase. Moreover, the energy availability on mobile terminals will stay constrained, and thus mastering power consumption will become even more challenging.
5.6.2 Solutions to Boost Performance: More Parallelism in the SW

Further technology scaling alone, bringing along an increase in clock frequency, will not suffice to address the challenges outlined above. In order to enable efficient and effective SW design for future wireless SDR-based terminals, new approaches will need to be followed and new (programming) paradigms implemented. For the platform level SW, raising the abstraction level is desired. Important initiatives have started work on the definition of APIs in this context (e.g. see [12]). For the baseband DSP SW, new processor architectures with major improvements in energy efficiency (GOPS/mW) are emerging, but they are still not sufficient to keep up with the continuously increasing complexity of wireless physical layers within the shrinking energy budget. In order to support higher rate streams and more simultaneous streams, it will be essential to further exploit parallel processing capabilities. Next to data-level and instruction-level parallelism that can be exploited on
state-of-the-art processors (see Chapter 4 and Section 5.4.2 in this chapter), thread-level parallelism can also bring the next performance enhancement. These multiple threads can be implemented in several ways:

1. On several segments of one highly parallel processor, which can flexibly switch between multiple threads and a single large thread.
2. On multi-core platforms, which could also support asynchronous streams running in parallel [7].

Both approaches were investigated for a WLAN MIMO case, where different parallelization options were considered: one pursuing a ‘per antenna’ split, another an ‘odd-even symbol’ split. As an example, a multi-threading solution running on a two-segmented ADRES [3] processor is shown in Fig. 5.13. Progressing in time, two threads first run in parallel, performing digital front-end functionality and FFT, each running on a 4 × 4 FU (Functional Unit) segment. Subsequently, they ‘join’ to perform the spatial equalization, which needs to combine information coming from the two antennas. This equalizer is the most complex block in the receiver, and can utilize the full 4 × 8 array quite well. Finally, the threads fork again to perform the demapping for the two resulting streams in parallel.
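The fork, join, fork pattern of the ‘per antenna’ split can be sketched with a thread pool. This only mirrors the control structure: the naive DFT, the sum/difference combiner and the BPSK demapper are placeholders chosen for brevity, not the receiver algorithms mapped on ADRES, and in standard CPython the threads mainly illustrate the structure rather than deliver a speed-up.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def front_end_and_fft(samples):
    """Per-antenna thread: a naive DFT stands in for digital front-end plus FFT."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def equalize(spectrum_a, spectrum_b):
    """Joined thread: needs both antennas at once.

    Placeholder combiner (sum and difference per subcarrier), not a real
    MIMO equalizer.
    """
    stream_0 = [a + b for a, b in zip(spectrum_a, spectrum_b)]
    stream_1 = [a - b for a, b in zip(spectrum_a, spectrum_b)]
    return stream_0, stream_1


def demap(stream):
    """Per-stream thread: hard BPSK decision on the real part."""
    return [1 if value.real >= 0 else 0 for value in stream]


def receive(antenna_a, antenna_b):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Fork: one thread per antenna.
        spectra = list(pool.map(front_end_and_fft, [antenna_a, antenna_b]))
        # Join: equalization combines both antennas in a single thread.
        streams = equalize(*spectra)
        # Fork again: one thread per spatial stream.
        return list(pool.map(demap, streams))


bits = receive([1, 0, 0, 0], [0, 1, 0, 0])  # two hypothetical 4-sample symbols
```

Collecting the results of the first `pool.map` is the join point: equalization cannot start before both per-antenna threads have finished, which matches the dependency described above.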
Fig. 5.13 WLAN MIMO case: ‘per antenna’ split solution (a), running consecutively two parallel threads on segments and one ‘master thread’ employing the full processor (b)
By applying thread-level parallelism, even higher rate streams that cannot fit on one processor can be implemented on SDRs. These solutions thus open the door for further performance improvements.
5.6.3 Solutions to Save Power: Architecture-Aware Scalable SW

The continuously increasing complexity of wireless standards is not matched by an equal increase in battery capacity (see Chapter 2). To enable SDR in size-, weight- and energy-constrained devices, innovation is also needed on the software side to achieve savings in power consumption, by adapting to and exploiting the nature of the SDR platform. Indeed, in a traditional implementation, functionality is refined towards a direct hardware implementation. If the implementation target is a reconfigurable platform, featuring processors for baseband processing, one should take its features into account in the SW design in order to achieve power efficiency [5]. Two main concepts have shown great potential in this context:

1. A thorough architecture-aware algorithm refinement and implementation approach for the baseband signal processing functions, which account for most of the SDR computational complexity. As an example, selecting and designing algorithms (for example for MIMO detection) that can be parallelized extensively will lead to much more efficient implementations on highly parallel processors. In specific cases, more than 90% efficiency improvement is achieved compared to the direct porting of the traditional hardware-minded implementation [11].

2. Energy-performance scalable SW implementations of modulation and demodulation. By continuously adapting the amount of processing to the effective user and/or environment requirements, significant gains in average power can be achieved. The ‘enable and exploit scalability’ concept indeed also applies to the SW implementation. As an example (see Fig. 5.14), the data type refinement (defining the word lengths of variables) may be performed to trade off power versus performance. Such dynamic solutions can achieve almost double the energy efficiency in typical SNR regions, compared to fixed implementations [10].

The key idea of architecture-aware scalable SW implementation is to leverage the fitness of programmable architectures to increase the system's ability to continuously adapt to the varying user and environment requirements. This eventually reduces the average computation load and thereby the power.
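The data type refinement idea in item 2 can be illustrated by choosing the fixed-point precision from the current SNR. The scenario table, its thresholds and the function names below are hypothetical; the scenario-based method of [10] derives its formats from a detailed analysis that is not reproduced here.

```python
def quantize(value, frac_bits):
    """Round to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


# Hypothetical scenario table: (minimum SNR in dB, fractional bits to use).
# At low SNR the channel noise dominates, so coarser arithmetic is acceptable.
SCENARIOS = [(25.0, 12), (15.0, 8), (0.0, 5)]


def select_frac_bits(snr_db):
    """Pick the word length of the first scenario whose SNR floor is met."""
    for snr_floor, frac_bits in SCENARIOS:
        if snr_db >= snr_floor:
            return frac_bits
    return SCENARIOS[-1][1]  # below all floors: use the cheapest format


def scaled_dot(a, b, snr_db):
    """Dot product computed at the precision chosen for the current SNR."""
    bits = select_frac_bits(snr_db)
    total = sum(quantize(x, bits) * quantize(y, bits) for x, y in zip(a, b))
    return total, bits


result_low, bits_low = scaled_dot([0.3, -0.7], [0.5, 0.25], snr_db=5.0)
result_high, bits_high = scaled_dot([0.3, -0.7], [0.5, 0.25], snr_db=30.0)
```

On a processor with sub-word SIMD support, shorter words would allow more operations per instruction, which is where the energy gain would come from; the Python code only shows the selection logic and the resulting loss of precision.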
Fig. 5.14 Dynamic fixed-point format assignment increases energy efficiency in situations requiring lower performance
5.7 Conclusions

One of the key challenges in implementing wireless access schemes on SDRs is the SW design. In order to improve the SW design efficiency, applying a systematic design flow involving automated steps where possible is crucial. Such a flow was proposed both for the HW-SW co-design of the full platform-level SW, and for the specifically demanding task of mapping the DSP functionality on the baseband processor. The latter flow particularly targets efficient SW, which in the end is crucial for low power operation of SDR-based terminals. For future wireless systems, the SW design challenges are expected to become even more prominent. The raw performance of SDR platforms is still not sufficient to sustain the most recent, highest throughput standards within the available energy budget. To bridge this gap, exploiting more parallelism and following a systematic architecture-aware algorithm implementation approach are crucial. These new approaches should bring future generations of green SDRs in sight.
References

1. A. Dejonghe et al., Green reconfigurable radio systems: Creating and managing flexibility to overcome battery and spectrum scarcity, Signal Processing Magazine, May 2007.
2. B. Bougard, D. Novo, F. Naessens, L. Hollevoet, T. Schuster, M. Glassee, A. Dejonghe, and L. Van der Perre, A scalable programmable baseband platform for energy-efficient reactive software-defined radio, Crowncom, Mykonos, Greece, June 2006.
3. D. Novo, W. Moffat, V. Derudder, and L. Van der Perre, Mapping a multiple antenna SDM-OFDM receiver on the ADRES coarse-grained reconfigurable processor, IEEE Workshop on Signal Processing Systems, Athens, Greece, Nov. 2005.
4. H. Cappelle, B. Bougard, A. Bourdoux, D. Novo, R. Vandebriel, and L. Van der Perre, Low power SDR baseband implementation for digital broadcasting receivers, IST Summit, Mykonos, Greece, June 2006.
5. B. Bougard, M. Li, D. Novo, L. Van der Perre, and F. Catthoor, Bridging the energy gap in size, weight and power constrained Software Defined Radio: Agile baseband processing as a key enabler, ICASSP 2008, Las Vegas, Nevada, 2008.
6. http://www.catalyticinc.com.
7. M. Palkovic et al., Mapping of 40 MHz MIMO SDM-OFDM Baseband Processing on Multi-Processor SDR Platform, DDECS 2008 workshop, Bratislava, Slovakia, 2008.
8. B. Mei et al., Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling, IEEE Proceedings on Computers and Digital Techniques, Vol. 150, pp. 255–261, 2003.
9. www.coware.com.
10. D. Novo, B. Bougard, A. Lambrechts, L. Van der Perre, and F. Catthoor, Scenario-based fixed-point data format refinement to enable energy-scalable software defined radios, DATE 2008, Munich, Germany.
11. M. Li, B. Bougard, D. Novo, L. Van der Perre, and F. Catthoor, How to let instruction set processor beat ASIC for low power wireless baseband implementations, DAC 2008, Anaheim, CA.
12. www.sdrforum.org.
13. L. Van der Perre et al., Efficient SW design and SW design efficiency: fuel for Software Defined Radios, ISSSTA 2008, Bologna, Italy.
Chapter 6
Energy-Aware Cross-Layer Radio Management
Exploit Flexibility for Saving Energy
Wireless standards will be implemented on Software Defined Radios (SDR) in the future. In mobile terminals, SDRs should couple high functional flexibility with low-power operation. A two-step approach is advocated in this context. First, energy scalability is introduced in the SDR design. Secondly, intelligent run-time cross-layer radio control is introduced to enable low-power operation by exploiting this energy scalability. In the previous chapters, we have introduced how to enable energy flexibility in the design phase of a reconfigurable radio system. In the present chapter, we show how to exploit this flexibility for the sake of saving energy at system level. The key idea is that significant energy savings are possible by continuously adapting the radio configuration to the actual environment conditions and performance requirements at run-time. We introduce a framework for such energy-aware cross-layer management of flexible radio systems in more detail. The proposed framework is instantiated in a practical case study, showing its applicability in a realistic setup and the possibility to achieve significant energy savings.
6.1 Introduction

Anything, anywhere, anytime: the demand is not new. Still, offering ubiquitous wireless connectivity and seamless access to multimedia services is not yet a reality. A major enabler for this vision is Software Defined Radio (SDR): a reconfigurable radio implementation offering support for a large variety of wireless standards on the same hardware resources. The main advantages of such SDR implementations are higher flexibility (multi-purpose multi-standard platform, reprogrammable in the field) at lower cost (product development and manufacturing cost lowered thanks to better time-to-market, higher chipset production volume and a lower number of components to integrate). In the coming years, the implementation of wireless standards on such reconfigurable radios is expected to become the only viable option [1, 2]. This results from the combination of the increasing need for functional flexibility in communication systems (the variety of wireless standards to be supported is very large, and can be expected to grow further in the future) and the exploding cost of system-on-chip design (newer design technology nodes call for high-volume, multi-purpose and preferably widely programmable devices that can be easily updated) (Fig. 6.1).

Fig. 6.1 The variety of wireless standards (mobility, from low speed/stationary to high speed, versus data rate, from 10 kbps to 1 Gbps, for standards introduced between 1995 and 2010)

The major bottleneck to enabling such reconfigurable implementations for handheld devices is energy efficiency. Due to its flexibility requirements, the SDR concept indeed intrinsically suffers from a significant energy penalty compared to dedicated hardware solutions. The major challenge in this context is thus to enable low energy SDRs that match the energy efficiency requirements of battery-powered handheld terminals and are competitive with dedicated implementations. This is becoming a key concern: there is a continuously growing gap between the available energy, resulting from battery technology evolution, and the steeply increasing energy requirements of emerging radio systems (Fig. 6.2). Technology scaling, platform improvements and circuit design progress are not sufficient for bridging this energy gap. There is a clear need for disruptive system-level strategies.

Fig. 6.2 The energy gap is growing (energy requirement versus energy available in battery, over time)

To make low energy reconfigurable terminals a reality (i.e., to make reconfigurability rhyme with low energy), a two-step holistic approach is advocated [1], combining efficient design and efficient operation (Fig. 6.3).

• First, effective flexibility and energy scalability are enabled in the design of the radio (digital platform and analog front-end). To enable the translation of functional flexibility into energy scalability, the reconfigurable radio (algorithms, architectures, components and circuits) should first be designed accordingly.
• Secondly, the flexibility and scalability are exploited by a cross-layer radio controller, which adapts the radio configuration at run-time to the system dynamics (application requirements, propagation conditions). The present chapter shows that this enables a significant improvement in energy efficiency.

Fig. 6.3 Energy scalable SDRs achieve low energy operation through cross-layer QoS and energy management

The remainder of the present chapter is organized as follows. In Section 6.2, we briefly review how to enable energy scalability in the design phase of a reconfigurable radio. In Section 6.3, we describe how to save energy through appropriate run-time cross-layer radio control. We start by providing an overview of state-of-the-art energy saving techniques for wireless communication systems, emphasizing the need for a system-level approach. Next, we introduce our framework for energy-aware cross-layer radio management and illustrate the application of these concepts on a realistic case study. Significant energy savings are shown to be possible by following our approach.
6.2 SDR Design Step: Enable Flexibility and Energy Scalability

The first step of the proposed approach is to enable the translation of functional flexibility into energy scalability. The reconfigurable radio (considering algorithms, architectures, components and circuits) should therefore be designed accordingly. In the present section, examples of such energy-scalable reconfigurable radio architectures are provided.
6.2.1 Reconfigurable Analog Front-End

For the reconfigurable analog front-end, architectures and circuits should first be designed to offer flexibility in carrier frequency, channel bandwidth and noise performance with minimal penalty in power consumption. Such a reconfigurable zero-IF analog front-end was presented in [4] (see Fig. 6.4), covering frequencies from 174 MHz to 6 GHz and the RF specifications of the following standards: 802.11a/b/g/j/n, 802.15.1,4, 802.16e, UMTS-TDD/FDD, HSDPA, 3GPP-LTE, DAB/DMB/DVB-H.

Fig. 6.4 IMEC reconfigurable analog FE

To enable the translation of functional flexibility into energy scalability, the reconfigurable radio should also be designed accordingly (Fig. 6.5). To that purpose, all building blocks in the RF front-end are equipped with configuration “knobs” that not only allow them to adjust their performance to the requirements of the considered standards, but also to scale their energy consumption to the actual requirements (i.e., energy scalable behavior).

Fig. 6.5 Configuration “knobs”: (1) tune performance requirements, (2) scale energy consumption

An example of such an energy scalable component is the power amplifier (a significant power contributor) introduced in [5]. The proposed circuit flexibility enables significant trade-offs between the transmitter output power and linearity on the one hand and the corresponding energy consumption on the other. The resulting trade-off between the link signal-to-noise-and-distortion ratio and the transmitter power consumption is shown in Fig. 6.6 for path losses ranging from 60 to 90 dB. We have also presented an energy scalable ADC architecture with a perfectly linear trade-off between sampling rate and power consumption from kS/s to tens of MS/s, which at the same time achieves a record figure of merit.
6.2.2 SDR Digital Platform

Several approaches for SDR digital platform solutions have been proposed. However, most of them do not meet the low power requirements for wireless terminal integration. One in fact has to carefully trade off flexibility and energy efficiency: flexibility should only be introduced where its impact on the total average power is sufficiently low, or where it offers a broad range of control options that can be exploited effectively later in the control step (targeted flexibility). The required sub-functions of the wireless modem should be designed according to their nature (i.e., control or data processing) and their flexibility/energy efficiency requirements. This calls for heterogeneous multi-processor system-on-chip (MPSoC) platforms. In [3], we proposed an SDR digital platform building on this concept of targeted flexibility. We refer the interested reader to Chapters 4 and 5 for a detailed description. When combined with adequate power-management techniques, this partitioning enables targeted flexibility with minimal impact on the average power consumption and standby time (Fig. 6.7).
Fig. 6.6 Performance-energy trade-off enabled by energy scalable front-end implementation (Pdc [W] versus SiNADlink [dB] for path losses of 60, 70, 80 and 90 dB, with the required SiNADlink marked for rates from 6 to 54 Mbps)

Fig. 6.7 IMEC SDR digital platform
Further energy gains are achievable by also applying the energy-scalable design paradigm [7] when implementing the baseband processing algorithms. Energy-scalable algorithms enable trading off performance and execution energy. An energy-scalable equalization algorithm for SDR implementations of LTE receivers is for instance proposed in [8]. The computation load is automatically adapted to the propagation conditions, saving up to 60% of the computation without hampering performance (Fig. 6.8).
Fig. 6.8 Energy scalable equalization (3GPP channel response and cycles on SoA processor, over time)

Fig. 6.9 Dynamic word assignment in bb processing (throughput [Mbps] and energy [nJ/bit] versus SNR [dB]; curves: traditional 16-bit implementation, 0.5 dB degradation, 2 dB degradation)
Another interesting option is dynamic fixed-point word format assignment in the baseband processing. This increases energy efficiency in situations requiring lower precision (i.e., lower signal-to-noise ratios) [9] (see Fig. 6.9).
6.3 SDR Control Step: Exploit Flexibility and Scalability for Saving Energy

The second step of the proposed approach is to exploit this flexibility for saving energy at system level, by controlling the reconfigurable radio system as a function of the operating conditions [1, 12, 13]. The key observation here is that wireless communication systems typically face very dynamic conditions (propagation conditions and application requirements). By carefully adapting to these dynamics at run-time, building on the energy scalability enabled by the reconfigurable radio, much energy can be saved with respect to conventional design. This corresponds to a major paradigm shift towards the design of systems that provide just the required application performance while minimizing energy consumption, by continuously adapting the reconfigurable radio parameters at run-time to the actual conditions and requirements. This problem has to be addressed from a cross-layer perspective, as measuring performance requires taking into account the characteristics of the protocol stack, whereas optimizing energy expenditure assumes detailed knowledge of the low-level radio hardware.
6.3.1 State-of-the-Art Energy Management Techniques

The need for improving the energy efficiency of wireless communication devices has already triggered a lot of research at various levels, from circuits to communication theory and networking protocols. The energy management problem, in its most general formulation, consists of dynamically controlling the system to minimize the average energy consumption under a performance constraint. Existing research can be classified into two categories.

• Top-down approaches. Approaches that are intrinsically utilization- and hardware-aware but communication-unaware are categorized as top-down. The communicating device is treated as any electronic circuit, and general-purpose techniques like dynamic power management and energy-aware design are applied. The first technique is defined as dynamically reconfiguring an electronic system to provide the requested performance levels with a minimum number of active components and minimum loads on those components [6]. The second technique can be defined as designing systems that present a desirable energy-performance behavior for energy management [7].
• Bottom-up approaches. Approaches that are intrinsically communication-aware but hardware-unaware are categorized as bottom-up. They rely on the fundamentals of information and communication theory to derive energy-aware transmission techniques and communication algorithms. We find here for instance the transmission scaling techniques, which exploit the fundamental trade-off that exists between transmission rate/power and energy [10]. Network power management techniques also fall in this category, targeting the minimization of the transmission power under QoS constraints.
Fig. 6.10 Top-down and bottom-up approaches can be contradictory (local energy per job versus local time needed, as a function of the configuration, across the application, network, MAC and PHY layers)
Top-down and bottom-up approaches can easily result in a fundamental contradiction when applied independently. A good example is the conflict between transmission scaling at the PHY layer (bottom-up) and sleeping schemes at the MAC layer (top-down). Scaling tends to minimize transmission energy consumption by transmitting with the lowest power over the longest feasible duration, whereas sleeping tends to minimize the duty cycle of the radio circuitry by transmitting as fast as possible (Fig. 6.10). Clearly, the two techniques are contradictory when it comes to defining the optimal transmit rate and power allocation.

The above reveals the need for a holistic cross-layer approach to system-level energy management, jointly considering techniques to optimize the energy consumption in the different layers. This conclusion is strengthened by the simple observation that user-relevant performance metrics can only be measured on top of the whole protocol stack, close to the user, while energy consumption is mainly conditioned by the settings and policies within the lowest protocol layers, close to the hardware. The topic of cross-layer optimization has recently gained a lot of interest (see [11] and references therein for a broad coverage). However, existing approaches still miss the advocated holistic view, which hampers real-life deployment and sometimes leads to undesired side effects.
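The conflict between scaling and sleeping can be made tangible with a simple energy-per-bit model. The model is an assumption introduced here for illustration, not taken from the text: the transmit power follows from inverting the Shannon capacity formula, a constant circuit power is drawn while the radio is active, and the radio consumes nothing while asleep. All numerical values are illustrative.

```python
def energy_per_bit(rate, bandwidth, noise_over_gain, circuit_power):
    """Energy per bit at a given rate, with the radio asleep when idle.

    Transmit power comes from inverting R = B * log2(1 + P * g / N);
    `noise_over_gain` is N / g in watts.
    """
    tx_power = noise_over_gain * (2.0 ** (rate / bandwidth) - 1.0)
    return (tx_power + circuit_power) / rate


def best_rate(rates, bandwidth, noise_over_gain, circuit_power):
    """Rate with the lowest energy per bit among the candidates."""
    return min(
        rates,
        key=lambda r: energy_per_bit(r, bandwidth, noise_over_gain, circuit_power),
    )


BANDWIDTH = 20e6                        # Hz (illustrative)
NOISE_OVER_GAIN = 1e-2                  # W (illustrative)
RATES = [6e6, 12e6, 24e6, 54e6, 108e6]  # bit/s (illustrative)

# Transmission scaling view: circuit power ignored, transmit as slowly as possible.
slow = best_rate(RATES, BANDWIDTH, NOISE_OVER_GAIN, circuit_power=0.0)
# With circuit power included, finishing sooner and sleeping becomes attractive.
fast = best_rate(RATES, BANDWIDTH, NOISE_OVER_GAIN, circuit_power=0.2)
```

With these numbers the first call selects the lowest candidate rate and the second an intermediate one: neither pure scaling nor pure sleeping gives the minimum once both power contributions are counted, which is the argument for deciding on rate and power jointly across layers.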
6.3.2 Cross-Layer Performance-Energy Optimization
We now introduce an energy-aware cross-layer (XL) management framework dedicated to wireless systems [1, 12, 13]. The proposed approach targets the design of run-time XL control algorithms that enable energy-efficient operation of reconfigurable radio systems, i.e., that provide the required performance at minimum energy consumption.
An important goal is that the run-time complexity of these XL control algorithms should be as low as possible, to suit implementation constraints. Key ideas in this respect are design-time/run-time complexity partitioning (shifting complexity to the design-time phase) and Pareto characterization (which captures the system behavior very efficiently at design time and drastically simplifies the subsequent optimization steps). The proposed approach can be summarized in the following major steps.
6.3.2.1 Design-Time Optimization
One first has to identify run-time controllable parameters (or knobs) that significantly impact performance and energy consumption at system level. The control dimension settings in real implementations are discrete and inter-dependent, and can have a non-linear influence. One then has to discretize the system run-time dynamics (e.g., environment conditions, application requirements) through the definition of relevant external variables that can be tracked at run-time. This makes it possible to model the system energy and performance behavior as a function of the available knobs and external environment variables. A hierarchy of power-management-specific abstraction layers is then introduced, to partition the global optimization problem into sub-problems that can be solved locally. A valid abstraction layer can be characterized by intermediate metrics that can be optimized locally without significantly hampering the optimality of the global optimization. It is hence possible to derive the optimal points along these local metrics (according to the Pareto multi-objective optimality criterion), and systematically prune away the non-optimal configurations. The repetition of this process up to the level of the global optimization problem finally results in what we will refer to as the XL run-time database in the sequel, i.e., a table linking the external variables, the available knobs and the system performance-energy behavior. This constitutes a sufficient system characterization to conduct the global XL optimization at run-time. The resulting database structure is illustrated in Fig. 6.11.
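The pruning step described above can be sketched as follows. This is a minimal illustration, not the framework's actual implementation: the `OperatingPoint` type, the scenario key and all numerical values are hypothetical, and a single energy/performance pair stands in for the intermediate metrics of one abstraction layer.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    knobs: tuple        # hypothetical knob setting, e.g. (mode, modulation)
    energy: float       # cost metric, lower is better
    performance: float  # e.g. throughput, higher is better


def pareto_front(points):
    """Drop every point that another point beats or equals on both metrics."""
    # Sort by increasing energy; among equal energies, best performance first.
    ordered = sorted(points, key=lambda p: (p.energy, -p.performance))
    front, best_performance = [], float("-inf")
    for point in ordered:
        # A point survives only if it outperforms all cheaper-or-equal points.
        if point.performance > best_performance:
            front.append(point)
            best_performance = point.performance
    return front


def build_database(characterization):
    """Map each external-variable value (scenario) to its pruned Pareto curve."""
    return {scenario: pareto_front(pts) for scenario, pts in characterization.items()}


# Made-up design-time characterization for one scenario.
characterization = {
    "path_loss_90dB": [
        OperatingPoint(("SISO", "QPSK"), 1.0, 10.0),
        OperatingPoint(("SISO", "16QAM"), 1.5, 20.0),
        OperatingPoint(("SISO", "64QAM"), 2.5, 18.0),   # dominated
        OperatingPoint(("MIMO", "16QAM"), 3.0, 40.0),
        OperatingPoint(("MIMO", "QPSK"), 3.0, 25.0),    # dominated
    ]
}
database = build_database(characterization)
```

The surviving points are ordered so that both energy and performance increase along the curve, which is the property the run-time step relies on.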
6.3.2.2 Run-Time Operation
The combination of the XL run-time algorithm and the XL database forms the XL controller, which is used to determine the optimal configuration for the whole system. The run-time control algorithm tracks the external variables, selects the current scenario and runs the optimization policy to select the optimal configuration points of each layer, in order to provide the just-required performance at minimum energy consumption. This results in a low complexity overhead thanks to the design-time/run-time partitioning, and especially thanks to the availability of a set of potential operating points that is fully characterized up-front and organized in monotonic Pareto curves.
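A minimal sketch of such a run-time lookup is given below, assuming a database that is already pruned to monotonic Pareto curves. The table contents, the configuration labels and the nearest-scenario rule are assumptions made for illustration only.

```python
import bisect

# Hypothetical XL database: for each characterized path loss (dB), a Pareto
# curve of (throughput_mbps, power_w, configuration) sorted by throughput.
# On a Pareto curve, power increases together with throughput.
XL_DATABASE = {
    80: [(12.0, 0.30, "SISO 16-QAM"), (36.0, 0.45, "SISO 64-QAM"), (72.0, 0.80, "MIMO 64-QAM")],
    90: [(12.0, 0.40, "SISO 16-QAM"), (36.0, 0.65, "SISO 64-QAM"), (54.0, 1.10, "MIMO 16-QAM")],
}


def select_scenario(path_loss_db):
    """Map the tracked external variable to the nearest characterized scenario."""
    return min(XL_DATABASE, key=lambda scenario: abs(scenario - path_loss_db))


def select_configuration(path_loss_db, required_mbps):
    """Return the cheapest point that meets the requirement, or None."""
    curve = XL_DATABASE[select_scenario(path_loss_db)]
    throughputs = [point[0] for point in curve]
    # First point with throughput >= requirement; since the curve is
    # monotonic, this is also the feasible point with the lowest power.
    index = bisect.bisect_left(throughputs, required_mbps)
    return curve[index] if index < len(curve) else None
```

Because the curve is sorted and monotonic, the search is a binary search rather than an optimization, which illustrates why the run-time overhead stays low. A conservative controller might round the path loss up to the next characterized scenario instead of taking the nearest one.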
Fig. 6.11 Generic database structure capturing the design-time exploration
In order to facilitate implementation, the XL manager is structured in three units: the database unit, the algorithm unit and the interface unit (Fig. 6.12).
• The database unit stores the link between the system configuration (knobs) and the corresponding energy and performance metrics. The functionality of the database unit is rather limited. It must provide the algorithm unit with a metric pool based on the specified monitors. At a later stage, it must return the knob values for a given configuration to the interface unit.
• The algorithm unit is kept independent of the platform. It only interprets the energy and performance metrics of the database unit and the general system state information of the interface unit. Based on these, the algorithm unit selects a configuration from the database unit and passes it to the interface unit.
• The interface unit translates the platform-specific parameters into more general system state information and provides it to the algorithm unit for processing. It also translates the resulting configuration (knob settings) from the algorithm unit into platform-specific parameters.
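The division of responsibilities between the three units can be sketched as follows. The class names mirror the units described above, but every method name, the table layout, the state quantization threshold and the example values are hypothetical; this is one possible reading of the structure, not the actual implementation.

```python
class DatabaseUnit:
    """Links configurations (knobs) to their energy and performance metrics."""

    def __init__(self, table):
        # table: {state: {config_id: {"energy", "performance", "knobs"}}}
        self._table = table

    def metric_pool(self, state):
        """Metrics only, as handed to the algorithm unit."""
        return {cid: (e["energy"], e["performance"]) for cid, e in self._table[state].items()}

    def knobs(self, state, config_id):
        """Knob values for a selected configuration, for the interface unit."""
        return self._table[state][config_id]["knobs"]


class AlgorithmUnit:
    """Platform independent: sees only metrics and general state information."""

    def select(self, pool, required_performance):
        feasible = [(energy, cid) for cid, (energy, perf) in pool.items()
                    if perf >= required_performance]
        return min(feasible)[1] if feasible else None


class InterfaceUnit:
    """Translates between platform-specific values and general information."""

    def to_state(self, path_loss_db):
        return "good" if path_loss_db < 85 else "bad"  # hypothetical quantization

    def to_platform(self, knobs):
        return [f"{name}={value}" for name, value in sorted(knobs.items())]


def control_step(db, algo, iface, path_loss_db, required_performance):
    state = iface.to_state(path_loss_db)
    config_id = algo.select(db.metric_pool(state), required_performance)
    return None if config_id is None else iface.to_platform(db.knobs(state, config_id))


example_table = {
    "good": {
        "A": {"energy": 0.3, "performance": 12, "knobs": {"mode": "SISO", "mod": "16QAM"}},
        "B": {"energy": 0.8, "performance": 72, "knobs": {"mode": "MIMO", "mod": "64QAM"}},
    },
    "bad": {
        "A": {"energy": 0.5, "performance": 12, "knobs": {"mode": "SISO", "mod": "16QAM"}},
    },
}
```

Only the interface unit would need to change when the controller is ported to another platform, since the algorithm unit never sees knob names or raw measurements.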
Fig. 6.12 Generic architecture of the XL controller
6.3.3 Instantiation in a Use Case
In the present section, we present a use case that can be regarded as a proof-of-concept of the proposed approach, showing that the proposed system-level energy management framework can be applied in practical systems and can result in significant energy savings. As illustrated in Fig. 6.13, the case study considers the uplink transmission of data traffic over an 802.11n MIMO (i.e., multiple antenna transmission) wireless link. Based on the proposed framework, we developed an XL controller for low-energy data transmission over 802.11n networks, which selects at run-time the radio configuration and communication mode (SIMO vs. MIMO, i.e., single antenna vs. multiple antenna transmission). The goal is to achieve the minimum energy consumption by jointly adapting the system configuration to the actual application requirements (i.e., average rate requirement) and environment parameters (i.e., channel state information). This will be shown to yield a significant energy efficiency improvement. Considering the IMEC SDR implementation (see Chapters 3 to 5), a power analysis of the considered system is performed (see Fig. 6.14 for the modeling approach). The radio scalability model is developed in three phases: at component level, radio
Fig. 6.13 XL multi-mode SDR control for low-energy data transmission over an 802.11a/n link
Fig. 6.14 Modeling approach of system-level power consumption
level and system level. The radio-level modeling consists in the definition of the power consumption of the whole platform in its different radio power states (SLEEP, IDLE, SYNC, RX, TX), as illustrated in Fig. 6.15. In a given scenario, this then enables system-level power characterization. Such an average breakdown can be obtained by doing MAC-level profiling of the activation of the different radio power
Fig. 6.15 WLAN: radio power state characterization
states, i.e., determining which fraction of the time they are active in a given communication scenario. As an example, we provide this analysis for WLAN uplink transmission (i.e., from the terminal to the access point). The considered PHY mode is SISO, 64QAM, code rate 3/4. The considered MAC protocol is the IEEE 802.11 DCF MAC protocol. Packets of 1,000 bytes are assumed. This shows that the biggest power consumers for WLAN data transmission are the power amplifier and the baseband processor. In the considered example (WLAN, SISO, 64QAM, rate 3/4, 1,000 bytes), the power amplifier and baseband processor respectively correspond to 57% and 15% of the average power consumption in uplink, and to 23% and 28% of the average power consumption in downlink (Fig. 6.16). The parameters that influence the system-level energy efficiency and performance can then be identified for the considered case. Beyond independent component modeling, the target of XL optimization is first to integrate all those models at system level and, in a second phase, to take globally optimal decisions that benefit from the modeling effort. The decision of the proposed controller relies on a design-time model characterizing the trade-off curve between consumed power and throughput for different channel attenuations, as a function of the possible radio configurations. This is illustrated in Fig. 6.17. Based on our system-level modeling strategy, we can obtain, for each value of the total channel attenuation, a trade-off curve between transmit power and throughput. On each curve, blue and magenta stars denote optimal Pareto points, corresponding to SIMO and MIMO modes, respectively (i.e., single or multiple antenna transmission, respectively). Red stars denote the globally optimal points, including SIMO/MIMO mode switching and duty-cycling (i.e., making linear interpolation possible), leading to the red
Fig. 6.16 Uplink vs. downlink average power breakdown, WLAN, SISO, 64QAM, rate 3/4, 1,000 bytes
Fig. 6.17 Trade-off curves between consumed power and average throughput, for different channel attenuations
curves. Figure 6.18 presents the energy efficiency (nJ/bit) of the resulting optimal configurations, for the 90 dB path loss case. It illustrates how MIMO modes enable a higher
Fig. 6.18 Trade-off between energy efficiency per transmitted bit and average throughput, considering a 90 dB path loss
throughput, but at the cost of higher energy per bit. Intermediate throughputs require the lowest energy per bit. Finally, achieving a very low throughput requires a higher energy per bit, given that the sleeping power starts to dominate the power breakdown due to the very low duty cycle required. The obtained power-throughput curves for the different channel attenuations provide sufficient design-time information for performing run-time XL optimization. This is illustrated in Fig. 6.19. The considered example is in fact smart MIMO, i.e., XL control for energy-efficient multi-mode (SISO vs. MIMO) operation over a WLAN. As an example, we have generated a virtual scenario, with fluctuating demands of the application, fluctuating channel availability (more or less traffic from other users), and channel fading. The results of our XL policy are shown in Fig. 6.20. The reference case consists in a state-of-the-art link adaptation policy, which tends to maximize the throughput. A gain of a factor of 2.5 is made possible in the considered example, meaning a 2.5-fold increase in battery lifetime. The large gain shows how important it is to exploit scalability in order to reduce the power consumption (Fig. 6.21).
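The rise in energy per bit at very low throughput follows directly from duty-cycling arithmetic, as the sketch below shows. It assumes a single active mode with a fixed peak rate and a constant sleep power; the function name and all numerical values are illustrative and are not taken from the measurements reported here.

```python
def energy_per_bit_nj(avg_throughput_mbps, peak_rate_mbps, active_power_mw, sleep_power_mw):
    """Energy per bit (nJ/bit) when duty-cycling between active and sleep.

    The radio is active for a fraction duty = avg/peak of the time and
    sleeps otherwise, so the average power is linear in the throughput.
    Dividing mW by Mbit/s gives nJ/bit.
    """
    duty = avg_throughput_mbps / peak_rate_mbps
    if not 0 < duty <= 1:
        raise ValueError("average throughput must be in (0, peak rate]")
    avg_power_mw = duty * active_power_mw + (1 - duty) * sleep_power_mw
    return avg_power_mw / avg_throughput_mbps


# Hypothetical mode: 50 Mbit/s peak, 500 mW active, 10 mW asleep.
for throughput in (50, 25, 1, 0.1):
    print(throughput, round(energy_per_bit_nj(throughput, 50, 500, 10), 2))
```

The energy per bit equals the active-mode energy per bit plus a sleep term proportional to (1/average throughput - 1/peak rate), so the sleep contribution grows without bound as the required throughput goes to zero.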
Fig. 6.19 XL control for multi-mode SDR
6.4 Conclusions
We have introduced a radio management framework for the performance-energy optimization of multimedia wireless systems. This framework makes it possible to minimize energy consumption while providing the just-required application performance. It was instantiated in different case studies, showing the applicability of our approach in realistic setups. Results have shown that substantial gains can be achieved.
Fig. 6.20 XL policy vs. SoA link management: a gain of 2.5 in battery lifetime is observed
Fig. 6.21 XL policy vs. SoA link management: a gain of 2.5 in battery lifetime is observed
References
1. A. Dejonghe, B. Bougard, S. Pollin, L. Van der Perre and F. Catthoor, Green reconfigurable radio systems: Creating and managing flexibility to overcome battery and spectrum scarcity, IEEE Signal Processing Magazine, special issue on Resource-Constrained Signal Processing, Communications, and Networking, Vol. 24, No. 3, May 2007.
2. G. Desoli and E. Filippi, An outlook on the evolution of mobile terminals, CAS Magazine, second quarter 2006.
3. L. Van der Perre, B. Bougard, J. Craninckx, W. Dehaene, L. Hollevoet, M. Jayapala, P. Marchal, M. Miranda, P. Raghavan, T. Schuster, P. Wambacq, F. Catthoor and P. Vanbekbergen, Architectures and circuits for software defined radios: Scaling and scalability for low cost and low energy, ISSCC 2007, San Francisco, CA, Feb. 2007.
4. J. Craninckx, M. Liu, D. Hauspie, V. Giannini, T. Kim, J. Lee, M. Libois, B. Debaillie, C. Soens, M. Ingels, A. Baschirotto, J. Van Driessche, L. Van der Perre and P. Vanbekbergen, A fully reconfigurable software-defined radio transceiver in 0.13 μm CMOS, ISSCC 2007, San Francisco, CA, Feb. 2007.
5. B. Debaillie, B. Bougard, G. Lenoir, G. Vandersteen and F. Catthoor, Energy-scalable OFDM transmitter design, Design Automation Conference (DAC), 2006.
6. L. Benini et al., A survey of design techniques for system-level dynamic power management, IEEE Transactions on VLSI Systems, Vol. 8, No. 3, pp. 299–316, June 2000.
7. A. Sinha and A. P. Chandrakasan, Energy-scalable system design, Transactions on VLSI Systems, Vol. 10, No. 2, pp. 135–145, April 2002.
8. M. Li, B. Bougard, F. Horlin, M. Engels, L. Van Der Perre and F. Catthoor, Quality-energy scalable chip level equalization for HSDPA, IEEE GLOBECOM Conference 2007, Washington, DC.
9. D. Novo, B. Bougard, A. Lambrechts, L. Van der Perre and F. Catthoor, Scenario-based fixed-point data format refinement to enable energy-scalable software defined radios, Design Automation and Test in Europe (DATE 2008), Munich, Germany, pp. 722–727, March 2008.
10. E. Uysal-Biyikoglu, B. Prabhakar and A. El Gamal, Energy-efficient packet transmission over a wireless link, IEEE/ACM Transactions on Networking, Vol. 10, No. 4, pp. 487–499, Aug. 2002.
11. S. Shakkottai, T. S. Rappaport and P. C. Karlsson, Cross-layer design for wireless networks, IEEE Communications Magazine, Vol. 41, No. 10, pp. 74–80, Oct. 2003.
12. B. Bougard, S. Pollin, A. Dejonghe, F. Catthoor and W. Dehaene, Cross-layer power management in wireless networks and consequences on system-level architecture, EURASIP Signal Processing Journal, Special Issue on Advances in Signal Processing.
13. S. Pollin, R. Mangharam, B. Bougard, F. Catthoor, L. Van der Perre, I. Moerman and R. Rajkumar, MEERA: cross-layer methodology for energy-efficient resource allocation for wireless networks, to appear in IEEE Transactions on Wireless Communications.
Chapter 7
Towards Cognitive Radios Getting the Best Out of the Radio and the Spectrum
In parallel with the need for cost- and energy-efficient reconfigurable radio implementations, there is also a growing need to make next-generation terminals more intelligent and adaptive. Through appropriate radio management, these terminals should make flexible and efficient use of network/spectrum resources, so as to enable connectivity across complex and spectrum-constrained wireless networking environments. This has led to the concept of cognitive radio. In this chapter, we preview how SDRs are crucial to realizing cognitive radios and explain the specific features that need to be added. We also take a brief look at how further integration will make SDRs even more attractive for a wide range of wireless systems in the future.
7.1 Introduction
Anything, anywhere, anytime: the query is not new. Still, offering ubiquitous wireless connectivity and seamless access to multimedia services is not yet a reality. Next-generation mobile terminals will have to face two main challenges to enable this vision: (A) the need for cost- and energy-efficient reconfigurable radio implementations, and (B) the need to make these terminals more intelligent (cognitive) and adaptive, through appropriate radio resource management.
7.1.1 The Need for Reconfigurable Radio Platforms
As discussed in the previous chapters, a major enabler of the vision depicted above is Software Defined Radio (SDR): a reconfigurable and multi-purpose radio implementation, which should offer support for a large variety of wireless standards in a flexible and cost-effective way. As a matter of fact, the combination of the increasing need for functional flexibility in communication systems (the number of wireless
standards to be supported is large and can be expected to grow) and the exploding cost of system-on-chip design will make implementation of wireless standards on such reconfigurable radios the only viable option in the coming years. However, due to its flexibility requirements, the SDR concept intrinsically suffers from a significant energy penalty when compared to dedicated hardware solutions. The major challenge in this context is to enable low energy SDRs matching the energy efficiency requirements of battery-powered terminals and competitive with dedicated implementations. To make such terminals a reality, a two-step holistic approach is advocated, combining efficient design and efficient operation [1].
7.1.2 The Need for Intelligent and Adaptive Radio
In parallel with the need for cost- and energy-efficient reconfigurable radio implementations, there is also a growing need to make next-generation terminals more intelligent and adaptive. Through appropriate radio management, these terminals should make flexible and efficient use of network/spectrum resources, so as to enable connectivity across complex and spectrum-constrained wireless networking environments. Next-generation communication systems will indeed have to enable seamless connectivity across heterogeneous wireless environments (Fig. 7.1). A fundamental enabler in this direction is the increasing flexibility offered by emerging
Fig. 7.1 Evolution to complex and heterogeneous wireless environments
Fig. 7.2 Spectrum is over-allocated and under-utilized
Fig. 7.3 The cognitive cycle
reconfigurable radio platforms on the one hand and by emerging communication networks on the other hand. By properly managing this flexibility, one can enable significant improvements, for instance in terms of connectivity or energy efficiency. Also, a current trend in wireless communication and in spectrum regulation is the evolution towards dynamic and open access to radio spectrum. This is motivated by the under-utilization of many licensed frequency bands and the continuously increasing demand for high data rates (Fig. 7.2). New paradigms for efficiently exploiting the spectrum will definitely be needed in this context. According to regulatory bodies, an evolution to more flexible use is needed [5, 6]. This has led to the concept of cognitive radio, first coined in [13], which is defined in the most generic way as follows [12] (Fig. 7.3): a radio that can autonomously change its transmission parameters based on interaction with and learning of the environment in which it operates. A second meaning of cognitive radio (often referred to as opportunistic radio) is the following [14, 15]: a radio that co-exists with legacy wireless systems using the same spectrum resources without significantly interfering with them.
The SDR solutions described in the present book are of course paving the way in this direction. However, several significant extensions are needed to enable cognitive radio terminals. Building on the results described in the previous chapters, our current research activities and achievements in this context will be described in the present chapter. The focus is more specifically on the following aspects:
• New control functionality enabling cognitive/opportunistic use of network/spectrum resources (Section 7.2)
• New spectrum sensing functionality for better awareness of the networking environment (Section 7.3)
• Emerging radio architectures, e.g., moving the digital/analog barrier closer to the antenna (Section 7.4).
7.2 New Control Functionality
In the present section, we provide an overview of two major interpretations of the cognitive radio concept, namely a broad view and a more spectrum-centric view.
7.2.1 Cognitive Radio: Broad View
A cognitive radio, broadly defined, is a radio that can autonomously change its transmission parameters based on interaction with and learning of the environment in which it operates [12]. The context here is seamless connectivity across heterogeneous wireless networks. Intelligence and context-awareness are in fact expected to significantly improve connectivity or energy efficiency. An important research issue is the design of cognitive controllers and their proper implementation on SDR-based terminals (host processor + reconfigurable radio card). The target is to select at run-time the most appropriate access network and radio configuration, as a function of the wireless context and the application/user requirements. Due to the increasing complexity of emerging communication systems, an important aspect is that cognitive controllers will somehow have to be able to learn their environment in order to take the most appropriate decisions. As detailed in [25], the partitioning of cognition features between the central cognitive engine of a network and the cognitive engines of nodes plays the key role in classifying cognitive wireless networks (CWNs) (Fig. 7.4). In theory, various types of CWNs can be developed between two extreme cognition limits, namely absolute centralized and absolute distributed (non-centralized) cognition. In the former limit, the central cognitive engine has full cognition capabilities, whereas the nodes possess limited cognition capabilities. Basically, the central cognitive engine can be considered the brain and the nodes the members. On the other hand, the structure is totally opposite in the absolute distributed cognition case, where full cognition capability
Fig. 7.4 Cognitive wireless networks can be categorized in the way cognition features are divided between nodes and the cognitive engine
Fig. 7.5 A second partition is based on the relationship amongst the nodes
is embedded into the cognitive engine of each node. Note that this type of network will always have a central cognitive engine to maintain network organization regardless of the cognition level of CR nodes. Indeed, the nodes will have more cognition capabilities as the technology advances. An ad hoc CR network can be considered a network type between these two cognition limits.
CWNs can be classified into three types from the perspective of collaboration within a network. Collaboration can be between a node and the central cognitive engine of the network and/or between nodes. Since there is a natural collaboration between the central cognitive engine and nodes, the following classification is based on the collaboration between nodes (Fig. 7.5). The first type is the so-called cooperative network, in which all the nodes agree on performing predefined (e.g., cooperative spectrum sensing) or, ideally, arbitrary tasks collectively. The non-cooperative network is the second type, in which there is no collaboration between nodes. If a group of nodes agrees to collaborate and the rest do not, this forms the third type of network, the so-called partially cooperative network. Ideally, CWNs have the capability to transition from one type to another dynamically, depending on the collaboration of nodes.
Fig. 7.6 A last partition divides the CWNs up into homogeneous and heterogeneous networks
Based on the node diversity criterion, CWNs fall into two categories, homogeneous and heterogeneous (Fig. 7.6). For a given geographical area or cell, all nodes are identical in homogeneous networks. On the other hand, if the given geographical area or cell consists of a mixture of different nodes, this is a heterogeneous network, as illustrated in Fig. 7.6. In this type of network, a mobile device can roam across the cell borders of other networks and interoperate with other wireless devices. In reality, the heterogeneity of CWNs can change dynamically as well. With the proliferation of wireless technologies, networks will become more and more heterogeneous. From an industrial point of view, the most interesting networks are the heterogeneous networks, as they enable the vision of seamless connectivity. Currently, the literature has begun tackling the problem of how to organize cognition in a network (centralized/distributed). Also, small steps towards seamless connectivity are being taken by first looking at handovers between two network instances. The problem here is to make the optimal choice based on a number of parameters (throughput, energy, delay or a combination). With the advent of SDR, more and more network choices will gradually become available. Researchers are now analyzing the gains that can be made if one station is intelligent and the others are fixed. The problem of course becomes increasingly complex if this assumption is dropped and all nodes are considered intelligent. We refer the interested reader to the references [16–24] for further details.
7.2.2 Cognitive Radio: Spectrum-centric View
A more spectrum-centric definition of cognitive radio denotes a radio that co-exists with other wireless systems using the same spectrum resources without significantly interfering with them (also referred to as opportunistic radio). Generically stated, a spectrum-centric cognitive radio system provides a solution methodology for co-existence of heterogeneous devices/systems. It fits under the larger umbrella
Fig. 7.7 Dynamic spectrum access can be categorized according to the relationship between the wireless technologies
of dynamic spectrum access. No commonly accepted taxonomy has been established in the state of the art yet. Several taxonomies are presented in [25–27]. In the sequel, we consider the taxonomy given in [27] (Fig. 7.7).
• In the dynamic exclusive use model, spectrum is allocated to users for exclusive use in a given region at a given time. This allocation should vary at a much faster pace than under the current policy.
• In some frequency bands (e.g., the ISM bands), it was proposed that unlicensed systems can coexist in a largely unregulated fashion. High-level etiquettes (like listen-before-talk) have to be established in order not to create a tragedy of the commons. This is the open sharing model.
• Most recently, the hierarchical access model has been proposed. Here, secondary users can opportunistically access the spectrum if they limit the interference perceived by the incumbents.
Within the hierarchical access model, two important modes can be further distinguished:
• The spectrum underlay is already well known since the introduction of Ultra-WideBand (UWB) technology. Here, secondary users are allowed to access the spectrum of an incumbent, but severe limits are imposed on their transmitted power (since it assumes a worst-case scenario). This makes it only viable for short-range communication.
• The final approach is called spectrum overlay (sometimes also called the interweave approach). Here, the secondary users are responsible for the interference management. They actively seek out white holes (unused licensed spectrum at a given time and a given place) and use these for their communication. However,
when an incumbent activates, the secondary users need to vacate the channel and search for a new opportunity. Spectrum-centric cognitive methods can be introduced in all the approaches (like cognitive UWB, co-existence in ISM bands, primary vs. secondary user). With the introduction of the IEEE 802.22 standard, the spectrum overlay approach seems to be gaining attention. In the coming subsections, we provide some examples.
7.2.2.1 Example 1: Open Sharing Model Coexistence of Radios with Heterogeneous Capabilities
The focus here is on situations where two different and competing network technologies coexist and interfere in the same band, with equal license rights on this band. Interest in wireless technology has indeed been increasing significantly since its use in small portable devices or even sensors has enabled the development of a broad range of possible wireless applications. This has resulted in the development of various standards that often coexist in the same frequency bands, such as the ISM band. As an example of results in that direction, we highlight here a study on the cognitive coexistence of 802.15.4 sensor networks and 802.11 WLAN (Fig. 7.8) [2, 4]. Coexistence between heterogeneous networks such as wireless LANs and sensor networks, which have very different transmission properties, often results in very asymmetric interference patterns. Indeed, the output power of 802.15.4 devices is typically as low as 0 dBm, while the output power of 802.11g devices is typically 15 dBm or above. Also, 802.15.4 sensor networks are often designed to monitor the environment or buildings, and are typically very large. Because of their large size, sensor networks should be able to (re)configure themselves intelligently so as to enable robust operation with minimal management overhead. Although those sensor network applications are not demanding in terms of throughput, they require high reliability and robustness against attacks or unknown events. In addition, energy consumption and sensor cost are important factors, and should therefore be considered in the algorithm design.
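To put the asymmetry in linear terms, the two typical output powers quoted above can be converted from dBm to milliwatts. The helper name below is illustrative; the conversion itself is the standard definition of dBm.

```python
def dbm_to_mw(power_dbm):
    """Convert a power level in dBm to milliwatts: P[mW] = 10 ** (P[dBm] / 10)."""
    return 10 ** (power_dbm / 10)


sensor_mw = dbm_to_mw(0)    # typical 802.15.4 output power: 1 mW
wlan_mw = dbm_to_mw(15)     # typical 802.11g output power: about 31.6 mW
ratio = wlan_mw / sensor_mw
```

A 15 dB difference therefore means the WLAN transmitter radiates roughly 32 times more power than the sensor node, which is why the WLAN disturbs the sensor network far more than the other way around.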
Fig. 7.8 802.15.4 and 802.11 co-existence in the ISM-band
7.2 New Control Functionality
Fig. 7.9 The considered scenario is a string topology of 802.15.4 nodes, where nodes report to a sink that is placed at one side of the string
In this study, the focus is on distributed adaptation strategies for large sensor networks that are affected by strong interference, which varies in time, frequency and space. Although nodes adapt locally by selecting their communication channel, we want them to converge to the same globally optimal channel to ensure network connectivity. We propose different distributed techniques based on scanning (at an increased power cost) or on learning (i.e., increased cognition). This requires, first, modeling the true impact of harmful interference and of the interactions between heterogeneous systems and, second, defining strategies that enable a better co-existence. We show that learning schemes perform very well in comparison with scanning schemes. Due to space limitations, we only present a subset of the obtained results.
We look at the simple scenario presented in Fig. 7.9: a string topology of 802.15.4 nodes that report to a sink placed at one side of the string. Interference is generated by WLAN devices and is dynamic in time and space. Packets are created in a random sensor and then forwarded to the sink at the end of the string, and the end-to-end delays of the proposed algorithms are compared. Several distributed techniques for dynamic channel allocation were proposed, in which the 802.15.4 devices learn or scan the impact of the 802.11 network and adapt their behavior so as to improve overall network throughput and energy consumption [2]. As represented in Fig. 7.10, the algorithms seek a common channel for the sensors, so that the one-hop distance traveled is as large as possible (resulting in the smallest delay to the sink) and the switching energy is reduced. We compare the performance of several scanning- or learning-based algorithms with the optimal case. We refer the interested reader to [2] for further details.
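To give a feel for what a learning-based channel selection could look like, the following minimal sketch uses a generic epsilon-greedy rule for a single node. It is not the algorithm of [2]: the per-channel success probabilities, the exploration rate and the function name are all assumptions made for illustration, and the convergence of many nodes to a common channel is not modeled.

```python
import random

def learn_best_channel(success_prob, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy channel selection for one node (illustrative only).

    success_prob[c] is the probability, unknown to the node, that a
    transmission on channel c is not hit by WLAN interference.
    Returns the learned per-channel success estimates and visit counts.
    """
    rng = random.Random(seed)
    n = len(success_prob)
    estimate = [0.0] * n   # running mean of observed successes per channel
    visits = [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            ch = rng.randrange(n)                          # explore another channel
        else:
            ch = max(range(n), key=lambda c: estimate[c])  # exploit the current best
        ok = 1.0 if rng.random() < success_prob[ch] else 0.0
        visits[ch] += 1
        estimate[ch] += (ok - estimate[ch]) / visits[ch]   # incremental mean update
    return estimate, visits

# Hypothetical interference conditions on four channels.
probs = [0.2, 0.5, 0.9, 0.4]
estimate, visits = learn_best_channel(probs)
best = max(range(len(probs)), key=lambda c: visits[c])
print("most used channel:", best)
print("visits per channel:", visits)
```

The occasional exploration step plays a role similar to the behavior visible in Fig. 7.10, where nodes settle on the best channel but still visit another channel from time to time to keep tracking it.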
7.2.2.2 Example 2: Hierarchical Access Model, Overlay Coexistence of Radios with Heterogeneous Licenses
A significant body of research is dedicated to the concept of opportunistic radio networks, where radio nodes (secondary users) try to borrow licensed spectrum when
7 Towards Cognitive Radios
Fig. 7.10 Node distribution as a function of time for one of the proposed algorithms (blue is low, red is large number of nodes). While nodes converge to channel 14, the best channel, they occasionally jump to channel 8 to keep tracking that channel
unused by the licensed user (primary user). This fits the hierarchical access model for dynamic spectrum access, considering the overlay sub-case. The first international standard based on this paradigm is the IEEE 802.22 standard [7–11]. The IEEE 802.22 working group was chartered in November 2004 to develop the PHY and MAC layers of a CR-based Wireless Regional Area Network (WRAN) for use by license-exempt devices in the TV spectrum. The goal is to bring broadband access to rural areas. According to the CR paradigm, 802.22 devices can use spectrum opportunities without causing any harmful interference to the incumbents. This requires the development of CR techniques, such as sensing and measuring the spectrum to detect the presence or absence of incumbent signals. The standard envisions a communication range of 33 km at 4 W EIRP (or up to 100 km when power is not an issue). The network will be fixed and will use a point-to-multipoint architecture. Since the USA is the main driver, the frequencies used range from 54 to 862 MHz, with channels of 6 MHz (i.e., the TV spectrum in the USA). An extended range from 41 to 910 MHz (with channels of 6, 7 or 8 MHz) to meet additional international regulatory requirements remains a possibility [6]. The minimum data rates supported are 1.5 Mbps downstream and 384 kbps upstream, which is comparable to DSL services. The IEEE 802.22 standard uses a centrally controlled MAC scheme: the base station coordinates the access to the wireless medium (Fig. 7.11). When considering ad hoc/mesh networking environments, distributed MAC schemes can also be interesting in order to enable cognitive behavior. As an example of results in that direction, we highlight the introduction of an energy-efficient distributed
Fig. 7.11 Typical 802.22 CPE installation (labelled elements: sensing antenna, GPS antenna, TX/RX WRAN antenna)
Fig. 7.12 Scenario: distributed cognitive radio networks enabling opportunistic co-existence with PU (figure elements: primary Tx/Rx pairs and a cognitive radio system)
multichannel MAC protocol, enabling the opportunistic co-existence between primary and secondary users (Fig. 7.12) [3]. Since CR nodes need to be able to hop from channel to channel in order to fully utilize the spectrum opportunities, we believe distributed multichannel MAC protocols to be key enablers for such CR-enabled mesh networks. These protocols also have clear advantages over single-channel MAC protocols: they offer reduced interference among users, increased network throughput due to simultaneous transmissions on different channels and a
Fig. 7.13 Activity of the PUs (top, channels 1–5, inactive/active) and SU (bottom, channels 1–5, closed/open) over time (0–100 s) for the reference scenarios. No PUs are present on the control channel (Channel 1)
reduction of the number of CRs affected by the return of a licensed user. The considered scenario is represented in Fig. 7.12: a distributed cognitive radio network enabling opportunistic co-existence with primary users. The protocol is shown to improve performance by using spectral opportunities in licensed channels, while protecting the primary users from secondary user interference. This is illustrated in Fig. 7.13, where the activity of the primary users (top) and secondary users (bottom) is represented for the reference scenario. The extra energy cost for scanning was shown to be marginal: for an active node, scanning contributes only 5% to the total energy cost [3].
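The open/closed channel view of Fig. 7.13 can be sketched as a simple rule: a secondary user treats a channel as closed whenever a primary user has been sensed on it. The code below is a minimal illustration of that rule and not the protocol of [3]; the function names and the fallback to the control channel are assumptions.

```python
def open_channels(pu_active):
    """Return the channels a secondary user may use.

    pu_active maps a channel number to True if a primary user
    was sensed on that channel, False otherwise.
    """
    return sorted(ch for ch, active in pu_active.items() if not active)

def pick_data_channel(pu_active, control_channel=1):
    """Pick a free licensed channel, else fall back to the control channel.

    The fallback is an assumption of this sketch; the text only states
    that no primary users are present on the control channel.
    """
    candidates = [ch for ch in open_channels(pu_active) if ch != control_channel]
    return candidates[0] if candidates else control_channel

# Hypothetical sensing snapshot for five channels.
snapshot = {1: False, 2: True, 3: False, 4: True, 5: False}
print("open channels:", open_channels(snapshot))
print("chosen data channel:", pick_data_channel(snapshot))
```

When a primary user returns on the chosen channel, re-running the selection on the updated snapshot moves the secondary user to another open channel, which is the hopping behavior a multichannel MAC has to support.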
7.3 New Sensing Functionality
In view of current evolutions in radio spectrum regulation, spectrum sensing will become of paramount importance to enable new advanced radio systems. It enables a cognitive radio to analyze the spectrum over a range of frequencies and determine the usage of that spectrum by other users. Some additional signal features could also be sensed, such as energy, bandwidth, periodic features, identity of the transmission source, and expected duration of spectrum usage. Once this is done, the CR can identify spectral “holes” in which transmission opportunities can be exploited.
In its simplest form, spectrum opportunity detection detects the presence of incumbents in a given channel. It can be considered as performing a binary hypothesis test:
• H0: absence of incumbents (i.e., spectrum opportunity)
• H1: presence of incumbents
Key metrics in spectrum sensing are the following:
• Probability of false alarm: P_FA = Pr{decision = H1 | H0}
• Probability of missed detection: P_MD = Pr{decision = H0 | H1}
• Probability of correct detection: P_D = 1 − P_MD
If the probability of missed detection P_MD is high, an opportunistic radio system is likely to cause harmful interference to an incumbent. A high probability of false alarm P_FA results in a low spectrum utilization. Ideally, both probabilities should be low. However, since protecting incumbents is the primary concern in such a network, a higher probability of false alarm P_FA is more tolerable than a higher probability of missed detection P_MD.
The Receiver Operating Characteristic (ROC) can be drawn for each detector if the SNR and the number of samples are known. Such a ROC is presented in Fig. 7.14. As illustrated in this figure, for a constant number of samples, a smaller probability of false alarm P_FA implies a larger probability of missed detection P_MD, and vice versa.
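The trade-off between false alarms and missed detections can be illustrated numerically. The sketch below assumes an energy detector that averages the squared value of N real samples, unit-variance Gaussian noise, a Gaussian incumbent signal with linear SNR `snr`, and a Gaussian (central limit) approximation of the test statistic. These modeling choices and all names are assumptions of the example, not taken from the text.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def q(x):
    """Gaussian tail probability Q(x) = Pr{Z > x} for a standard normal Z."""
    return 1.0 - _STD_NORMAL.cdf(x)

def roc_point(threshold, n_samples, snr):
    """Approximate (P_FA, P_D) of an energy detector for a given threshold.

    Under H0 the averaged energy has mean 1 and standard deviation
    sqrt(2/N); under H1 both are scaled by (1 + snr).
    """
    sigma0 = math.sqrt(2.0 / n_samples)
    p_fa = q((threshold - 1.0) / sigma0)
    p_d = q((threshold - (1.0 + snr)) / ((1.0 + snr) * sigma0))
    return p_fa, p_d

thresholds = (1.00, 1.05, 1.10, 1.15, 1.20)
points = [roc_point(t, n_samples=200, snr=0.2) for t in thresholds]  # snr is about -7 dB
for t, (p_fa, p_d) in zip(thresholds, points):
    print(f"threshold={t:.2f}  P_FA={p_fa:.3f}  P_MD={1.0 - p_d:.3f}")
```

Raising the threshold lowers P_FA and raises P_MD for a fixed number of samples, which is the behavior summarized by the ROC. Increasing `n_samples` moves the whole curve towards the ideal corner, at the price of a longer sensing time.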
The criterion most often used when designing spectrum opportunity detectors is the Neyman-Pearson criterion, under which P_D is maximized for a given P_FA. Which design criterion is chosen depends on the implementation of the MAC layer. It is hence crucial to jointly design opportunity detection at the PHY layer and opportunity exploitation at the MAC layer.
There are many ways to detect the primary signal, which differ in how much is known about that signal. One can even design detection with the help of routing or MAC information (e.g., by decoding beacons and learning the sleep and wake-up schedules of the incumbents). This is called network monitoring. When the complete waveform of the primary signal is known, we can design a matched filter detector, which correlates the signal received at the opportunistic radio with the signal potentially emitted by the incumbent. Drawbacks of this method are the need for a-priori knowledge of the incumbent signal and the need to perform
Fig. 7.14 The receiver operating characteristic presents the trade-off between false alarms and missed detection (vertical axis: probability of detection 1 − δ; horizontal axis: probability of false alarm ε)
timing and carrier synchronization (or even channel equalization). However, it provides good performance at a low computational cost.
The simplest detector is the energy detector, which measures the energy of the received signal over an observation time window. Its drawbacks are the sensitivity to noise power uncertainty, the poor performance and the fact that it cannot differentiate between the incumbent signal and interference. However, it does not require a-priori knowledge of the signal and comes at a low computational cost.
These two techniques are the two extremes of signal detection in terms of the level of a-priori information needed. We can also use partial information about the incumbent signal, as is done in cyclostationary detection. This detection method exploits the cyclostationarity of the incumbent signal, a property that many wireless signals have. This technique is computationally complex, but it provides better performance than the energy detector and can differentiate between different types of signals. We refer the interested reader to references [28–32] for further details.
Clearly, single-device sensing is not sufficient to protect all incumbent receivers. A solution can be found in cooperative or distributed sensing, where the opportunistic radios share sensing information with each other.
Obtaining information from the environment comes at a cost in terms of throughput (when the channel has to be kept quiet), energy (cost of the sensing algorithm) and hardware (if a dedicated sensing radio is used). This cost should be minimized. Leveraging the scalable front-end, adaptive algorithms are designed to obtain scalable sensing solutions. The goal is to minimize the sensing cost based on run-time conditions instead of predefined SNR requirements, which cannot guarantee detection performance unless very large worst-case margins for fading and noise uncertainty are used.
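The benefit of cooperative sensing mentioned above can be quantified under simple assumptions. The sketch below assumes that k radios sense independently with identical per-radio error probabilities and that the incumbent is declared present as soon as any radio reports it (an OR fusion rule). The text does not specify a fusion rule, so this rule, the independence assumption and the numbers are illustrative only.

```python
def or_rule(p_fa, p_md, k):
    """Network-level error probabilities for k independent radios with OR fusion.

    A missed detection requires all k radios to miss the incumbent;
    a false alarm occurs as soon as one radio raises one.
    """
    p_fa_net = 1.0 - (1.0 - p_fa) ** k
    p_md_net = p_md ** k
    return p_fa_net, p_md_net

for k in (1, 2, 4, 8):
    p_fa_net, p_md_net = or_rule(p_fa=0.05, p_md=0.2, k=k)
    print(f"k={k}: P_FA={p_fa_net:.3f}  P_MD={p_md_net:.5f}")
```

Under these assumptions the missed-detection probability drops quickly with the number of cooperating radios, while the false-alarm probability grows. This matches the priority given to protecting incumbents, but it also shows that cooperation adds to the sensing cost discussed in the text.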
7.4 New Radio Architectures
New radio architectures will be required to support the ongoing reconfigurability, cost, power and functional requirements of emerging wireless systems. In this context we highlight, for instance, an ongoing evolution towards digital RF transceivers, i.e., radio architectures that extend the scope of the (software-defined) modem beyond the sole baseband processing of traditional SDR. This can be seen as “moving” the analog/digital and digital/analog converters closer to the antenna, operating at (up to) the RF sampling rate. The main value is to ease the implementation of both the analog and the digital functionality in the same system-on-chip (SoC), designed in mainstream CMOS technology. This also enables reduced cost and power, and easier reconfigurability.
In wireless receivers, the complete implementation of the Software Defined Radio (SDR) concept is in fact mainly limited by the analog front-end. In the task of downconverting the bandpass signal to baseband, or at least to low frequencies, front-ends are usually limited to a certain frequency band and power level. Building flexible front-ends is challenging because of the filtering and frequency synthesis [34, 35]. The present need for flexibility usually leads to front-end duplications or variations for close bands. The increasing set of wireless standards has encouraged researchers to look further into digital techniques for the downconversion process. Nevertheless, digital processing at RF frequencies is limited by the sampling speed and analog bandwidth of Analog to Digital Converters (ADCs). Additionally, mobile applications are limited by the power consumption needed for a high sampling rate [36]. Instead of using Nyquist sampling in order to have a perfect reconstruction of the original signal, we can take advantage of its limited bandwidth.
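The idea of exploiting the limited bandwidth of a bandpass signal can be illustrated with the classical uniform bandpass-sampling condition, as treated in the bandpass sampling literature such as [38]: a real signal occupying [f_L, f_H] can be sampled without aliasing at any rate f_s with 2 f_H / n ≤ f_s ≤ 2 f_L / (n − 1), for integers n between 1 and floor(f_H / (f_H − f_L)). The sketch below lists these intervals; the band edges are hypothetical and the function name is illustrative.

```python
import math

def valid_bandpass_rates(f_low, f_high):
    """List the alias-free sampling-rate intervals for a real signal in [f_low, f_high].

    Returns tuples (n, fs_min, fs_max) in the same unit as the inputs.
    n = 1 is ordinary Nyquist-rate sampling, with no upper limit.
    """
    bandwidth = f_high - f_low
    n_max = math.floor(f_high / bandwidth)
    intervals = []
    for n in range(1, n_max + 1):
        fs_min = 2.0 * f_high / n
        fs_max = math.inf if n == 1 else 2.0 * f_low / (n - 1)
        intervals.append((n, fs_min, fs_max))
    return intervals

# Hypothetical 5 MHz wide band between 20 and 25 MHz.
intervals = valid_bandpass_rates(20.0, 25.0)
for n, fs_min, fs_max in intervals:
    print(f"n={n}: {fs_min:.2f} MHz <= fs <= {fs_max:.2f} MHz")
```

For this example the lowest admissible rate is 10 MHz, twice the signal bandwidth, compared with 50 MHz for sampling the full range up to the upper band edge. The allowed intervals become very narrow at low rates, so in practice some guard band and a precise sampling clock are needed.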
The generalized sampling theorem states that band-limited signals can be undersampled without information loss [37]. Quadrature bandpass sampling is a simple variation of a second order sampling architecture [38], and is simple enough to become a candidate for practical implementations. Another ongoing trend is the evolution towards SAW-less radio transceivers (Fig. 7.15). Most of the existing receivers use an external surface acoustic wave
Fig. 7.15 Evolution towards digital RF and SAW-less transceivers (block labels: off-chip, LNA, signal conditioners, conversion to digital bit-stream, digital front-end in digital CMOS). Towards digital RF transceivers: reduced cost and power, scaling friendly, easier reconfigurability. Towards SAW-less RF transceivers: reduced cost, extended functionality
(SAW) filter at the LNA input. In most standards, the receiver must indeed satisfy severe filtering requirements on blockers before the signal reaches the high-gain low-noise amplifier (LNA). Such sharp filtering is not easily obtained with on-chip inductors. The external filter has several disadvantages. First, it increases the cost, especially in multimode and multiband applications where several of these filters are needed. Second, the insertion loss of the SAW filter degrades the receiver sensitivity. Third, it removes the flexibility of sharing the LNAs in multimode or multiband applications, and particularly in software-defined radios. For these reasons, it is interesting to look at on-chip filtering techniques to suppress these external filters [33].
Another challenge in waveform detection and spectrum sensing lies in the requirement to simultaneously sense and communicate. Indeed, the communication itself uses a relatively narrow portion of the spectrum, while the sensing function has to cover a very broad frequency band, potentially several GHz of bandwidth. It is therefore likely that, in terms of implementation of the CR technique, we will see an evolution from purely SDR-based sequential waveform detection (in which the SDR sequentially analyzes different parts of the spectrum) to a combination of an SDR baseband/low-frequency section with a wideband front-end and digitizer and DSP-based spectrum scanning. The large dynamic range due to signals, noise and interference poses a significant challenge for low-power terminal implementation: wideband digitization has been the “holy grail” of SDR for years, and significant progress is needed in A/D converter technology. Successive Approximation Register and pipeline ADCs are likely candidates for the sufficiently high sample rates and effective number of bits typically required for SDR and CR (in the range of 10^8–10^9 samples/s and 8–14 bits).
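To put the quoted converter ranges into perspective, the sketch below evaluates two standard back-of-the-envelope quantities: the ideal quantization-limited SNR of an N-bit converter for a full-scale sine wave (about 6.02 N + 1.76 dB) and the raw output bit rate. Pairing the lowest rate with the highest resolution and vice versa is an assumption made purely for illustration.

```python
def ideal_sqnr_db(bits):
    """Ideal quantization-limited SNR (dB) of a full-scale sine wave."""
    return 6.02 * bits + 1.76

def raw_bit_rate(sample_rate_hz, bits):
    """Raw converter output rate in bit/s."""
    return sample_rate_hz * bits

# Hypothetical operating points spanning the ranges quoted in the text.
for rate, bits in ((1e8, 14), (1e9, 8)):
    print(f"{rate:.0e} samples/s at {bits} bits: "
          f"SQNR about {ideal_sqnr_db(bits):.1f} dB, "
          f"raw output {raw_bit_rate(rate, bits) / 1e9:.1f} Gbit/s")
```

Each additional effective bit buys roughly 6 dB of dynamic range, which helps explain why covering strong interferers and weak incumbents with one wideband digitizer is demanding in both converter performance and downstream data rate.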
Finally, an important aspect will be to have a proper SDR software architecture: indeed, on top of the hardware aspects, software is obviously of paramount importance when designing Software-Defined radios. A first key factor is the platform design and software mapping methodology, which should guarantee a competitive advantage in terms of time-to-market. A second key factor is here the software architecture, which should provide a proper platform abstraction for efficient software development and for effective control of the radio configuration.
7.5 Conclusion
The cognitive radio concept was introduced. The SDR solutions described in this book are of course paving the way in this direction. However, several significant extensions are needed to enable cognitive radio terminals. Building on the results described in the previous chapters, the following aspects were tackled:
• New control functionality enabling cognitive/opportunistic use of network/spectrum resources (Section 7.2)
• New spectrum sensing functionality for better awareness of the networking environment (Section 7.3)
• Emerging radio architectures, e.g., moving the digital/analog barrier closer to the antenna (Section 7.4)
References
1. A. Dejonghe, B. Bougard, S. Pollin, L. Van der Perre, and F. Catthoor, Green Reconfigurable Radio Systems: Creating and Managing Flexibility to Overcome Battery and Spectrum Scarcity, IEEE Signal Processing Magazine, special issue on Resource-Constrained Signal Processing, Communications, and Networking, Vol. 24, No. 3, May 2007.
2. S. Pollin, M. Ergen, A. Dejonghe, L. Van der Perre, F. Catthoor, I. Moerman, and A. Bahai, Distributed cognitive coexistence of 802.15.4 with 802.11, CrownCom 2006, Mykonos, Greece.
3. M. Timmers, A. Dejonghe, L. Van der Perre, and F. Catthoor, A Distributed Multichannel MAC for Cognitive Radio Networks with Primary User Recognition, presented at CrownCom 2007.
4. S. Pollin, Coexistence and Dynamic Sharing in Cognitive Radio Networks, to appear in Cognitive Wireless Communication Networks, ISBN: 979-0-387-68830.5.
5. FCC, Report of the Spectrum Efficiency Working Group.
6. Radio spectrum policy group opinion on Wireless Access Policy for Electronic Communications Services.
7. C. Cordeiro, K. Challapali, D. Birru, and S. Shankar, IEEE 802.22: An introduction to the first wireless standard based on cognitive radios, Journal of Communications, Vol. 1, No. 1, pp. 38–47, April 2006.
8. C. Cordeiro, M. Ghosh, D. Cavalcanti, and K. Challapali, Spectrum Sensing for Dynamic Spectrum Access of TV Bands, Proceedings of CrownCom, 2007, Orlando, FL.
9. K. Challapali, C. Cordeiro, and D. Birru, Evolution of Spectrum-Agile Cognitive Radios: First Wireless Internet Standard and Beyond, Proceedings of WICON, 2006, Boston, MA.
10. S. Sengupta, S. Brahma, M. Chatterjee, and S. Shankar N, Enhancements to cognitive radio based IEEE 802.22 air-interface, ICC 2007, Glasgow, UK.
11. C. Cordeiro, K. Challapali, D. Birru, and S. Shankar, IEEE 802.22: The first worldwide wireless standard based on cognitive radios, Proceedings of DySPAN 2005, Baltimore, MD.
12. S. Haykin, Cognitive radio: Brain-empowered wireless communications, IEEE Journal on Selected Areas in Communications, Vol. 23, No. 2, pp. 201–220, 2005.
13. J. Mitola et al., Cognitive radio: Making software radios more personal, IEEE Personal Communications, Vol. 6, No. 4, pp. 13–18, Aug. 1999.
14. R. Brodersen, A. Wolisz, D. Cabric, S.M. Mishra, and D. Willkomm, A cognitive radio approach for usage of virtual unlicensed spectrum, CORVUS White Paper, July 2004.
15. Y. Xing, R. Chandramouli, S. Mangold, and S. Shankar N, Dynamic spectrum access in open spectrum wireless networks, IEEE Transactions on Selected Areas in Communications, Vol. 24, No. 3, pp. 626–637, 2006.
16. C.-H. Lee and C.J. Yu, An Intelligent Handoff Algorithm for Wireless Communication Systems Using Grey Prediction and Fuzzy Decision System, Proceedings of the 2004 IEEE International Conference on Networking, Sensing & Control, Taipei, Taiwan, 2004.
17. C. Prehofer, N. Nafisi, and Q. Wei, A framework for context-aware handover decisions, Proceedings of PIMRC, Beijing, China, 2003.
18. L. Dimopoulou, G. Leoleis, and I.O. Venieris, Fast handover support in a WLAN environment: challenges and perspectives, IEEE Network, Vol. 19, No. 3, pp. 14–20, May-June 2005.
19. W. Zhang, J. Jaehnert, and K. Dolzer, Design and evaluation of a handover decision strategy for 4th generation mobile networks, Proceedings of VTC Spring 2003, Jeju, Korea, April 2003.
20. Q. Wei, K. Farkas, P. Mendes, C. Prehofer, B. Plattner, and N. Nafisi, Context-Aware Handover Based on Active Network Technology, Proceedings of the Fifth Annual International Working Conference on Active Networks (IWAN 2003), Lecture Notes in Computer Science, Springer Verlag, Kyoto, Japan, 2003.
21. K. Murray and D. Pesch, Intelligent network access and inter-system handover control in heterogeneous wireless networks for smart space environments, 1st International Symposium on Wireless Communication Systems, Mauritius, Sept. 2004.
22. O. Ormond, J. Murphy, and G.-M. Muntean, Utility-based intelligent network selection in beyond 3G systems, Proceedings of ICC, 2005, Seoul, Korea.
23. M. Kassar, B. Kervella, and G. Pujolle, Architecture of an intelligent inter-system handover management scheme, Future Generation Communication and Networking, Vol. 1, pp. 332–337, 2007.
24. L.D. Chou, W.C. Lai, C.H. Lin, Y.C. Lin, and C.M. Huang, Seamless handover in WLAN and cellular networks through intelligent agents, Journal of Information Science and Engineering, Vol. 23, No. 4, pp. 1087–1101, 2007.
25. H. Celebi and H. Arslan, Utilization of location information in cognitive wireless networks, IEEE Wireless Communication Magazine, Special Issue on Cognitive Wireless Networks, Vol. 14, No. 4, pp. 6–13, August 2007.
26. M. Buddhikot, Understanding dynamic spectrum access: Models, taxonomy and challenges, Proceedings of DySPAN 2007, Dublin, pp. 649–663, 2007.
27. Q. Zhao, A survey of dynamic spectrum access: Signal processing, networking and regulatory policy, Signal Processing Magazine, May 2007.
28. W. Gardner, Exploitation of spectral redundancy in cyclostationary signals, IEEE Signal Processing Magazine, Vol. 8, No. 2, pp. 14–32, April 1991.
29. J.G. Proakis, Digital Communications, 4th ed., McGraw-Hill, New York, 2001.
30. M.I. Skolnik, Radar Handbook, McGraw-Hill, New York, 1990.
31. F. Hlawatsch and G.F. Boudreaux-Bartels, Linear and quadratic time-frequency signal representation, IEEE Signal Processing Magazine, Vol. 9, No. 2, pp. 21–67, April 1992.
32. J. O’Neill and W.J. Williams, A function of time, frequency, lag, and Doppler, IEEE Transactions on Signal Processing, Vol. 47, No. 3, pp. 789–799, March 1999.
33. H. Darabi, A blocker filtering technique for SAW-less wireless receivers, IEEE Journal of Solid-State Circuits, Vol. 42, No. 12, December 2007.
34. J. Craninckx and S. Donnay, 4G terminals: How are we going to design them? Proceedings Design Automation Conference, Anaheim, CA, pp. 79–84, 2003.
35. N. Vun and A.B. Premkumar, ADC systems for SDR digital front-end, Proceedings of the Ninth International Symposium on Consumer Electronics, Macau, pp. 359–363, 2005.
36. P.B. Kenington and L. Astier, Power consumption of A/D converters for software radio applications, IEEE Transactions on Vehicular Technology, Vol. 49, No. 2, pp. 643–650, March 2000.
37. J.G. Proakis and D.G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice-Hall, Upper Saddle River, NJ, 1996.
38. R.G. Vaughan, N.L. Scott, and D. Rod White, The theory of bandpass sampling, IEEE Transactions on Signal Processing, Vol. 39, No. 9, pp. 1973–1984, September 1991.
Chapter 8
Close This is not the End, it’s Just a Beginning
8.1 A Last Chapter
In the last chapter of this book, we first summarize its major conclusions. Next, we touch upon some challenges ahead. The scaling towards new applications and new technologies will require Green SDRs to update and upgrade. A short problem statement is given on the specific challenge residing in the multi-mode antenna interface. Finally, we end this chapter and the book with some closing remarks.
8.2 Major Conclusions
The wireless standards’ scene and its evolution strengthen the need for functional flexibility in future radios. Multi-mode terminals supporting an increasingly large variety of standards (cellular, WLANs, WMANs, WPANs) are subject to a cost increase that is addressed by more flexible radio interfaces. Energy efficiency, however, is the main obstacle to successfully deploying such reconfigurable radios. Green SDRs will be essential to save on crucial and scarce resources: energy and spectrum.
In Chapter 1, this book has tried to provide some insight into the trends calling for green SDRs. Next, an introduction to SDRs in general, and to the concept and content of this book more specifically, was given. The essential ingredients of Green SDRs were then elaborated on. Specifically, design solutions and approaches for energy-scalable SDRs, both for the radio front-end (in Chapter 3) and the digital baseband (hardware in Chapter 4 and software in Chapter 5), were explained. A direct conversion transceiver with very flexible and reconfigurable building blocks has been presented for the front-end. Concerning the SDR baseband platforms, intelligent partitioning on the platform level and architecture-algorithm co-design on the component level are essential to make flexibility and low power a winning combination. For the development of efficient SW, a methodological approach is advocated.
L. Van der Perre et al., Green Software Defined Radios, Series on Integrated Circuits and Systems, © Springer Science+Business Media B.V. 2009
Intelligent (cross-layer) control is introduced as the key to achieve energy efficient operation and seamless connectivity in Chapter 6. Last but not least Chapter 7 of this book focuses on cognitive radios, opening a new wireless order. These radios build on SDR platforms, enhanced with a sensing engine and intelligent control. We refer to the conclusions sections of the previous chapter concerning the status today of Green SDR technologies.
8.3 Challenges Ahead
8.3.1 Scaling to Next Generation Applications and Technologies
We may expect the demand for wireless connectivity to keep growing in the next 5–10 years: users want ever faster wireless internet access, HDTV will become the standard, and Gbyte memories will be pervasive and will need to be connected wirelessly. The impact on the requirements of wireless modems (both hardware and software) will be high: up to several 100 Mbit/s at high mobility, and Gbit/s in quasi-stationary conditions, will need to be sustained. Multiple simultaneous streams from different standards will need to be supported. Cognitive behavior will put specific requirements on the platform to enable spectrum sensing. Moreover, it will induce a dramatic increase in the complexity of the software, for data processing and for control specifically. Scaling to smaller technologies will be needed to cope with the increasing complexity requirements. As introduced in the beginning of this book, new problems arise with this scaling, and they will need to be tackled at both the design and the system level in the future.
8.3.2 Focus on Multi-band Antenna Interface Challenge
One of the unsolved problems is the handling of many frequency bands in the antenna interface (of any type of radio) in a cost-effective way. The antenna interface is currently implemented from discrete (individually packaged) components integrated on a PCB. For every frequency band a set of components is required. When building a multi-standard radio, this leads to a high-cost solution. In fact, important benefits of SDRs may face annoying limits if the concepts of reuse and flexibility cannot be extended into the antenna interface section of the radio. Finding low-cost, small-form-factor and, eventually, flexible solutions for a multi-band antenna interface is therefore one of the main challenges for the future of SDRs. Specific paths worth exploring in order to tackle the multi-band antenna interface challenge include:
1. Packaging solutions increasing integration and reducing cost: The antenna interface consists of all components between the RFIC and the antenna; concretely, these are the filtering and power amplification functions of the radio. These functions are implemented using components in many different technologies: GaAs p-HEMT switches, acoustic filters (SAW and others), passives (inductors, capacitors), BiCMOS (PA), etc., which hence need to be heterogeneously integrated on substrates to build the (sub)systems. The choices among these packaging options have a large impact on the overall size and cost of the solution.
2. New devices/components: To increase the functionality, the development of new components is an attractive research path. Examples are RF-MEMS switches, new types of acoustic filters, and wide-bandwidth power amplifiers using GaN devices. In other words, the ‘new device’ should be seen in the widest sense: it can be any component at any level of the hierarchy in the antenna interface that is implemented in a new, disruptive way. This disruptive way can be a new material, a new device design and/or type using known materials, a new circuit design using known devices, or a new functional sub-system obtained by assembling components differently into a system.
3. Innovative radio architectures: A top-down look at the radio architecture without the historical legacy will lead to a better, lower-cost system. Similarly, it is also the place for a bottom-up approach, building the lowest-cost solution given the components available and working around problems that some of these components might still have, for instance by building a reliable system with unreliable components. This is essentially a holistic approach to the problem, solving the issues in the right way for the system.
8.4 Closing Remarks
The ‘Green challenges’ can be expected to escalate in the coming years and decades. The transmitted data volume increases approximately by a factor of 10 every 5 years, which corresponds to an increase of the associated energy consumption by approximately 16–20% per year. Currently, 3% of the world-wide energy is consumed by the ICT infrastructure, which causes about 2% of the world-wide CO2 emissions (comparable to the world-wide CO2 emissions by airplanes or one quarter of the world-wide CO2 emissions by cars). If this energy consumption doubles every 5 years, serious problems will arise. Therefore, lowering the energy consumption of future wireless radio systems needs to become a priority target for all wireless designs, not only for battery-powered (handheld, mobile) radios or devices. Concerning the availability of spectrum, a similar analysis applies. It is clear that the solutions proposed in this book are a beginning, yet the quest for more energy-efficient solutions has definitely not reached an end! Yet at the close of this book, we would like to conclude on a positive note. The following quote, from an honorable poet and a fitting song, tends to apply: ‘Every new beginning comes from some other beginning’s end’ (Seneca, Green
Day ‘Closing time’). By cooperating, creative researchers and experts from multiple disciplines can bring disruptive, innovative solutions. New paradigms for efficient spectrum usage are in sight, and higher frequencies (60 GHz and beyond) can be explored. Alternative energy sources should be considered and, where possible, designed for. For example, solar energy may be used for small energy-efficient base stations or even terminals in the future. What is important, both as a target and in the quest towards this target, is to keep the communication going!
Index
A
ADC, 12, 17, 27, 51–54, 59, 62, 69, 76, 119, 149, 150
antenna interface, 39, 41, 107, 153, 154, 155
ASIP, 76, 77, 80, 89, 93
C
CMOS circuits, 30, 41, 45, 46, 51, 54, 70
cross-layer optimization, 123
cognitive radios, 12, 13, 19, 20, 22, 25, 135, 154
D
design flow, 29, 97, 100, 101, 112
digital platform, 116, 119, 120
digital architectures, 74, 80, 88
F
FEC, 65, 68, 72, 88, 89, 91, 93, 108, 131
L
low energy, 11–13, 22, 24, 65, 66, 116, 126, 136
LNA, 39–45, 59, 61, 62, 149, 150
low power design, 27, 51, 65, 68, 80, 109, 115
M
mobile terminals, 11, 23, 57, 109, 115, 135
P
processor, 5, 12, 17–19, 65, 67, 68, 70, 73, 74, 76, 77, 78, 80–82, 84–87, 89–94, 98, 101, 103, 110–112, 128, 138
R
RF front-end, 118
reconfigurable radio, 4, 10, 11, 17, 18, 21, 22, 25, 115–118, 122, 123, 135–138, 153
S
silicon technology, 70
spectrum usage, 147, 156
Software Defined Radio, 1, 2, 4, 7, 12, 13, 15–18, 21, 22, 27, 30, 36, 61, 62, 93, 101, 115, 135, 149, 150
synchronization, 66, 69, 74–77, 79, 89, 93, 148
SW design, 70, 97, 98, 109, 111, 112
W
Wireless communication, 1–3, 12, 16, 20, 23, 25, 81, 93, 117, 122, 137