Design of building blocks of a high rate wireless transceiver for short range communications

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

By:

Kambiz Hadipour Abkenar
Tarbiat Modares University
M.Sc. EE., 2010

April, 2014
To my mom and dad and to my beautiful sister Sanaz

to all my teachers

&

to my friends who have raised me up when I was down and who have taken me down
when I was too high!
This dissertation is approved for recommendation to the Graduate Council.

Prof. Francesco Svelto
Thesis Director

Prof. Andrea Mazzanti
University of Pavia

Prof. Carlo Samori
Polytechnic University of Milan

Prof. Luca Larcher
University of Modena and Reggio Emilia
Increased memory capacity and processing power in mobile devices have created a need for radios that can transmit data at multi-Gbps rates. Millimeter-wave circuits are entering consumer electronics to meet this need. They have the potential to be used for multi-Gbps data communication, high definition wireless video transmission, medical and security imaging, and chip-to-chip communication. Among these applications, this thesis discusses the design and analysis of a high data rate mm-wave short range link.

This dissertation is divided into two parts. In the first part, system level analysis and link budget calculations for such a system are carried out. It begins with an overview of the possible solutions for implementing the link and continues by introducing the proposed system together with its analysis and calculations. A non-coherent modulation with power detection has been adopted for the transceiver for the sake of simplicity and to reduce power consumption and area. The transceiver system is capable of error-free wireless data transfer at 5Gbps over a maximum communication distance of 14cm.

In the second part, the design of the building blocks of the proposed transceiver for robust operation at 50GHz is discussed at circuit level. The different blocks of this mm-wave transceiver are investigated and design techniques are proposed to enhance their performance. A major portion of the thesis is dedicated to the design and analysis of a three-stage wideband mm-wave low noise amplifier (LNA) aimed at maximum gain-bandwidth (GBW) product. It is based on stacked common source gain stages in a current-sharing configuration. To increase the bandwidth, third-order wideband passive networks connect the cascaded stages. In addition, a stagger-tuning technique is employed to obtain flat wideband gain. The LNA, fabricated in a 28nm bulk CMOS technology node, achieves a gain of more than 22dB over 30GHz of bandwidth, corresponding to a GBW of 391GHz. To the author's knowledge, this is the highest GBW reported for an amplifier at mm-wave frequencies, even though the chip is working in the SS corner.

A relatively short discussion on the design of the other building blocks of this 50GHz transceiver follows the chapter on the LNA. Measurement results are then presented, together with some suggestions to improve the achieved results and some design guidelines for future work.
ACKNOWLEDGEMENTS

This dissertation is the culmination of collaborations with a great number of people and would not have been possible without their help and support, whether explicitly mentioned here or not. I am extremely happy to have had the chance to know them, work with them and learn from them.

First and foremost, I would like to thank my supervisor Professor Francesco Svelto for his guidance throughout my research, for all his kind support and for his invaluable advice over the past four years. I would also like to express my deepest gratitude to Professor Andrea Mazzanti, whose advice on technical subjects was always insightful and indispensable. I have learned many things from both Frank & Mazza and I am truly indebted to them for all they have done for me.

I should also thank the members of the special committee: Prof. Carlo Samori from Polytechnic University of Milan and Prof. Luca Larcher from University of Modena and Reggio Emilia. Not only did I have the pleasure of having them on my qualification exam committee, but I also had the opportunity to receive their helpful suggestions and brilliant comments during the thesis review.

I would also like to especially thank Andrea Ghilioni and Enrico Monaco! The work detailed in this thesis would not have been possible without their significant help and generous assistance. I am really grateful for all the useful discussions and for the countless hours of help on design, layout and testing of the transceiver.

Over the past four years, I have had the privilege of meeting some pretty incredible people to whom I am really thankful. Thank you Matteo Bassi, Junlei Zhao, Fabrizio Loi, Ugo Decanis, Marco Sosio, Dan Li, Enrico Mammei, Dario Bianchi, Lorenzo Iotti and all my other friends at Microelectronics group and the Analog Integrated circuit laboratory.

During my stay in Pavia, there have always been some invaluable friends who made this period much more memorable for me. I would like to name some of them just to show how important they are to me. Many thanks to Hadi Heidari, Sanaz Kianoush, Sara Khosravi, Reza Baghbanmanesh, Saeid Shamas, Mahdieh Atlasi, Najmeh Rezaee, Erika Covi, Thanasis Kiouseloglou, Mahboobeh Kashef, Ali Hosseini, Fatemeh Aadelinia, Mohammad Javad Khoshgoftar and Peyman Rafiee for the joy and companionship they have given me.

Last but definitely not least! No achievement in my life would have been possible without the constant love and support of my family through the many obstacles I have encountered. They have provided the perfect environment in which my character and intellectual curiosity were cultivated. Their help has been fundamental to the completion of not only this work but all my endeavors. Without doubt, I would not be where I am today without them, without their kind support and without the foundation that they have provided throughout my life. They have sacrificed so much, more than can be explained here.
LIST OF ABBREVIATIONS

• A
ADC: Analog to Digital Converter
AM: Amplitude Modulation
AMOS: Accumulation MOS
ASK: Amplitude Shift Keying
AV: Audio/Video
AWG: Arbitrary Waveform Generator

• B
BER: Bit Error Rate
BIST: Built In Self Test
BPSK: Binary Phase Shift Keying
BWER: Bandwidth Enhancement Ratio

• C
CDA: Cascaded Distributed Amplifier
CMOS: Complementary Metal Oxide Semiconductor
CMRR: Common Mode Rejection Ratio
CMS: Common Mode Signaling
CPW: Coplanar waveguide
CS: Common Source

• D
DAC: Digital to Analog Converter
DiCAD: Digitally Controlled Artificial Dielectric
DK: Design Kit
DR: Dynamic Range
DSP: Digital Signal Processing
DUT: Device Under Test

• E
ED: Envelope Detector
EM: Electromagnetic
ENR: Excess Noise Ratio
EO: Electrical Optical
ESD: Electrostatic Discharge

• F
FA: Feedback Amplifier
FET: Field Effect Transistor
FOM: Figure Of Merit
FMCW: Frequency Modulated Continuous Wave
FSK: Frequency Shift Keying

• G
GBW: Gain-Bandwidth
GCPW: Grounded coplanar waveguide
GSG: Ground Signal Ground

• H
HD: High Definition
HFSS: High Frequency Structure Simulator
HSI: High Speed Interface

• I
IEEE: The Institute of Electrical and Electronics Engineers
IF: Intermediate Frequency
I/O: Input Output
ISI: Inter Symbol Interference
ITRS: International Technology Roadmap for Semiconductors

• J

• K

• L
LA: Limiting Amplifier
LAN: Local Area Network
LO: Local Oscillator
LNA: Low Noise Amplifier
LRP: Low Rate PHY
LRR: Long Range Radar

• M
MAC: Medium Access Control
MAG: Maximum Available Gain
MAN: Metropolitan Area Network
MMIC: Monolithic Millimeter-wave Integrated Circuits
MN: Matching Network
MOM: Metal Oxide Metal
MS: Microstrip

• N
NF: Noise Figure
NLOS: Non-Line Of Sight
NRZ: Non Return to Zero

• O
OE: Optical Electrical
OFDM: Orthogonal Frequency Division Multiplexing
OOK: On-Off Keying

• P
PA: Power Amplifier
PCB: Printed Circuit Board
PHY: Physical Layer
PLL: Phase Locked Loop
PM: Phase Modulation
PN: Phase Noise
PRBS: Pseudo Random Bit Sequence
PSRR: Power Supply Rejection Ratio
PSS: Periodic Steady State
PVT: Process Voltage Temperature

• Q
QAM: Quadrature Amplitude Modulation
QPSK: Quadrature Phase Shift Keying

• R
RF: Radio Frequency
RX: Receiver

• S
SC: Single Carrier
SNR: Signal to Noise Ratio
SOC: System On Chip
SRR: Short Range Radar
SS: Slow-Slow corner

• T
TL: Transmission line
TRA: Triple Resonance Amplifier
TRX: Transceiver
TT: Typical Corner
TX: Transmitter

• U
UWB: Ultra Wideband

• V
VCO: Voltage Controlled Oscillator
VGA: Variable Gain Amplifier

• W
WPAN: Wireless Personal Area Network
# TABLE OF CONTENTS

LIST OF FIGURES

## Chapter 1

- Introduction
- CMOS for Millimeter-wave design
- Envisioned Applications
- Design Challenges
- Thesis outline

## Chapter 2

- Mm-wave Transceiver Design
  - Introduction
  - IEEE 802.15.3 Standard
  - Design Requirement
  - State of the art mm-wave data links
  - The architecture employed in this work
  - 50GHz Link Budget
  - Design Methodology
  - Chapter Summary

## Chapter 3

- LNA: High Gain-Bandwidth at Low Power
  - Introduction
  - General considerations on mm-wave LNAs
  - An overview on the gain-bandwidth enhancement techniques
  - Design examples from state of the art
  - Lumped vs. distributed passive components
  - The proposed LNA
  - Stability Analysis of the amplifier
  - Chapter Summary

## Chapter 4

- Transceiver Building Blocks
  - Introduction
  - Voltage Controlled Oscillator
  - Power Amplifier
  - Single ended input differential output Low noise amplifier
  - Baseband Circuits
  - Chapter Summary

## Chapter 5

- Measurement Results
  - Introduction
  - DC Measurements
  - Low Noise Amplifier Measurement
  - Transceiver Measurements
  - Conclusion

## Conclusion
LIST OF FIGURES

Fig. 1.1: Penetration of microelectronics to every aspect of our life

Fig. 1.2: Employing mm-wave radars for driving assistance

Fig. 1.3: mm-wave imaging for security

Fig. 1.4: Employing mm-wave technology in chemical sensors

Fig. 1.5: Utilizing mm-wave frequencies to realize wireless LAN or MAN

Fig. 1.6: Wireless vs. wired chip to chip Communication

Fig. 1.7: Device customization, optimization, measurement and modeling are key steps for mm-wave design

Fig. 2.1: Evolution of minimum channel length and maximum operation frequency of CMOS transistors over the years based on ITRS data

Fig. 2.2: Evolution of communication distance versus carrier frequency (a) data rates versus available bandwidth (b)

Fig. 2.3: Development of wireless standards and the increase in achievable data rates over time

Fig. 2.4: Channel plan of IEEE 802.15.3c Standard

Fig. 2.5: Direct-Conversion Transceiver for IEEE802.15.3c

Fig. 2.6: Measured spectrum for QPSK mode at the output of transceiver in

Fig. 2.7: Transceiver architecture in

Fig. 2.8: Conventional up conversion transmitter (a), Direct digital modulation transmitter (b)

Fig. 2.9: 60 GHz direct modulation BPSK transceiver architecture introduced in

Fig. 2.10: Transceiver architecture in Ref

Fig. 2.11: Transceiver architecture presented in

Fig. 2.12: An mm-wave intra-connect solution

Fig. 2.13: Comparison of achievable BER for coherent & non-coherent OOK detection vs. SNR

Fig. 2.14: BER as a function of SNR for 4 different modulation schemes

Fig. 2.15: Realization of OOK modulation by switching the oscillator (a) and by switching the PA (b). The second solution is employed to realize the transmitter

Fig. 2.16: The proposed receiver architecture

Fig. 2.17: Block diagram of the receiver chain for noise figure calculations: Overall system (a), Referring $n_{LNA}$ to the input of the system (b), simplified view (c)

Fig. 2.18: Required SNR at the input of the receiver vs. LNA Gain. In this plot, LNA NF is considered to be 10dB, bit rate is 10Gbps, center frequency is 50GHz, $\alpha_2=0.35$, $\sigma_n=517e^{-9}$ and the required SNR at the output of the receiver is 17dB

Fig. 2.19: Input signal power versus the gain of the LNA. LNA’s NF is considered to be 10dB, bit rate is 10Gbps, center frequency is 50GHz, $\alpha=0.35$, $\sigma_n=517e^{-9}$ and the required SNR at the output of the receiver is 17 dB.

Fig. 2.20: Communication distance versus the gain of the LNA assuming antennas with 0dBi gain, a NF of 10dB for the LNA, and a bit rate of 10Gbps while considering 4dB for the link margin

Fig. 2.20: Receiver noise figure versus the gain of the LNA for an NF of 10dB for the LNA and a bit rate of 10Gbps considering 4dB for the link margin

Fig. 2.21: Calculation of the link budget for our transceiver

Fig. 3.1: Separation of different metal layers and their spacing from substrate for two different technology nodes (Figure not to scale)

Fig. 3.2: Gain-bandwidth product (GBW) of CMOS mm-wave amplifiers versus power dissipation

Fig. 3.3: Simple (a) and bridged (b) shunt peaking

Fig. 3.4: Small-signal model of a single stage amplifier with the loading effect of the next amplifying stage

Fig. 3.5: Third order ladder network at the output of an amplifying stage

Fig. 3.6: Ideal bandwidth improvement with series peaking versus $k = C_1/C$

Fig. 3.7: A common-source amplifier with bridged-shunt-series peaking

Fig. 3.8: Triple-resonance amplifier (TRA) (a) and its simplified model (b)

Fig. 3.9: Behavior of a triple-resonance circuit at different frequencies

Fig. 3.10: Frequency response of TRA

Fig. 3.11: Cascaded distributed amplifier (CDA) (a) and its simplified model (b)

Fig. 3.12: Asymmetric T-Coil approach employed in [40] for bandwidth enhancement with its equivalent small signal model

Fig. 3.13: Cascade of two gain stages in a typical amplifier (a) and the small signal equivalent model (b)

Fig. 3.14: The inter-stage matching network employed in

Fig. 3.15: Schematic of the three-stage cascode LNA proposed in

Fig. 3.16: Schematic of the amplifier proposed in

Fig. 3.17: The low noise amplifier proposed in

Fig. 3.18: Penetration of magnetic field to the substrate in a CPW transmission line

Fig. 3.19: Microstrip (a) vs. Coplanar waveguide (b) transmission lines

Fig. 3.20: Signal, ground and dummy metal placement in the employed CPW line

Fig. 3.21: Simple unit model for the transmission line obtained based on the simple RLGC model

Fig. 3.22: Unit cell model of the CPW transmission line employed in this design

Fig. 3.23: Comparison between measurement results (solid line) and model (dashed line) of the CPW transmission line

Fig. 3.24: Simple current re-used stage (a), Separation of load capacitance of each stage from the source capacitance of the next stage (b), increasing the order of inter-stage networks to extend the bandwidth (c)

Fig. 3.25: Small signal model of the matching networks connected to drain of odd (a) and even (b) stages of our current re-used amplifier

Fig. 3.26: Cascade of two common source stages: Current re-use (a) and conventional (b) architectures

Fig. 3.27: Voltage gain from the input to the output node of figure 3.26 after modifying the architecture in figure 3.26 (b) to resemble the current re-used architecture of figure 3.26 (a)

Fig. 3.28: Comparison of two cascaded stages with current re-used and simple CS structure

Fig. 3.29: Matlab simulation of the transimpedance transfer function of the circuits shown in figure 3.25 (b) vs. frequency under balanced condition

Fig. 3.30: Cascaded common source stages in a current re-use structure (a) and small signal equivalent circuit used for current amplification analysis (b)

Fig. 3.31: Frequency response of the two port load achieving AG-BWER of 4.84

Fig. 3.32: Devised solution to obtain higher gain-bandwidth for two cascaded stages

Fig. 3.33: Schematic of the standalone LNA

Fig. 3.34: Layout of the standalone LNA

Fig. 3.35: Post layout simulation result of the standalone LNA

Fig. 3.36: Input impedance of the LNA

Fig. 3.38: Simplified view of one common source stage with its load

Fig. 3.39: Effect of the $C_2$ on the gain performance of one common source stage

Fig. 3.40: Post layout simulation results for Stern stability factor: $K$ (a) and $\Delta$ (b)

Fig. 4.1: Transmitter building blocks

Fig. 4.2: Receiver building blocks

Fig. 4.3: Simplified view of traditional NMOS-PMOS cross coupled VCO

Fig. 4.4: The Designed VCO

Fig. 4.5: Implementation of the capacitive bank

Fig. 4.6: Replacing the biasing switches with resistors

Fig. 4.7: Layout of the 50GHz VCO

Fig. 4.8: Model used for simulating the tank

Fig. 4.9: Tuning Range of the VCO versus the control voltage

Fig. 4.10: Phase noise vs. the frequency offset from the carrier at the center frequency of the VCO

Fig. 4.11: PA Schematic

Fig. 4.12: Gain, Pout & PAE of the PA vs. input power

Fig. 4.13: Schematic of PA Driver

Fig. 4.14: Schematic of the LNA integrated in the RX

Fig. 4.15: Layout of the RX LNA

Fig. 4.16: Post layout simulation results of the LNA integrated in the RX chain

Fig. 4.17: Block diagram of the baseband section

Fig. 4.18: Simplified schematic of a common source envelope detector

Fig. 4.19: Two different possibilities to realize a cascode envelope detector

Fig. 4.20: Schematic of the main and the dummy Envelope Detector

Fig. 4.21: Variable gain of the ED obtained by changing the bias voltage of the PMOS active load for a 100mV input signal

Fig. 4.22: Schematic of a single baseband amplifying stage

Fig. 4.23: Schematic of the first (a) and second (b) buffer amplifier stages

Fig. 4.24: Schematic of the feedback amplifier

Fig. 4.25: Block diagram of the baseband section to investigate the role of the feedback amplifier on offset cancelation

Fig. 4.26: Layout of the baseband section

Fig. 5.1: Simulated vs. measured I-V characteristic of a diode connected MOS in SS corner

Fig. 5.2: Chip micrograph of the low noise amplifier

Fig. 5.3: Setup for measuring the S-parameters of the LNA

Fig. 5.4: Measured S-parameters of the LNA on two different chips

Fig. 5.5: Noise figure measurement using a noise figure analyzer

Fig. 5.6: Setup for measuring noise figure of the LNA

Fig. 5.7: Measured noise figure of the LNA on two different chips

Fig. 5.8: Simulated performance of the LNA versus measurement results

Fig. 5.9: Comparison of the GBW of the fabricated amplifier vs. state of the art

Fig. 5.10: Chip micrographs of the transmitter

Fig. 5.11: Chip micrographs of the receiver

Fig. 5.12: Close-up picture of the connection between transmitter and antenna through wire-bonding

Fig. 5.13: TX/RX antenna architecture

Fig. 5.14: measurement set-up for the mm-wave OOK transceiver (a) Close-up of the transceiver front-end (b)

Fig. 5.15: Eye diagram at the output of the receiver for a 1Gbps pattern at 3cm

Fig. 5.15: BER performance of the TRX vs. distance

LIST OF TABLES:

2.1 Performance summary of state of the art mm-wave wireless links
2.2 Individual blocks design requirements
3.1 Performance comparison between the cascode topology and the current re-use architecture
3.2 Simulated performance of the stand-alone LNA
4.1 Design parameters value for the VCO
4.2 Comparison of the designed VCO with some of state of the art
4.3 Design parameters value for the PA and its driver
4.4 Simulated performance of the LNA employed in the RX
4.5 Performance comparison of the two architectures shown in Fig. 4.19
4.6 Simulated performance of the blocks in the baseband section
5.1 Performance comparison of the fabricated LNA versus state of the art
5.2 Power consumption of different blocks of transceiver
5.3 Performance comparison with state of the art

Chapter 1

Introduction

The field of microelectronics has grown rapidly over the past two decades, reaching far into our lives and livelihoods. Wireless communications in particular has undergone an exciting evolution and is today pervasive in many aspects of our daily lives. Wherever we look, from cell phones to GPS navigators, weather monitors, digital cameras, etc., we notice the presence of miniaturized integrated circuits (Fig. 1.1).

To allow mass production in large volumes, products have to be low cost, light and compact. Research and industry aim to satisfy market needs by pursuing single-chip realizations of even complicated wireless systems. This has kindled tremendous advances in RF microelectronics.

Fig. 1.1: Penetration of microelectronics to every aspect of our life

Owing to the increased memory capacity and processing power of mobile devices, there are now stringent demands for ever higher data rates to enable fast synchronization between mobile devices as well as media sharing and access. These needs, together with spectrum crowding in the lower frequency bands, have led to a migration to higher frequencies (mm-wave and even beyond). The larger bandwidth available at these frequencies and the higher achievable data rates, without the need for complicated modulation techniques, are further benefits of this migration.

Research on CMOS circuits in the mm-wave regime, particularly the 60GHz band, is an active topic [1-4]. The availability of 7GHz of unlicensed bandwidth has made the 60GHz frequency range potentially advantageous over other frequency bands. Standardization activity (IEEE 802.15.3c) has been carried out [5] to enable many new applications of this technology, including short range high data rate communication, automotive radar, point-to-point wireless links and wireless HD video transmission. Beyond the 60GHz band, 77GHz has been explored for automotive radar [6] and is expected to become more prevalent in coming years, while 90GHz has been investigated for imaging and remote sensing applications [7].

Since most of the foreseen applications are battery operated and preferably handheld, the complete mm-wave system should consume reasonable power, have a small footprint and be low cost, including the cost of testing and packaging. As a consequence, a single chip or single package solution is preferred.

The ever increasing speed of transistors in mainstream silicon-based technologies has helped to realize these goals. Consequently, solutions which were previously implemented in advanced compound (III-V) technologies and were limited to high end users are now entering the market of low-cost consumer electronic products [8].

In this chapter, the motivations and applications for mm-wave design are explored and the challenges encountered in circuit design at such high frequencies are investigated.

**CMOS for Millimeter-wave design**

Traditionally, mm-wave design has been limited to III-V compound technologies due to their higher speed and consequently higher gain compared to silicon technologies. Although they offer the best performance in the mm-wave regime, III-V devices do not lend themselves to consumer electronic markets because of their high cost of implementation. The higher cost results from the need for specialized substrates, the low process yields of these technologies and their inability to integrate digital circuitry on the same die.

Thanks to the constant increase in speed of Complementary Metal Oxide Semiconductor (CMOS) processes with technology scaling, mm-wave design has now become possible in CMOS technology. In particular, CMOS enables complete integration of mm-wave circuits together with low frequency mixed-signal circuits, the digital signal processing chain and even the antenna, all on the same die, eliminating the need for complex and expensive packaging of multiple dies. Furthermore, since all the signals are processed on the same die, no high frequency, high dynamic range I/O is necessary.

Passive devices (transmission lines, inductors, capacitors) have also benefited from this trend; it is easier to integrate them at these higher frequencies because of the smaller footprint.

Moreover, high speed CMOS digital signal processing makes built-in self-test (BIST) possible on the same die. BIST allows transceivers to self-test and self-calibrate, helping to quickly screen out faulty parts or debug problems to increase yield [9].

**Envisioned Applications**

Users ranging from enterprise level data centers to individual consumers with smart phones require ever larger bandwidth, which creates a wide range of applications for mm-wave technologies. Developments in technology and the regulatory environment have further expanded millimeter wave applications and opened new, potentially large markets. The large available bandwidth allows Gb/s communication, and the spectrum encompasses radiation with a range of different capabilities. Consequently, the applications enabled by millimeter waves are quite diverse, ranging from security imaging to short range data transfer, and from consumer satellite communications that bring broadband internet access to businesses and rural consumers to automotive radar and many others. In this section some of the envisioned applications for mm-wave frequencies are studied.

One target application for mm-wave transceivers is Gbit/s wireless connectivity for wireless personal area networks (WPANs). Wireless peer-to-peer communication can be established between smart phones, laptops and home cinema systems for sharing and streaming of HD videos, photos and music. Mass storage devices can be part of the network for backup purposes, and a hotspot can provide a connection to the internet. Sending uncompressed video, which is required to avoid latency, demands a data rate of more than 2Gbit/s. The millimeter wave spectrum promises to support this and even higher data rates.
Within this category, a lot of services oriented to a fast spread of information in public places can be implemented. For example data kiosks inside museums can rapidly push a multimedia guide into the visitor’s tablet, or urgent flight information can be directly sent to travelers’ smart phones in the airports.

Another envisioned application for mm-wave systems is automotive radar. Currently, automobile accidents are estimated to account for over 1.2 million fatalities and 50 million injuries annually. There is now political pressure to reduce this toll, and the European Union and several countries are introducing targets to reduce road deaths by 50%.

Ninety percent of fatal accidents involve driver errors. A system that recognizes situations where drivers are not responding appropriately and takes actions that prevent or mitigate the accident would therefore be highly beneficial. A key part of such a system is an automotive radar that detects objects surrounding the vehicle. Adding sight capabilities to cars enables new kinds of driving aids, like pre-braking for collision avoidance, steering correction for lane following and engine power modulation for adaptive cruise control (Fig. 1.2).

Automotive radar solutions have been around for more than a decade now. Due to stringent requirements, all automotive radars were implemented with advanced, expensive III-V MMIC modules. Using MMIC modules in discrete designs raises the price of the complete solution to a level that puts it out of reach for vehicles other than high-end cars.

![Fig. 1.2: Employing mm-wave radars for driving assistance](image)

In order for these systems to be universally deployed across the entire price range of automobiles, a low cost solution is required. To facilitate such a deployment, regulatory bodies including the FCC have allocated spectrum at 24-29GHz, 46.7-46.9GHz and 76-78GHz, where atmospheric attenuation is at a minimum. Currently, the 24GHz band is allocated for pulse based UWB short range radar (SRR) applications, which help with parking, blind spot detection, lane detection, stop and go in traffic and collision avoidance. The 77GHz band, on the other hand, is dedicated to frequency modulated continuous wave (FMCW) long range radar (LRR) solutions [10]. FMCW radar aids object detection in low visibility (e.g. fog) conditions and also helps with adaptive cruise control (ACC), which is primarily intended for driving on highways. ACC allows target detection from a distance of a few meters up to 150m.

Another great potential of mm-wave technology is in imaging. Basic principles of microwave imaging have been understood for decades. Extending these imaging techniques to higher carrier frequencies where larger bandwidths are available enables higher image resolutions both in depth and lateral spacing. Two main areas that can benefit from this enhanced imaging technology are medical diagnosis and security surveillance (Fig. 1.3).

There are two types of imaging systems: passive and active. In passive mm-wave imaging, an array of very low noise receivers is employed to reconstruct a high resolution image of an object. All objects at a temperature greater than absolute zero emit blackbody radiation. This emission occurs over a broad range of the spectrum, with peaks at infrared frequencies. The amount of radiation at mm-wave frequencies is $10^8$ times smaller than that emitted in the infrared range. Nevertheless, thanks to their superior noise performance, state of the art mm-wave receivers have at least $10^5$ times better sensitivity than infrared detectors, and the temperature contrast can recover the remaining factor of 1000 [11]. This makes mm-wave passive imaging solutions comparable to infrared imaging systems. Moreover, mm-waves suffer much lower attenuation than infrared in poor weather conditions such as clouds, fog, snow, rain and dust, and they can even pass through clothes.

In active imagers, on the other hand, an ultra wide bandwidth pulse based system reconstructs the image of an object from the scattered pattern of the radiated short-duration pulses. The strength of each scattered component determines the intensity of the corresponding pixel in the image. This is an interesting property because the reflection characteristics of various tissues and substances are different. Furthermore, the smallest feature that can be resolved in an image is proportional to the wavelength. A short wavelength thus allows detection of small objects even with small viewers. These features, together with the fact that millimeter waves are much less harmful to the human body than X-rays, make such imaging systems an attractive choice for non-invasive medical screening applications such as early detection of breast and skin tumors, and for security applications such as body scanners.

For the military, low visibility can become an asset rather than a liability; from the commercial point of view, fog-bound airports could be eliminated as a cause for flight delays or diversions; for predictive concerns, in-situ detection of concealed weapons could be accomplished in a fast and non-intrusive manner.

For optimal detection the imaging windows are chosen where the atmospheric loss is at a minimum, namely at 35GHz, 94GHz, 140GHz and 220GHz. Since silicon technologies are capable of integrating large arrays of transceivers, mm-wave imaging technology could be a low cost competitor to existing technologies such as MRI, CAT scan and infrared imaging systems.
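
Because the resolvable feature size scales with wavelength, it can help to see how short the wavelength actually is at the imaging windows listed above. The following is a minimal sketch that only evaluates the free-space relation $\lambda = c/f$; the function name `wavelength_mm` is illustrative and not taken from this work.

```python
# Free-space wavelength at the imaging windows named in the text.
# Only lambda = c / f is evaluated; no propagation loss is modeled.
C0 = 299_792_458.0  # speed of light in vacuum, m/s

IMAGING_WINDOWS_GHZ = [35, 94, 140, 220]  # frequencies quoted in the text


def wavelength_mm(freq_ghz: float) -> float:
    """Return the free-space wavelength in millimetres for a frequency in GHz."""
    return C0 / (freq_ghz * 1e9) * 1e3


if __name__ == "__main__":
    for f in IMAGING_WINDOWS_GHZ:
        print(f"{f:>4} GHz -> {wavelength_mm(f):.2f} mm")
```

All four windows correspond to wavelengths between roughly 1mm and 9mm, which is the sense in which these systems can resolve small objects.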

Another application of mm-wave frequencies is in chemical sensors (spectrometers). It is potentially possible to realize chemical sensors, especially for gases, working in the lower portion of the THz spectrum (i.e. between 200 and 300GHz). As illustrated in Fig. 1.4, in order to identify the contents of a sample, the sensor, like a radio, transmits through a chamber and measures the received signal strength. It repeats the same action at different frequencies to determine the frequency response of each substance [7].

Wireless local area networks (LAN) or metropolitan area networks (MAN), for communication in offices and between buildings on the same campus or of the same company in a restricted area, are another envisaged application for mm-wave transceivers (Fig. 1.5).

Conventional interconnect systems are projected to be limited in their ability to meet the future interconnect needs. Mm-wave technology can be used to realize high data rate, low power, wireless chip to chip communication (Fig. 1.6). Internal data transfer from a video processing chip to a display in a high-definition television is an example of such high speed communication.

A large, complex system consisting of several chips may require an interconnect data rate of up to several Gb/s. This increasing data rate results in severe signal attenuation and distortion in electrical channels because of the skin effect, dielectric absorption, and impedance mismatches. Conventional solutions are approaching the maximum bandwidth limit of electrical interconnects. Furthermore, wired connections require additional on-chip drivers to maintain signal integrity, which results in additional dynamic power dissipation and adds to the component and implementation costs of the system [12]. As a consequence, there have been efforts to replace the electrical channels with optical connections. However, optical connections require optical-electrical (O/E) and electrical-optical (E/O) conversion devices, which usually need non-Si devices that can generate and modulate optical signals, and optical waveguides for non-line-of-sight transmission. These extra devices increase system complexity and add to the overall cost.

![Fig. 1.6: Wireless vs. wired chip to chip Communication](image)

On the other hand, the number and bandwidth of the internal I/Os in today’s highly sophisticated electronic systems have grown at a faster pace than their external counterparts. The necessity of multiple high speed short range internal I/Os results in several design issues. In fact, in some cases the number of I/Os, rather than the actual core circuits, determines the size of the system on a chip (SoC). The wires and connectors limit mechanical design flexibility and can affect the system performance and reliability. Such a wired solution also leads to larger area and higher power consumption, as it necessitates additional on-chip drivers to maintain signal integrity.

To address these issues, inter-chip and intra-chip wireless interconnect concepts are being investigated [12-14]. Such a solution helps remove the wired interconnects and internal I/Os, hence providing higher reliability, design flexibility and the opportunity to reduce the required chip size for the transceiver system. More importantly, its unique broadcasting nature can be exploited in systems that require one-to-many transmissions. However, there are several challenges to overcome for a practical implementation. The wireless signal must be confined in a box and shielded from the outside to avoid any undesirable electromagnetic compatibility problem. The wireless system must also support the high data rate required for internal I/O connections [12]. Furthermore, the system must be implemented with a small footprint, should consume little power, and ideally should be implemented in standard CMOS for integration into SoCs.

**Design Challenges**

Despite the tremendous advantages in terms of integration and cost that CMOS technology offers, it is inferior in terms of noise, power handling, substrate and coupling loss and transistor cut-off frequency to the more expensive Silicon Germanium (SiGe) and III-V compounds such as Indium Phosphide (InP) and Gallium Arsenide (GaAs) traditionally used to implement millimeter-wave devices. CMOS has greater process variability, lower carrier mobility and lower device breakdown voltages that pose challenges to the RF designer.

In addition, supply voltages tend to decrease continuously in scaled nodes. For digital signal processing, lowering the supply voltage helps significantly in reducing power consumption, but the noise, linearity, and output power of analog circuits improve with higher supply voltages. Therefore, analog design gets harder with reduced supply voltages.

Furthermore, CMOS technology Design Kits (DKs) usually do not support mm-wave design. Beyond 10-20GHz device models are not very accurate, and mm-wave components (e.g. very small inductors, capacitors, transmission lines, etc.) are not characterized and/or are missing.

The models that circuit designers use in their daily simulations are the so-called “compact” models. Compact models are the interface between the technology and the design. Such models significantly reduce the time required to evaluate the performance of a circuit. Based on a combination of physical and empirical methods, many reliable compact models have been generated for digital, analog, and RF applications; they provide general equations that describe the device’s behavior. Several parameters are embedded in each equation in order to capture the details of a given technology. Determination of these parameters requires complicated curve-fitting procedures (parameter extraction). Most compact models have the advantage of describing the behavior of the device in all regions of operation at the same time. Furthermore, they provide small and large signal analysis as well as noise analysis and are valid over a fair range of device geometries (width and length) [15].

The core equations in most compact models have been derived under quasi-static assumptions. This, together with the fact that most of the available extracted parameters are for low-frequency applications, makes these compact models less desirable and often inaccurate for mm-wave applications. Two main reasons can be identified for this inaccuracy. First, since the parameter extraction has been done at lower frequencies, extrapolation to mm-wave frequencies is not valid. Some device mechanisms that are not evident at low frequencies, and hence not modeled properly, have a noticeable effect on the performance of the device at higher frequencies [15]. Substrate parasitics are examples of such effects. The second source of modeling error is the layout. At mm-wave frequencies, the device layout has a significant impact on performance. As a result, careful floor planning and device design become quite important in pushing the capability of CMOS to higher frequencies. The layout dependency of the device performance and the closeness to the activity boundary at these frequencies make the modeling task even more crucial. The size of each block and the distance between blocks are also important. Large spacing between the devices leads to longer interconnects. These interconnects introduce small inductances, resistances and capacitances, thus creating a mismatch between the simulated and the fabricated chip. Although negligible at lower frequencies, these parasitics change and, in fact, dominate the performance of the device as the frequency increases and should therefore be included in the model. Accurate prediction of these parasitics requires detailed full-wave electromagnetic simulations, which are laborious and time consuming. As shown in Fig. 1.7, the procedure requires an additional effort of device customization, optimization, measurement and modeling prior to a successful mm-wave design.

![Fig. 1.7: Device customization, optimization, measurement and modeling are key steps for mm-wave design](image)

Transmission lines and inductors are also not supported in the design kit, so an application specific modeling procedure is required for every new transmission line or inductor. Such a strategy results in ‘one-per-device’ modeling. It can yield accurate models, but is limited in design flexibility. Scalable mm-wave models are desired, so that only one model for each type of device is required [3].

On the other hand, pad capacitances and bonding inductances do not scale with the minimum gate length, which places a larger burden on the matching networks of circuits designed in these newer technologies.

**Thesis outline**

In the next chapters, mm-wave high data rate transceivers are studied. System level calculations for designing a 10Gb/s short range link operating at 50GHz are demonstrated, and the building blocks of such a link, at both the transmitter and receiver front-ends, are presented in a 28nm bulk CMOS technology node. The design methodology is robust and features excellent agreement between measured and simulated performance. We first review some of the state of the art mm-wave transceivers and carry out system level calculations in chapter 2. In chapter 3 the design of a wideband low noise amplifier, the main focus of this thesis, is discussed. Subsequently, the design of the other transceiver building blocks is discussed in chapter 4. We then go through the measurement results in chapter 5, analyzing the operation of the low noise amplifier, the VCO and the complete transceiver system. A conclusion in chapter 6 finally closes this thesis.

Chapter 2

Mm-wave Transceiver Design

**Introduction**

In the previous chapter, the motivations for research at mm-wave frequencies and the demand for higher data rate radios were discussed. It seems evident that this demand will continue to grow in the foreseeable future, especially in the V-band (40–75GHz) and W-band (75–110GHz). However, battery life has always been a major concern in mobile device design. Power consumption must be kept low despite the increased data rate.

Consequently, transistor dimensions are constantly decreasing. While the minimum feature size of CMOS transistors was in the range of 250nm or larger in the early 1990s, the minimum channel length reached 22nm in 2013 and is expected to go below 10nm by 2020 according to International Technology Roadmap for Semiconductors (ITRS) data [8]. Figure 2.1 illustrates the shrinkage of the minimum channel length and the increase of $f_T$ and $f_{max}$ for CMOS transistors over the past two decades.

![Fig. 2.1: Evolution of minimum channel length and maximum operation frequency of CMOS transistors over the years based on ITRS data [10]](image-url)

Thanks to technology scaling, modern CMOS transistors have reached a transit frequency ($f_T$) and maximum operating frequency ($f_{max}$) above 400GHz and 450GHz, respectively, making CMOS a very attractive low cost solution for the realization of mm-wave radios.

Furthermore, for the same quality factor, the higher the operating frequency of the system, the larger the achievable bandwidth. Since, according to Shannon's theorem [16], the channel capacity ($C$) is directly proportional to the available bandwidth ($B$), one can conclude that increasing the operating frequency also increases the achievable data rate.

\[ C = B \log_2(1 + \mathrm{SNR}) \tag{2-1} \]
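
As a rough illustration of equation (2-1), the sketch below evaluates the capacity of two links that share the same quality factor and SNR but operate at different carrier frequencies. The bandwidth is approximated as $f_0/Q$; the values Q = 10 and SNR = 10dB are assumptions chosen only for illustration and are not design figures from this work.

```python
import math


def bandwidth_ghz(center_freq_ghz: float, q: float) -> float:
    """Approximate bandwidth of a resonant system, B = f0 / Q."""
    return center_freq_ghz / q


def shannon_capacity_gbps(bandwidth_in_ghz: float, snr_db: float) -> float:
    """Equation (2-1): C = B * log2(1 + SNR), with the SNR given in dB."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_in_ghz * math.log2(1 + snr_linear)


if __name__ == "__main__":
    Q = 10.0       # assumed quality factor (illustrative)
    SNR_DB = 10.0  # assumed signal-to-noise ratio (illustrative)
    for f0 in (5.0, 60.0):
        b = bandwidth_ghz(f0, Q)
        c = shannon_capacity_gbps(b, SNR_DB)
        print(f"f0 = {f0:4.0f} GHz, B = {b:.1f} GHz, C = {c:.2f} Gb/s")
```

With these assumptions the capacity scales linearly with the carrier frequency, because the SNR term is held fixed and only $B$ changes.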

Figure 2.2 exemplifies data rate versus bandwidth for different standards. It also illustrates the evolution of the communication distance versus the carrier frequency.

It is interesting to observe the variation of the available data rate over the years. While the data rate was limited to tens of Mb/s in the early years of this century, ITRS predicts a data rate of more than 50Gb/s for 2015 (Fig. 2.3). As a consequence, advanced wireless technologies should plan for data rates that increase by 5 to 10 times every three or four years to keep pace with the new demands [17].

The need for low power, low cost transmission of Gb/s data poses several challenges to the field of mm-wave circuit design. In the following sections, after introducing the IEEE 802.15.3 standard, the prerequisites for designing such high data rate wireless systems are presented. Some design examples from state of the art Gb/s radio links are studied and the proposed solution for implementing such a transceiver is presented. The link budget is also calculated.

**IEEE 802.15.3 Standard**

IEEE 802.15.3c [5] is an amendment within the IEEE standards for information technology that cover telecommunications and information exchange between systems in local and metropolitan area networks. The first version of this standard (IEEE 802.15.3-2003) is a MAC (medium access control) and physical layer (PHY) standard for 11 to 55Mbit/s WPANs. In an attempt to provide a higher speed UWB PHY for applications that involve imaging and multimedia, the IEEE 802.15.3a amendment was proposed. In 2006, IEEE 802.15.3b, which enhances 802.15.3 to improve implementation and interoperability of the MAC, was released. It included minor optimizations while preserving backward compatibility.
IEEE 802.15.3c was published on September 11, 2009. The task group TG3c developed an alternative PHY operating in the millimeter wave for the existing 802.15.3 Wireless Personal Area Network (WPAN) Standard (802.15.3-2003). This mm-wave WPAN operates in the 57-64GHz unlicensed band and allows coexistence (close physical spacing) with all other microwave systems in the 802.15 family of WPANs. Furthermore, it allows very high data rate applications such as high speed internet access, streaming of high definition videos, wireless replacement of wired interconnects, etc. Other key features and additions of this amendment are as follows:

- Beam forming negotiation for the transmitter to increase the communication range.
- The ability to aggregate incoming data into single packets for improved MAC efficiency.
- Acknowledgment of individual sub packets in a packet to improve the MAC efficiency at the high data rates provided for by the PHY, and to reduce retransmission overhead.

The IEEE 802.15.3c standard defines three PHY modes [5]:

- Single carrier (SC) mode optimized for low power and low complexity.
- High-speed interface (HSI) mode optimized for low-latency bidirectional data transfer.
- Audio/video (AV) mode optimized for the delivery of uncompressed, high-definition video and audio.

In addition, to promote coexistence and interoperability, a common mode signaling (CMS) is defined based on a low data rate SC PHY mode, allowing devices with different PHY modes to communicate.

The SC PHY, itself, is divided into three sub-classes:

- Class 1 is specific for the low power low cost mobile market, achieving a relatively high data rate up to 1.5Gb/s, while employing simple modulation techniques such as OOK.
- Class 2 is intermediate for data rates up to 3Gb/s.
- Class 3 is dedicated to high-performance applications with data rates in excess of 5Gb/s.

All of these classes use a single carrier, so the data rate can be increased by merging two or more of the adjacent 2.16GHz channels shown in figure 2.4.

![Channel plan of IEEE 802.15.3c Standard](image)

**Fig. 2.4:** Channel plan of IEEE 802.15.3c Standard
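
The channel arithmetic can be sketched as follows. The 2.16GHz channel width and the 57.24GHz lower edge of channel 1 are taken from the figures quoted in this chapter; the helper names and the default of four channels are assumptions made for illustration, since the figure itself is not reproduced here.

```python
# Sketch of the 2.16 GHz channel raster of IEEE 802.15.3c.
CHANNEL_WIDTH_GHZ = 2.16      # channel width quoted in the text
FIRST_LOWER_EDGE_GHZ = 57.24  # lower edge of channel 1 quoted in the text
N_CHANNELS = 4                # assumed number of channels (illustrative)


def channel_edges(n: int) -> tuple[float, float]:
    """Return (lower, upper) edge in GHz of channel n (1-based)."""
    lower = FIRST_LOWER_EDGE_GHZ + (n - 1) * CHANNEL_WIDTH_GHZ
    return round(lower, 2), round(lower + CHANNEL_WIDTH_GHZ, 2)


def bonded_bandwidth_ghz(first: int, last: int) -> float:
    """Total bandwidth when adjacent channels first..last are merged."""
    return round((last - first + 1) * CHANNEL_WIDTH_GHZ, 2)


if __name__ == "__main__":
    for n in range(1, N_CHANNELS + 1):
        lo, hi = channel_edges(n)
        print(f"channel {n}: {lo:.2f} to {hi:.2f} GHz")
    print("channels 1+2 merged:", bonded_bandwidth_ghz(1, 2), "GHz")
```

Merging two adjacent channels doubles the available bandwidth to 4.32GHz, which is the mechanism by which a single carrier mode can reach a higher data rate.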

The second usage model involves an ad-hoc system to connect computers and devices around a conference table. In this usage model, all of the devices in the WPAN have bidirectional, Non-Line Of Sight (NLOS), high speed, low-latency communication, which is provided by the HSI PHY. The HSI PHY uses orthogonal frequency division multiplexing (OFDM) and several coding schemes to maximize the spectral efficiency.

The AV PHY is also designed for NLOS operation and serves for the transport of uncompressed, high definition video and audio. It uses OFDM modulation. The AV mode supports omnidirectional coverage via the low-rate PHY (LRP) to set up connections and employs the high-rate PHY (HRP) for transmitting high-throughput data.

### Design Requirement

Two different design paradigms characterize the path to today’s mm-wave CMOS radios: one followed by analog and RF designers, who prefer arbitrary interface impedances and SPICE-like simulators, and the other followed by microwave designers, who usually prefer modular designs, impedance matching at each interface, and ADS-like simulators. While RF designers tend to use inductors, microwave designers would rather use transmission lines and distributed structures [19]. Nevertheless, the two paradigms are converging, resulting in a robust, flexible design methodology.

One of the advantages of mm-wave radios over competing technologies, such as IEEE 802.11 and ultra wideband radios, is the reduced system-level complexity and the smaller amount of digital signal processing required to achieve equivalent data rates. Single carrier systems with simple modulation schemes can be utilized to achieve multi Gb/s data rates without the need for power hungry ADCs and DACs [2]. However, to take advantage of the potential lower complexity and power saving opportunities, appropriate transceiver architectures and circuit topologies must be selected.

Millimeter-wave CMOS radios present severe challenges at all levels of abstraction, requiring designers to move up and down the device-circuit-architecture-system ladder with mastery. Several design parameters have to be taken into account to guarantee successful implementation of an mm-wave radio link. Although most of these issues, such as low power, low cost and compact implementation, are common with RF design, different approaches should be selected to address these requirements. For example, in modern wireless communication, the orthogonal frequency division multiplexing (OFDM) scheme is popular due to its robustness and resistance to fading. Although it is used at mm-waves too, the tight performance requirements it places on the power amplifier and analog-to-digital converter hinder a low cost, low power millimeter wave transceiver implementation with this modulation scheme [20]. Consequently, simpler modulation schemes should be adopted for a compact, power-efficient deployment of the link.

Furthermore, active and passive devices and interconnects between them entail issues that become more serious as the frequency of operation enters mm-wave regime. Many of these issues arise from the limited speed of transistors and the limited supply voltage, both of which encourage employment of inductors or transmission lines as loads. In other words, nodes running faster than a certain frequency (e.g., roughly 15GHz in 90nm technology) must employ resonance [19]. Unfortunately, the large footprint of inductors and T-Lines leads to large dimensions for the building blocks and hence long high-frequency interconnects.
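
To give a feel for the resonant loads mentioned above, the following sketch computes the inductance that resonates with a given node capacitance from $f_0 = 1/(2\pi\sqrt{LC})$. The 50fF node capacitance is a hypothetical value chosen for illustration and is not a figure from this work.

```python
import math


def resonant_inductance_ph(freq_ghz: float, cap_ff: float) -> float:
    """Inductance in pH that resonates with cap_ff (in fF) at freq_ghz (in GHz)."""
    omega = 2 * math.pi * freq_ghz * 1e9
    return 1.0 / (omega ** 2 * cap_ff * 1e-15) * 1e12


def resonant_frequency_ghz(ind_ph: float, cap_ff: float) -> float:
    """Resonance frequency in GHz of an ideal LC tank (pH, fF)."""
    return 1.0 / (2 * math.pi * math.sqrt(ind_ph * 1e-12 * cap_ff * 1e-15)) / 1e9


if __name__ == "__main__":
    C_NODE_FF = 50.0  # hypothetical node capacitance
    for f0 in (15.0, 60.0):
        l_ph = resonant_inductance_ph(f0, C_NODE_FF)
        print(f"{f0:.0f} GHz with {C_NODE_FF:.0f} fF -> {l_ph:.1f} pH")
```

Under this assumption the required inductance at 60GHz is on the order of 100pH, so interconnect inductance of a comparable order can no longer be neglected.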

It is instructive to compare the present speed and interconnect issues at mm-wave with those of 5GHz design in the late 1990s. The transit frequency of NMOS transistors reaches 110GHz in the 90nm generation, about five times that of the 0.25μm devices used in early 5GHz designs. Moreover, the 50-100μm outer diameter of inductors for 60GHz operation is only about a factor of two smaller than that of spirals used at 5GHz (100-200μm). In other words, the frequency of operation has scaled by a factor of 12, but the transistor speed by roughly a factor of five and the interconnect lengths by a factor of 0.5 only, making the design and floor planning of the receiver a much more challenging task. Another divergence relates to the quality factor of inductors and varactors. Well-designed symmetric spiral inductors exhibit a Q of about 10 at 5GHz, but a Q of no more than 30 at 60GHz. This saturation of Q, which is attributed to the substrate loss and to the small spacing between the top-most metal layer and the substrate (which causes large parasitic capacitances), makes the design of millimeter-wave oscillators quite difficult. Trade-offs between phase noise, tuning range, and power dissipation become much more severe, because the Q does not scale by a factor of 12 from 5GHz to 60GHz. In addition, the quality factor of varactors falls below that of inductors at millimeter-wave frequencies [19].
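
The scaling argument can be restated numerically. The sketch below uses only the figures quoted in this comparison; the $f_T$ of the 0.25μm generation is derived here from the "about five times" statement, so it is an approximation rather than a quoted value, and the dictionary keys are illustrative labels.

```python
# Figures quoted in the 5 GHz versus 60 GHz comparison.
F_OP_GHZ = {"5GHz": 5.0, "60GHz": 60.0}
F_T_GHZ = {"5GHz": 110.0 / 5.0, "60GHz": 110.0}  # 0.25um value derived, approximate
DIAMETER_UM = {"5GHz": (100.0, 200.0), "60GHz": (50.0, 100.0)}
INDUCTOR_Q = {"5GHz": 10.0, "60GHz": 30.0}        # 60 GHz value is an upper bound

freq_scaling = F_OP_GHZ["60GHz"] / F_OP_GHZ["5GHz"]
ft_scaling = F_T_GHZ["60GHz"] / F_T_GHZ["5GHz"]
diameter_scaling = DIAMETER_UM["60GHz"][1] / DIAMETER_UM["5GHz"][1]
q_scaling = INDUCTOR_Q["60GHz"] / INDUCTOR_Q["5GHz"]


def ft_margin(era: str) -> float:
    """Ratio of transit frequency to operating frequency for a given era."""
    return F_T_GHZ[era] / F_OP_GHZ[era]


if __name__ == "__main__":
    print(f"operating frequency: x{freq_scaling:.0f}")
    print(f"transistor f_T:      x{ft_scaling:.0f}")
    print(f"inductor diameter:   x{diameter_scaling:.1f}")
    print(f"inductor Q:          at most x{q_scaling:.0f}")
    for era in F_OP_GHZ:
        print(f"f_T / f_op at {era}: {ft_margin(era):.2f}")
```

The ratio of transit frequency to operating frequency drops from about 4.4 to about 1.8, which summarizes why the available gain margin is so much smaller at 60GHz.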

Device modeling is another issue for mm-wave design [21]. As stated in chapter one, the common approach to transistor modeling at mm-wave is based on the measurement of fabricated devices, yielding models expressed as a black box (e.g. with S-parameters) or as a fitted physical representation with additional parasitics. With this type of model it is exceedingly difficult to depart from the specific geometry of the fabricated devices, which considerably constrains the design and layout of circuits. Furthermore, since various folding and routing techniques are required to create a compact layout for a given device dimension, the created model is not scalable. Also, measurement of MOS devices, especially those with a small width, becomes difficult at these frequencies due to errors introduced by inaccurate de-embedding from calibration structures and by coupling between probes [19].

The situation gets harder when a complete mm-wave system is considered. Different circuit blocks, to each of which the above issues apply, must be put together carefully to make the whole system work properly. Altogether, creating a communication link at mm-wave for different types of applications requires different design strategies, link budgets and overall system requirements, which calls for a case by case study of the respective applications. To this end, in the following section we briefly review some of the state of the art designs for 60GHz transceivers.

#### State of the art mm-wave data links

Different design strategies and modulation schemes are employed to realize high data rate mm-wave radio links. In this section some design examples of such links will be briefly studied.

Several mm-wave transceivers are designed to satisfy IEEE802.15.3c standard requirements [22-25]. As an example, [25] demonstrates a 60GHz direct-conversion transceiver using quadrature oscillators to realize IEEE802.15.3c full-rate wireless communication for every 16QAM/8PSK/QPSK/BPSK mode (Fig. 2.5).

![Diagram of Direct-Conversion Transceiver for IEEE802.15.3c](image)

**Fig. 2.5:** Direct-Conversion Transceiver for IEEE802.15.3c [25]

Direct conversion architecture is employed for the transceiver for the sake of energy efficiency. The transmitter consists of a 4-stage PA, I/Q mixers and a quadrature oscillator. Low loss transmission lines are employed to implement the PA. A double-balanced Gilbert mixer is used but only one side is connected to the output in consideration of power consumption, LO leakage, and chip area.

The receiver consists of a 4-stage LNA, I/Q passive mixers, and a quadrature oscillator. The LNA adopts a CS-CS topology to improve the noise figure, and is connected to the passive mixer via a parallel-line transformer and a 2-stage differential amplifier. A transformer balun generally causes an imbalance in differential signals; hence, differential amplifiers are used in the matching blocks to compensate for the imbalance through common-mode rejection.

The LO consists of a quadrature injection-locked oscillator (QILO) and a 20GHz PLL. The QILO works as a frequency tripler with a 20GHz injection-lock input, and it has a tail I/Q coupling. To avoid insertion loss in the 60GHz LO distribution, which also helps maintain I/Q phase balance, two separate quadrature oscillators are used for the transmitter and receiver.

The transceiver is fabricated in 65nm CMOS technology with a core area of 7.3mm$^2$, including all the matching blocks. Figure 2.6 shows the measured spectrum in QPSK mode with the IEEE802.15.3c spectrum mask. The input I/Q signal is generated by an arbitrary waveform generator (AWG) with a symbol rate of 1.76GS/s and a roll-off factor of 25%. A horn antenna receives the TX output signal. The received signal is then measured by a spectrum analyzer with a down conversion mixer. Full-rate communication is possible for channel 1 (57.24 to 59.40GHz) and channel 2 (59.40 to 61.56GHz) of IEEE802.15.3c with a BER of $<10^{-3}$. The maximum data rates with an antenna built in the package are 8Gb/s in QPSK mode and 11Gb/s in 16QAM mode for a BER of less than $10^{-3}$. The transceiver consumes 292mW of power.
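
For reference, the raw bit rate implied by a given symbol rate and modulation order can be sketched as below. The 1.76GS/s symbol rate and the 25% roll-off are the AWG settings quoted above; the bits-per-symbol table and the raised-cosine bandwidth expression $R_s(1+\alpha)$ are textbook relations assumed here, not results from [25], and the computed values are not the measured maxima quoted above.

```python
# Raw (uncoded) bit rate and nominal occupied bandwidth for a single carrier.
SYMBOL_RATE_GSPS = 1.76  # symbol rate quoted in the text
ROLL_OFF = 0.25          # roll-off factor quoted in the text

BITS_PER_SYMBOL = {"BPSK": 1, "QPSK": 2, "8PSK": 3, "16QAM": 4}


def raw_bit_rate_gbps(modulation: str) -> float:
    """Raw bit rate = symbol rate x bits per symbol."""
    return SYMBOL_RATE_GSPS * BITS_PER_SYMBOL[modulation]


def occupied_bandwidth_ghz() -> float:
    """Nominal bandwidth Rs * (1 + alpha), assuming raised-cosine shaping."""
    return SYMBOL_RATE_GSPS * (1 + ROLL_OFF)


if __name__ == "__main__":
    for mod in BITS_PER_SYMBOL:
        print(f"{mod:>5}: {raw_bit_rate_gbps(mod):.2f} Gb/s")
    print(f"nominal occupied bandwidth: {occupied_bandwidth_ghz():.2f} GHz")
```

The sketch makes explicit that, at a fixed symbol rate, the bit rate grows only with the number of bits per symbol, which is why the choice of modulation scheme drives the achievable data rate.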

Since the bandwidth of the IEEE802.15.3c standard is limited to the 7GHz unlicensed band around 60GHz, the achievable data rate for a fixed modulation scheme is limited for systems based on this standard. The limitation becomes more severe if simple modulation schemes (such as OOK) are employed to realize the transceiver. To respond to the demand for high data rate, low complexity, low power radios, many other mm-wave transceivers do not exactly follow the specifications of the IEEE802.15.3c standard [1, 2, 4, 13, 26].

Reference [26] introduces a low power 60GHz transceiver that includes RF, LO, PLL and baseband signal paths integrated into a single chip. In order to maximize energy efficiency, the transceiver, shown in figure 2.7, utilizes mixed-signal techniques similar to those used in high-speed electrical links. By utilizing the available bandwidth at 60 GHz as a single channel, the transceiver allows 4Gb/s communication using QPSK modulation.
Simultaneous achievement of high output power and efficiency in transmitters is one of the challenges at mm-wave frequencies. This is further pronounced by the need for power combining of I and Q signals in the quadrature transmit chain, which can be a significant source of loss. Both of these challenges are addressed in this transmitter design.

In the transmitter, the digital data from the on-chip PRBS generator is fed to the modulator which consists of a fully differential combined DAC-mixer structure. The modulator uses a double balanced Gilbert quad whose tail current sources are digitally switched by the input data. The LO signal derived from the VCO is first converted to differential mode using a low loss transformer and then fed into the LO port of the Gilbert cell.

This transceiver employs QPSK modulation to achieve high data rates. Hence the outputs of both an in-phase (I) and a quadrature (Q) up-conversion mixer must be combined into a single RF output. This is done through current-mode summation and is achieved by directly connecting the outputs of the two mixers. The combined output from the quadrature mixer is then fed to a transformer coupled, pseudo-differential power amplifier. By employing two vertically-coupled loop inductors this transformer performs both impedance matching to the external 50Ohm load as well as differential to single-ended conversion.

The receiver design consists of an ESD protected, two stage cascode LNA, quadrature hybrid I/Q down conversion mixers, and a four-stage variable gain amplifier (VGA) which further amplifies the received signal. The receiver shares the VCO and PLL with the transmitter.

As the transceiver is targeted for high data rates, the baseband must operate at GHz frequencies. At the same time it should provide functions such as phase rotation as well as equalization. Phase rotation is required to account for phase differences between transmit and receive LO and equalization is required to counteract ISI.

Instead of complex modulation schemes and traditional ADC/DSP based implementation approaches, which require high baseband power consumption, a simple modulation scheme is chosen by following the approach taken by high speed chip to chip links. Specifically, the baseband uses only as many comparators as required to extract the data. Mixed-signal processing is then performed around these comparators to counteract non-idealities in the transceiver and the channel. A digitally programmable analog phase rotator is also employed in the mixed-signal baseband stage to compensate for the phase difference between the transmit and receive LOs.

The transceiver is fabricated in a standard one-poly seven-metal (1P7M) 90nm CMOS process. The die occupies 6.88mm² and is bonded in a chip on board configuration. With a 1.2V supply the chip consumes 170mW while transmitting 10dBm, and 138mW while receiving. Using an on-chip 2³¹-1 PRBS generator and checker, a BER of 10⁻¹¹ is measured for the transceiver sending 4Gb/s data with QPSK modulation over a 1m wireless channel with 25dBi horn antennas. For a wired channel the data rate increases to 6Gb/s and the distance to 3m for the same BER.

Another simple modulation scheme that is appropriate for rapid file transfer applications is presented in Ref [2]. Here, a comparison is first made between digital and analog modulation schemes and a digital one is then adopted to realize the transceiver. The authors reason that an analog modulation scheme has numerous analog blocks for each of which stringent linearity requirements have to be satisfied. For example, the design of a 60GHz power amplifier, which simultaneously delivers high output power, high gain, and high linearity to the antenna, represents one of the most serious challenges in mm-wave circuit design. The linearity requirements are typically addressed by operating the power amplifier backed-off several dB from the saturated output power level and much below its peak operating efficiency point.

A direct digital modulator/PA, on the other hand, overcomes these limitations by allowing the system to operate in saturated mode, with maximum efficiency, and with the output signal swing constrained only by the reliability limit of the transistors. Such a choice also simplifies the baseband circuitry, which can be implemented entirely digitally (Fig. 2.8).

![Conventional up conversion transmitter (a), Direct digital modulation transmitter (b)](image)

**Fig. 2.8:** Conventional up conversion transmitter (a), Direct digital modulation transmitter (b)

As shown in figure 2.9, the transceiver in [2] integrates a fundamental frequency zero-IF receiver with a direct modulation transmitter. 60GHz direct BPSK modulation is employed to avoid the need for a power amplifier, a fundamental static frequency divider and ADCs, and to obviate the need for image rejection.

![60 GHz direct modulation BPSK transceiver architecture introduced in [2]](image)

**Fig. 2.9:** 60 GHz direct modulation BPSK transceiver architecture introduced in [2]

Baseband digital non-return-to-zero (NRZ) data, produced off-chip at rates beyond 6Gb/s, is applied to the large power BPSK modulator which directly modulates the 60GHz LO signal and drives 50Ohm loads differentially at the transmitter output. The receiver outputs the same digital data stream in NRZ format in a true, single chip bits-in/bits-out radio transceiver.

On the receiver side, a double-balanced Gilbert cell down-conversion mixer is driven by a high gain, low noise amplifier. Since quadrature modulation is not employed in this design, the receive mixer is single phase. Although less spectrally efficient, the use of a single phase modulation scheme instead of a multi phase or quadrature technique reduces the system complexity and the power consumption required to distribute amplitude and phase matched quadrature signals throughout the chip.

Furthermore, no image reject mixer is utilized in the transmitter, because the transmitter generates a double sideband signal, nullifying any benefit of such a mixer. No IF amplifier is included in this design, so all the receive-path gain is at 60GHz. The baseband NRZ data is recovered at the IF output of the receiver, without any digital signal processing or analog to digital conversion.

It is also worthwhile to mention that a relatively big link margin is achieved in this design thanks to the employment of high directivity antennas. The presence of the high directivity antennas relaxes the transceiver specification sufficiently that only moderate transmit power and good noise figure are necessary.

The chip in [2] occupies 1.28x0.81mm^2 and consumes 374mW from 1.2V which reduces to 232mW for a 1.0V supply. Its performance was validated in a 1-6Gb/s 2-meter wireless transmit-receive link over the 55-64GHz range.

In an attempt to reduce both the system level complexity and the power consumption, a 43GHz wireless inter-chip data link is introduced in [13]. The block diagram of this transceiver is illustrated in Fig. 2.10. Instead of an area consuming, low efficiency on-chip antenna, low cost bonding wires are employed in this design to realize the antennas. The high efficiency of the bond wire antennas reduces the undesirable power loss in them and thus helps save power.

Furthermore, the proposed transceiver adopts an ASK modulator and a non-coherent receiver for the ASK modulated signal. The compact architecture of such a solution allows a power efficient implementation.

![Fig. 2.10: Transceiver architecture in Ref [13]](image-url)
In the transmitter section an ASK modulator, which also serves as a differential to single-ended converter, modulates the input data generated from a Pseudo Random Bit Sequence (PRBS) generator. The modulated data is amplified by a two-stage PA and subsequently transmitted through the impedance matched bond-wire antenna. Another bond-wire antenna receives the signal on the receiver side. The received signal passes through a non-coherent receiver which demodulates and recovers the envelope of the transmitted signal. To guarantee that the signal amplified by the LNA is larger than the threshold of the envelope detector (ED), extra gain is provided by a pre-amplifier consisting of a cascaded common emitter stage and emitter follower. Demodulation and down conversion of the high speed signal is done by a bipolar transistor based ED. A VGA with a 35dB dynamic range follows the ED, with the aid of which the gain of the receiver chain may vary from 50dB to 85dB. Finally, a comparator is employed to recover the logic level of the input signals. It also serves as a balun, converting the single ended signal of the VGA to a differential output.

The transceiver of [13] is fabricated in Jazz 180nm SiGe BiCMOS technology and occupies a total area of approximately 0.62mm². For a maximum data rate of 6Gb/s at a communication distance of 2cm, it consumes 117mW including the input and output 50 Ohm buffers. The bit energy efficiency, not including power of the buffers, is 17pJ/bit (17mW/Gb/s).

Another low power, low complexity solution for multi Gb/s wireless communication is presented in [1]. It entirely eliminates the digital interface and baseband circuitry by integrating purely analog modulator/demodulator into the RF frontend.

As shown in figure 2.11, the transmitter in [1] consists of a 60GHz VCO, an OOK modulator, and a pseudo-differential power amplifier made of two identical single ended amplifying stages. The modulator directly modulates the 60GHz clock before it is delivered to the PA.

![Fig. 2.11: Transceiver architecture presented in [1]](image-url)
The receiver, on the other hand, is composed of a pseudo-differential LNA, a double balanced mixer, an IF amplifier, an OOK demodulator, and the subsequent limiting amplifier. The received signal from the antenna is amplified by the LNA and is then down converted by the mixer to about 10GHz. Another on-chip VCO provides the required 50GHz LO for the receiver. This IF frequency is chosen as a compromise between the maximum data rate and the IF amplifier bandwidth. For a maximum data rate of 5Gb/s, such a selection ensures at least two cycles of IF signal in each bit after down conversion.

No inductor is used in the IF amplifier to realize the 10GHz bandwidth. Furthermore, since the demodulator in this architecture does not deal with 60GHz signal, the requirements on power and circuit complexity are considerably relaxed.

Two antenna structures, folded dipole and patch array, are also employed to fully examine the performance. Designed and fabricated in 90nm CMOS technology, the transmitter and the receiver consume 183 and 103mW and occupy 0.43 and 0.68mm², respectively. With 4x3 patch antenna array, which ensures the best operation of the transceiver, it achieves error free operation (BER <10⁻¹²) for 2³¹ -1 PRBS of 1Gb/s over a distance of 60cm.

Another notable work, specifically in terms of complexity and power, is the Millimeter-Wave Intra-Connect Solution presented in [4], which benefits from both a simple architecture and a very low power consumption. The block diagram of this transceiver system is illustrated in figure 2.12.

The transmitter consists of a VCO, a mixer, and a TX amplifier. An ASK modulation scheme has been adopted in this work to help simplify the receiver design. Such a strategy allows simple amplitude detection and injection locked clock recovery.

A current commutating double-balanced mixer is used as the ASK modulator to suppress the influence of the mixer's impedance variations due to the baseband input signal. Two cascode stages connected to an output common source structure form the TX amplifier. A common source configuration is adopted for the output stage to achieve better linearity. The output of this amplifying stage is directly connected to the antenna.

A coherent transmission system is employed in this work for the better sensitivity that it offers compared to non-coherent solutions. Such a system is usually realized using phase locked loops (PLLs) and phase rotators, leading to a power hungry, area inefficient design. To compensate for this drawback, a carrier component in the modulated signal is used to obviate the need for a phase locked local oscillator. In other words, a carrier component is transmitted together with the modulated signal and is used for injection locking at the receiver to realize coherent operation between the transmitter and the receiver.

The receiver section consists of a three stage cascode LNA, an injection path, a VCO, a mixer, a DC offset cancellation circuit, and a wideband baseband amplifier. It operates in direct conversion mode. The absence of an IF stage in this architecture enables wideband operation and helps reduce the chip area. One of the key features of the approach adopted in this design is the injection path between the LNA output and the VCO. This path injects the received signal from the LNA output into the VCO, locking the VCO to the received carrier generated by the LO of the transmitter. Both LOs in the transmitter and receiver are free running and their frequencies fluctuate. However, by locking the receiver LO to the transmitter LO, the fluctuations track each other, making coherent transmission between transmitter and receiver possible.

The output of the LNA is connected to a single-balanced mixer whose LO port is fed by the signal generated through injection. A feedback system realized by a pair of operational amplifiers compensates the DC offset due to the carrier component of the received signal at the output of the direct conversion mixer. It forces the output DC levels to track each other via negative feedback loops. The output of the mixer is then fed to a two stage baseband amplifier which provides the input for the measuring instrument.

This transceiver is fabricated in 40nm low power logic CMOS process and achieves a data transmission rate of 11Gb/s at 56GHz over a transmission distance of 14mm. The whole transceiver consumes 70mW of power and its active footprint is 0.13mm².

To obtain an overview, performances of some of the state of the art mm-wave links with data rates in the range of 1 to 10Gb/s are summarized in table 2.1.
Table 2.1: Performance summary of state of the art mm-wave wireless links

<table>
<thead>
<tr>
<th>Ref</th>
<th>Technology</th>
<th>Carrier Freq. (GHz)</th>
<th>Modulation</th>
<th>DC Power (mW)</th>
<th>Max. data rate (Gb/s)</th>
<th>Max. distance (cm)</th>
<th>Area (mm²)</th>
</tr>
</thead>
<tbody>
<tr>
<td>[1]</td>
<td>90nm CMOS</td>
<td>60</td>
<td>OOK</td>
<td>286</td>
<td>3.3</td>
<td>60*</td>
<td>1.11</td>
</tr>
<tr>
<td>[2]</td>
<td>65nm CMOS</td>
<td>60</td>
<td>BPSK</td>
<td>374</td>
<td>4.0</td>
<td>200*</td>
<td>1.00</td>
</tr>
<tr>
<td>[26]</td>
<td>90nm CMOS</td>
<td>60</td>
<td>QPSK</td>
<td>308</td>
<td>4.0</td>
<td>100</td>
<td>6.88</td>
</tr>
<tr>
<td>[13]</td>
<td>180nm SiGe</td>
<td>43</td>
<td>Binary ASK</td>
<td>117</td>
<td>6.0</td>
<td>2*</td>
<td>0.62</td>
</tr>
<tr>
<td>[4]</td>
<td>40nm CMOS</td>
<td>56</td>
<td>ASK</td>
<td>70</td>
<td>11.0</td>
<td>1.4</td>
<td>0.13</td>
</tr>
</tbody>
</table>

*: Note that the Max. distance for these cases is different from the distance at which the Max. data rate is achieved
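
As a rough cross-check of Table 2.1, the energy spent per transmitted bit can be estimated as the ratio of DC power to maximum data rate. The sketch below is only illustrative: the dictionary `LINKS` and the function `energy_per_bit_pj` are hypothetical names, and the figures are taken directly from the table (total DC power, including any buffers), so the result for [13] differs from the 17pJ/bit quoted in the text, which excludes the buffers.

```python
# Energy per bit estimated from Table 2.1 as DC power / maximum data rate.
# 1 mW / (1 Gb/s) = 1 pJ/bit, so no unit conversion factor is needed.

# reference -> (DC power in mW, maximum data rate in Gb/s), copied from Table 2.1
LINKS = {
    "[1]": (286.0, 3.3),
    "[2]": (374.0, 4.0),
    "[26]": (308.0, 4.0),
    "[13]": (117.0, 6.0),
    "[4]": (70.0, 11.0),
}


def energy_per_bit_pj(dc_power_mw: float, data_rate_gbps: float) -> float:
    """Return the energy per bit in pJ/bit."""
    return dc_power_mw / data_rate_gbps


for ref, (power_mw, rate_gbps) in LINKS.items():
    print(f"{ref:>5}: {energy_per_bit_pj(power_mw, rate_gbps):6.1f} pJ/bit")
```

With these numbers the ASK links of [13] and [4] come out well below the quadrature and BPSK designs, which is consistent with the motivation for simple modulation schemes discussed above.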

### The architecture employed in this work

Mm-wave transceivers have already been developed in SiGe and silicon CMOS. Most of these transceivers employ a quadrature architecture and are quite expensive to develop, requiring large chip area and significant DC power. This is primarily because of the interface (ADCs) and the subsequent baseband circuitry (DSPs). As a consequence, simpler modulation schemes were investigated to simplify the design of the transceiver system.

In discussing potential modulation schemes, it is worthwhile starting briefly from the coherent receiver architecture. Coherent systems require carrier phase information at the receiver and need to estimate the phase and magnitude responses of the channel. As a consequence, they demand synchronization algorithms to synchronize the local oscillator of the receive path with that of the transmit path, hence imposing more complexity and costs. However, as illustrated in figure 2.13 the achievable bit error rate of detection for coherent receivers is superior to that of the non-coherent receivers for the same signal to noise ratio (SNR).

We have adopted a non-coherent solution to implement the receiver. Compared to coherent receivers, non-coherent receivers are more sensitive to noise and interference. However, because of the very small size of typical mm-wave multi-chip systems, for some applications the effect of this issue can be reduced by properly shielding the system (i.e. putting the transceiver chip inside an isolating package). A non-coherent receiver does not need carrier phase information and uses methods like square law (push detection or energy detection) to recover the data. It therefore obviates the need for a complicated local oscillator signal generator and fine phase alignment circuits, which leads to a compact low-power implementation.

Fig. 2.13: Comparison of achievable BER for coherent & non-coherent OOK detection vs. SNR

To ease the transceiver implementation with non-coherent detection, on-off keying (OOK), which is the simplest form of amplitude-shift keying, is employed. OOK consists of keying a sinusoidal carrier signal on and off with a unipolar binary signal. It obviates the power hungry, high speed analog to digital converters for short range gigabit data communication at 50GHz, thus allowing the system to be greatly simplified, the chip area to be minimized and the power consumption to be reduced. The tradeoffs are lower spectral efficiency and poorer co-existence [3]. However, these issues are not as significant at mm-wave. As a consequence, the only penalty will be a higher BER for a fixed SNR.

The OOK power spectral density is a sinc-squared function with a main lobe width of $2/T_b$ around the carrier, in which $T_b$ denotes the bit period [1]. Similar to frequency shift keying (FSK) and binary phase shift keying (BPSK), OOK supports a data rate of up to half the available bandwidth. As a consequence, 20GHz of bandwidth is required to achieve a data rate of 10Gbit/s.

It can also be shown that the error probability (or bit error rate, BER) of a non-coherent OOK demodulation is given by (2-2) [1]:

$$\text{BER} = \frac{1}{2} \exp\left(-\frac{E_b}{2N_0}\right) + \frac{1}{2} Q\left(\sqrt{\frac{E_b}{N_0}}\right) \tag{2-2}$$

where $E_b$ and $N_0$ denote the average bit energy and noise power spectral density, respectively. The error rate of OOK demodulation is very close to that of FSK, but it is inferior to that of coherent BPSK (which is $Q(\sqrt{2E_b/N_0})$). However, in order to avoid the complicated carrier recovery circuit, non-coherent OOK is chosen for our short range transceiver system. Figure 2.14 compares the bit error rate versus SNR for four different modulation schemes.
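
The comparison between (2-2) and the coherent BPSK expression can be reproduced numerically. The following is a minimal sketch, not the script used to generate figure 2.14; the function names are hypothetical, and the Q-function is evaluated through the standard identity $Q(x) = \frac{1}{2}\,\text{erfc}(x/\sqrt{2})$.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber_noncoherent_ook(ebn0_db: float) -> float:
    """BER of non-coherent OOK according to (2-2); Eb is the average bit energy."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.exp(-ebn0 / 2.0) + 0.5 * q_function(math.sqrt(ebn0))


def ber_coherent_bpsk(ebn0_db: float) -> float:
    """BER of coherent BPSK, Q(sqrt(2 Eb / N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return q_function(math.sqrt(2.0 * ebn0))


# Tabulate both error rates over a few Eb/N0 values (in dB).
for ebn0_db in (8, 10, 12, 14, 16):
    print(f"Eb/N0 = {ebn0_db:2d} dB   "
          f"OOK (non-coherent): {ber_noncoherent_ook(ebn0_db):.2e}   "
          f"BPSK (coherent): {ber_coherent_bpsk(ebn0_db):.2e}")
```

For every tabulated $E_b/N_0$ the non-coherent OOK error rate is higher than that of coherent BPSK, which is the penalty accepted here in exchange for avoiding carrier recovery.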

![Fig. 2.14: BER as a function of SNR for 4 different modulation schemes](image)

The block diagram of the proposed mm-wave transmitter is shown in figure 2.15. The 50GHz carrier is first generated by a local voltage controlled oscillator (VCO). The output of the VCO is then fed to the power amplifier which is directly connected to the antenna.

As illustrated in figure 2.15, there are two ways to implement OOK modulation. The first approach (figure 2.15(a)) switches the VCO on and off. However, this approach cannot be adopted in our system. As a rule of thumb, the start-up of an oscillation can be roughly estimated to take $Q$ periods, in which $Q$ is the tank quality factor. Consequently, for a $Q$ of 10, which is a reasonable assumption, 200ps is required for the start-up of the 50GHz oscillation, which is twice the 100ps bit period of a 10Gb/s signal. As a result, the idea of switched oscillators is limited to lower data rates.
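
The start-up estimate above is simple arithmetic and can be checked with a few lines. This is only a sketch of the rule of thumb quoted in the text (start-up takes about $Q$ carrier periods); the function names are illustrative.

```python
def startup_time_s(carrier_hz: float, tank_q: float) -> float:
    """Rule-of-thumb oscillator start-up time: about Q carrier periods."""
    return tank_q / carrier_hz


def bit_period_s(bit_rate_bps: float) -> float:
    """Duration of one bit for the given bit rate."""
    return 1.0 / bit_rate_bps


t_start = startup_time_s(carrier_hz=50e9, tank_q=10)
t_bit = bit_period_s(bit_rate_bps=10e9)
print(f"start-up time : {t_start * 1e12:.0f} ps")
print(f"bit period    : {t_bit * 1e12:.0f} ps")
print(f"ratio         : {t_start / t_bit:.1f}")
```

With a 50GHz carrier and a tank $Q$ of 10 the start-up lasts two bit periods at 10Gb/s, so the carrier could not settle within a single bit.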

In our work instead, on off keying modulation is realized by switching on and off the power amplifier (Figure 2.15(b)). This enhances the power efficiency because the amplifier is not on all the time. Furthermore, since the oscillator is not switched, high data rates can be easily achieved.
Figure 2.15: Realization of OOK modulation by switching the oscillator (a) and by switching the PA (b). The second solution is employed to realize the transmitter.

Figure 2.16 illustrates the proposed receiver architecture. A single ended LNA is connected to the antenna to provide more than 25dB of gain for a bandwidth of greater than 20GHz. A single ended to differential converter (Balun) then converts the single ended output of the LNA to a differential input for the envelope detector. A wide bandwidth is targeted for the envelope detector to eliminate the need for an equalizer or area consuming shunt peaking inductors in the limiting amplifier chain, thus helping in reducing area and power consumption. Since the output of the envelope detector is relatively faint, the generated signal would need to be further amplified to a typical logic level. This task is done by the subsequent limiting amplifier chain. Through a buffer stage, the output of the limiting amplifiers is then fed to the measurement instrument.

In order to realize a simple transceiver architecture we have avoided phased array structures. However, the compact size of an mm-wave transceiver permits multiple antenna solutions, which are otherwise difficult, if not impossible, at lower frequencies. It can be shown that using an antenna array for the transmit path improves the SNR by $20\log N$, where $N$ is the number of antennas. This SNR improvement intuitively comes from having $N$ power amplifiers multiplied by an array gain of $N$. For the receive path, the SNR improvement is given by $10\log N$ provided that the noise factor of each individual block ($F$) is much greater than the number of the blocks ($N$) (i.e. $F/N \gg 1$). Intuitively, applying the required phase shift in each receive path brings the signals of the different paths in phase. Hence they add up in voltage, while the noise contributions from the different receive paths add in power. Employing phased array architectures imposes its own challenges, specifically in terms of power and area, but since we have not employed such an architecture, we limit the discussion to this paragraph. Further information can be found in [9, 10].
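
To give a feeling for the array figures quoted above, the short sketch below evaluates $20\log N$ and $10\log N$ (base-10 logarithms) for a few array sizes. It only restates the two expressions from the text under their stated condition ($F/N \gg 1$ for the receive path); the function names are hypothetical.

```python
import math


def tx_array_snr_gain_db(n_antennas: int) -> float:
    """Transmit-array SNR improvement, 20*log10(N): N amplifiers times array gain N."""
    return 20.0 * math.log10(n_antennas)


def rx_array_snr_gain_db(n_antennas: int) -> float:
    """Receive-array SNR improvement, 10*log10(N), valid when F/N >> 1."""
    return 10.0 * math.log10(n_antennas)


for n in (1, 2, 4, 8, 16):
    print(f"N = {n:2d}: TX +{tx_array_snr_gain_db(n):5.2f} dB, "
          f"RX +{rx_array_snr_gain_db(n):5.2f} dB")
```

Doubling the number of elements thus buys roughly 6dB on the transmit side and 3dB on the receive side, at the cost of the additional power and area mentioned above.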

### 50GHz Link Budget

In order to establish an mm-wave wireless connection over a pre-determined communication distance, the system link budget for the specific application must first be determined. The intended application for our transceiver is high data rate multi-media sharing in mobile devices, aimed at replacing the current wired interconnect. As a result, calculations are done for 10Gb/s uncompressed 50GHz wireless communication over a distance longer than 5cm.

In a classical receiver modeled as a linear system, the NF of the whole chain can be well approximated by the NF of the first stage (usually the LNA), if its gain is large enough. However, in an envelope detector-based receiver, the presence of the nonlinear element invalidates this hypothesis. Consequently, the link budget calculation of such a system needs to properly take into account the interaction of signal and noise determined by the nonlinear element. As will be shown, even under the condition of large LNA gain, the NF of the whole receiver depends on the input SNR at which it is calculated. For this reason, the dramatic impact of the nonlinear element needs to be recognized for a correct link budget calculation.

The receiver system has been modeled as in Fig. 2.17(a) to investigate its noise figure performance. In this figure, $s_{in}$ is the signal at the antenna input, $n_s$ is the channel noise, $G_{LNA}$ the LNA gain, $n_{LNA}$ the output referred noise of the LNA, $a_2$ the gain of the envelope detector (i.e. the squarer block) and $n_{int}$ the input referred noise of all the blocks following the squarer (i.e. limiting amplifier and the output buffer). The noise sources are considered Gaussian with zero mean. Due to the presence of the non-linear squarer, the calculation of NF is not straightforward. To simplify the calculations, $n_{LNA}$ has been referred to the input of the system, where it is denoted $n_{amp}$, and the squarer function has been merged with the LNA, as shown in Fig. 2.17(b) and 2.17(c), respectively. The calculations are based on the results presented in [27, 28].

The input signal to noise ratio \( \Lambda_{\text{in}} \) is given by (2-3), in which \( E[.] \) is the expected value:

\[
\Lambda_{\text{in}} = \frac{E[s_{\text{in}}^2]}{E[n_{\text{in}}^2]} = \frac{E_b B_r}{N_0 B} \tag{2-3}
\]

In equation (2-3), \( E_b \) is the energy of the bit, \( B_r \) is the bit rate, \( N_0 \) is the power spectral density of the channel noise (\( N_0 = -174 \text{dBm/Hz} \)) and \( B \) is the signal bandwidth. Accordingly, the output SNR can be calculated as (2-4):

\[
\Lambda_{\text{out}} = \frac{E[s_{\text{out}}^2]}{E[n_{\text{out}}^2]} = \frac{E\left[\left(a_2 (G_{\text{LNA}} s_{\text{in}})^2\right)^2\right]}{E\left[\left(a_2 G_{\text{LNA}}^2 (s_{\text{in}} + n_s + n_{\text{amp}})^2 + n_{\text{int}}\right)^2\right]} \tag{2-4}
\]

Expanding (2-4) results in (2-5):
\[
\Lambda_{\text{out}} = \frac{a_2^2 G_{\text{LNA}}^4 E_b^2 B_r^2}{E\left[\left(a_2 G_{\text{LNA}}^2 (n_s + n_{\text{amp}})^2 + 2 a_2 G_{\text{LNA}}^2 s_{\text{in}} (n_s + n_{\text{amp}}) + n_{\text{int}}\right)^2\right]} \tag{2-5}
\]

Since we are calculating the output noise power due to the noise sources of the system, the term \( a_2 G_{LNA}^2 s_{in}^2 \) is dropped in the denominator of (2-5), because it depends only on the input signal power and contains no noise term. To calculate the denominator of (2-5), we note that \( n_{\text{int}} \) is uncorrelated with \( n_s + n_{\text{amp}} \), hence its expected value can be evaluated separately as:

\[
E[n_{\text{int}}^2] = \sigma_{n_{\text{int}}}^2 \tag{2-6}
\]

The expected value of the square of the first two terms in the denominator of (2-5) can be expanded as (2-7):

\[
E[a_2^2 G_{LNA}^4 (n_s + n_{\text{amp}})^4 + 4a_2^2 G_{LNA}^4 s_{in} (n_s + n_{\text{amp}})^3 + 4a_2^2 G_{LNA}^4 s_{in}^2 (n_s + n_{\text{amp}})^2 ] \tag{2-7}
\]

The first, third and second terms in (2-7) can be written as (2-8), (2-9) and (2-10), respectively:

\[
E[a_2^2 G_{\text{LNA}}^4 (n_s + n_{\text{amp}})^4] = a_2^2 G_{\text{LNA}}^4 E[n_s^4 + 4 n_s^3 n_{\text{amp}} + 6 n_s^2 n_{\text{amp}}^2 + 4 n_s n_{\text{amp}}^3 + n_{\text{amp}}^4]
= a_2^2 G_{\text{LNA}}^4 (3\sigma_{n_s}^4 + 3\sigma_{n_{\text{amp}}}^4 + 6\sigma_{n_s}^2 \sigma_{n_{\text{amp}}}^2) = 3a_2^2 G_{\text{LNA}}^4 \sigma_{n_s}^4 \left(1 + \frac{\sigma_{n_{\text{amp}}}^2}{\sigma_{n_s}^2}\right)^2 = 3a_2^2 G_{\text{LNA}}^4 \sigma_{n_s}^4 \left(1 + \frac{\sigma_{n_{\text{LNA}}}^2}{G_{\text{LNA}}^2 \sigma_{n_s}^2}\right)^2 \tag{2-8}
\]

\[
E[4a_2^2 G_{\text{LNA}}^4 s_{\text{in}}^2 (n_s + n_{\text{amp}})^2] = 4a_2^2 G_{\text{LNA}}^4 E_b B_r \sigma_{n_s}^2 \left(1 + \frac{\sigma_{n_{\text{amp}}}^2}{\sigma_{n_s}^2}\right) \tag{2-9}
\]

\[
E[4a_2^2 G_{\text{LNA}}^4 s_{\text{in}} (n_s + n_{\text{amp}})^3] = 0 \tag{2-10}
\]

In writing the above expressions, it should be noted that the fourth-order central moment of a normally distributed random variable is equal to three times the square of its variance. It should also be noted that (2-10) is equal to zero because \( n_s \) and \( n_{\text{amp}} \) are normally distributed random variables; hence their sum is a normally distributed random variable, and the odd-order moments of a normally distributed random variable with zero mean are zero [29].

Since \( \sigma_{n_s}^2 = N_0 B \) and

\[ 1 + \frac{\sigma_{n_{\text{LNA}}}^2}{G_{\text{LNA}}^2 \sigma_{n_s}^2} = F_{\text{LNA}}, \]

(2-8) and (2-9) can be re-written as (2-11) and (2-12), respectively.

\[
E[a_2^2 G_{\text{LNA}}^4 (n_s + n_{\text{amp}})^4] = 3 a_2^2 G_{\text{LNA}}^4 N_0^2 B^2 F_{\text{LNA}}^2 \tag{2-11}
\]

\[
E[4 a_2^2 G_{\text{LNA}}^4 s_{\text{in}}^2 (n_s + n_{\text{amp}})^2] = 4 a_2^2 G_{\text{LNA}}^4 E_b B_r N_0 B F_{\text{LNA}} \tag{2-12}
\]

As a consequence, the output signal to noise ratio, \( \Lambda_{\text{out}} \), can be written as (2-13):

\[
\Lambda_{\text{out}} = \frac{a_2^2 G_{\text{LNA}}^4 E_b^2 B_r^2}{3 a_2^2 G_{\text{LNA}}^4 N_0^2 B^2 F_{\text{LNA}}^2 + 4 a_2^2 G_{\text{LNA}}^4 E_b B_r N_0 B F_{\text{LNA}} + \sigma_{n_{\text{int}}}^2} \tag{2-13}
\]

The SNR tends to saturate for very large gain of the LNA. In this case, the noise contributed by the last stages becomes negligible and (2-13) can be simplified to (2-14).

\[
\Lambda_{\text{out}} = \frac{P_b^2}{3 N_0^2 B^2 F_{\text{LNA}}^2 + 4 P_b N_0 B F_{\text{LNA}}} \tag{2-14}
\]

In (2-14), \( P_b = E_b B_r \) is the power of the signal at the input of the receiver. Notice the dependence of the denominator of (2-14) on the input signal power, which shows how the signal contributes to the conversion of the noise to the output in this nonlinear system. Based on the results of (2-3) and (2-13), the noise figure of the receiver can be calculated as (2-15):

\[
F_{\text{RX}} = \frac{\Lambda_{\text{in}}}{\Lambda_{\text{out}}} = \frac{3 a_2^2 G_{\text{LNA}}^4 N_0^2 B^2 F_{\text{LNA}}^2 + 4 a_2^2 G_{\text{LNA}}^4 E_b B_r N_0 B F_{\text{LNA}} + \sigma_{n_{\text{int}}}^2}{N_0 B a_2^2 G_{\text{LNA}}^4 E_b B_r} \tag{2-15}
\]

which can be simplified to (2-16):

\[
F_{\text{RX}} = 4F_{\text{LNA}} \left( 1 + \frac{3N_0 B F_{\text{LNA}}}{4E_b B_r} + \frac{\sigma_{n_{\text{int}}}^2}{4N_0 B a_2^2 G_{\text{LNA}}^4 E_b B_r F_{\text{LNA}}} \right) \tag{2-16}
\]

Supposing a large gain for the LNA, the last term in (2-16) can be neglected, because it is divided by the fourth power of the gain of the LNA. Consequently (2-16) can be re-written as (2-17):

\[
F_{\text{RX}} = 4F_{\text{LNA}} \left( 1 + \frac{3F_{\text{LNA}}}{4\,SNR_{\text{in}}} \right) \tag{2-17}
\]

Two important conclusions can be drawn from the above result. First, whatever the noise figure and the gain of the LNA, the noise figure of the receiver will be at least 6dB higher than the noise figure of the LNA, because of the factor of 4 in (2-17). Second, unlike the common linear case, the noise figure of the receiver does not depend solely on the noise of the receiver building blocks, but also on the BER performance (or equivalently the input SNR) at which it is calculated.
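
Both conclusions can be illustrated by evaluating (2-17) numerically. The sketch below assumes the large-gain limit in which (2-17) holds; the function names and the chosen input SNR values are illustrative only.

```python
import math


def db_to_linear(value_db: float) -> float:
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)


def receiver_nf_db(lna_nf_db: float, snr_in_db: float) -> float:
    """Receiver noise figure from (2-17): F_RX = 4*F_LNA*(1 + 3*F_LNA/(4*SNR_in))."""
    f_lna = db_to_linear(lna_nf_db)
    snr_in = db_to_linear(snr_in_db)
    f_rx = 4.0 * f_lna * (1.0 + 3.0 * f_lna / (4.0 * snr_in))
    return 10.0 * math.log10(f_rx)


# Receiver NF for a 10 dB LNA noise figure at several input SNR values.
for snr_in_db in (10, 20, 30, 40):
    nf_rx = receiver_nf_db(lna_nf_db=10.0, snr_in_db=snr_in_db)
    print(f"SNR_in = {snr_in_db:2d} dB -> NF_RX = {nf_rx:5.2f} dB "
          f"({nf_rx - 10.0:4.2f} dB above the LNA)")
```

As the input SNR grows, the receiver noise figure approaches the LNA noise figure plus $10\log 4 \approx 6$dB from above; at low input SNR the penalty is larger.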

In order to establish an error-free communication (with BER $<10^{-12}$), at least 17dB of SNR is required at the output of the OOK receiver. Assuming a maximum noise figure of 10dB for the LNA (which is a reasonable value for a 50GHz CMOS LNA), the SNR required at the input of the receiver saturates at 33dB for a large LNA gain, as depicted in Fig. 2.18.

We assume the transmitter can output a signal with power \( P_t = 10\text{dBm} \) and that \( G_t \) and \( G_r \), the transmit and receive antenna gains, are 0dBi. Also, as a rough estimate, we consider an additional 4dB margin to account for antenna mismatch, PCB body loss and the losses due to fading and shadowing over the 50GHz channel. Since the noise power at the input of the receiver is equal to -71dBm, (2-3) requires the signal power to be -38dBm, as illustrated in figure 2.19.
The Friis free-space transmission equation [1], shown in (2-18), can then be used to calculate the communication distance. A communication distance of 6.3cm is obtained following the above procedure.

\[
P_r = \frac{P_t G_t G_r \lambda^2}{(4\pi R)^2}
\]

(2-18)
In the above formula, $P_r$ is the received power, $P_t$ is the transmit power, $G_t$ and $G_r$ are the transmit and receive antenna gains, $\lambda$ is the wavelength and $R$ is the distance between transmitter and receiver. At 60GHz, which is our maximum operating frequency, the wavelength is 5mm.
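
The distance calculation can be reproduced with a short Python sketch that solves (2-18) for $R$. It assumes that the 4dB link margin is applied as an extra loss in the budget; the function name and argument order are hypothetical.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s

def max_distance_m(p_tx_dbm, p_rx_min_dbm, g_tx_dbi, g_rx_dbi, margin_db, freq_hz):
    """Solve the Friis equation (2-18) for R at the minimum received power."""
    wavelength = C_LIGHT / freq_hz
    # Path loss that the link can tolerate, in dB
    allowed_loss_db = p_tx_dbm + g_tx_dbi + g_rx_dbi - margin_db - p_rx_min_dbm
    # Free-space path loss is 20*log10(4*pi*R/lambda), inverted here for R
    return wavelength / (4.0 * math.pi) * 10.0 ** (allowed_loss_db / 20.0)

distance = max_distance_m(10.0, -38.0, 0.0, 0.0, 4.0, 60e9)
print(f"Maximum distance: {distance * 100:.1f} cm")
```

For the numbers used in this section (10dBm transmit power, -38dBm required receive power, 0dBi antennas, 4dB margin, 60GHz) the result is approximately 6.3cm.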

Figure 2.20 shows the communication distance versus the gain of LNA for the selected architecture. For an LNA gain greater than 25dB the system SNR saturates. Consequently, for a fixed LNA noise figure further increase in the gain of LNA does not lead to any increase in the communication distance.

To further increase the communication distance, we should increase the SNR at the input of the envelope detector. This can be done by increasing either the transmitted power or the receiver sensitivity. The transmitted power can be increased by increasing the antenna gain or the output power of the PA. High-gain antennas are relatively expensive and bulky and require careful alignment between TX and RX; they are therefore not compatible with low cost mobile applications. On the other hand, the maximum power that can be delivered by a PA is limited by fundamental process parameters such as supply voltage and passive component losses, and is usually less than 20dBm in silicon technology. Hence, there is a fundamental limit to the output power that can be achieved from a stand-alone transmitter.

![Fig. 2.20: Communication distance versus the gain of the LNA assuming antennas with 0dBi gain, a NF of 10dB for the LNA, and a bit rate of 10Gbps while considering 4dB for the link margin](image)

The SNR at the receiver can be improved by increasing the receiver sensitivity. Since the minimum bandwidth is determined by system specifications, the only way to increase the sensitivity of a stand-alone receiver is to decrease its noise figure. Figure 2.21 shows the noise figure of the receiver versus the gain of the LNA. As can be seen, for a gain greater than 25dB the only contributor to the noise figure of the receiver is the noise of the LNA. Decreasing the LNA noise figure requires high power dissipation, which is undesirable in portable wireless systems and is furthermore limited by design and process constraints.

Based on the above calculations, we determine the maximum noise figure of the LNA to be 10dB and its gain to be close to 30dB to achieve the maximum SNR at the output of the receiver.

Figure 2.21 illustrates a block diagram of the system used for the link budget calculation, with the numbers corresponding to the maximum communication distance for error-free operation.

Design requirements for each individual block in the transceiver are summarized in table 2.2. As stated, in the link budget calculations the modulation scheme and the required BER were determined first, and the minimum required input power was then calculated using equation (2-26), knowing the signal bandwidth and the receiver noise figure:

\[ P_{in} \geq SNR + NF + 10 \log (KTB) \]  

(2-26)
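
As an illustration, (2-26) can be evaluated in a few lines of Python. The sketch assumes a noise temperature of 290K and uses a receiver noise figure of 16dB, which is the value that (2-17) gives for a 10dB LNA noise figure; both are assumptions made here for the example.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K

def thermal_noise_dbm(bandwidth_hz, temperature_k=290.0):
    """Thermal noise power kTB expressed in dBm."""
    return 10.0 * math.log10(BOLTZMANN * temperature_k * bandwidth_hz / 1e-3)

def min_input_power_dbm(snr_db, nf_db, bandwidth_hz):
    """Minimum input power from (2-26), with all terms in dB or dBm."""
    return snr_db + nf_db + thermal_noise_dbm(bandwidth_hz)

noise_dbm = thermal_noise_dbm(20e9)
p_min_dbm = min_input_power_dbm(snr_db=17.0, nf_db=16.0, bandwidth_hz=20e9)
print(f"kTB = {noise_dbm:.1f} dBm, minimum input power = {p_min_dbm:.1f} dBm")
```

For a 20GHz bandwidth this gives a noise floor of about -71dBm and a minimum input power of about -38dBm, in line with the values used in the distance calculation.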

We are targeting a data rate of 10Gb/s over a communication distance longer than 5cm. A data rate of 10Gb/s necessitates a bandwidth of 20GHz for the transceiver. Considering the spread due to process variation and the non-ideal frequency response of each individual block (the cascade of which determines the overall frequency response), the RF bandwidth of each block must exceed 20GHz. For the same reason, the tuning range of the VCO is chosen to be more than 10% to provide a reasonable safety margin.

<table>
<thead>
<tr>
<th>Block</th>
<th>Parameter</th>
<th>Requirement</th>
</tr>
</thead>
<tbody>
<tr>
<td>VCO</td>
<td>Power Consumption</td>
<td>&lt; 8 mW</td>
</tr>
<tr>
<td>VCO</td>
<td>Center Frequency</td>
<td>50 GHz</td>
</tr>
<tr>
<td>VCO</td>
<td>Tuning Range</td>
<td>&gt; 10%</td>
</tr>
<tr>
<td>VCO</td>
<td>Phase Noise</td>
<td>*</td>
</tr>
<tr>
<td>PA</td>
<td>Gain</td>
<td>&gt; 15 dB</td>
</tr>
<tr>
<td>PA</td>
<td>Bandwidth</td>
<td>&gt; 20 GHz</td>
</tr>
<tr>
<td>PA</td>
<td>Output Power</td>
<td>&gt; 10 dBm</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Block</th>
<th>Parameter</th>
<th>Requirement</th>
</tr>
</thead>
<tbody>
<tr>
<td>LNA</td>
<td>Power Consumption</td>
<td>&lt; 30 mW</td>
</tr>
<tr>
<td>LNA</td>
<td>Gain</td>
<td>&gt; 25 dB</td>
</tr>
<tr>
<td>LNA</td>
<td>Noise Figure</td>
<td>&lt; 10 dB</td>
</tr>
<tr>
<td>LNA</td>
<td>Bandwidth</td>
<td>&gt; 20 GHz</td>
</tr>
<tr>
<td>ED</td>
<td>Output Integrated Noise Power</td>
<td>&lt; 500 nV²</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>LA Design Requirements</th>
</tr>
</thead>
<tbody>
<tr>
<td>Power Consumption</td>
</tr>
<tr>
<td>Gain</td>
</tr>
<tr>
<td>Bandwidth</td>
</tr>
</tbody>
</table>

* Since on-off keying is adopted as the modulation scheme, there is no stringent requirement on the phase noise of the VCO.

The output power of the PA and the gain of the ED are also determined based on the calculations and simulations done in this section.

The bandwidth of the LA chain is selected to be between 5 and 7GHz (between 0.5\(B_r\) and 0.7\(B_r\)) as a compromise between noise and inter-symbol interference (ISI).

Furthermore, to realize a low power implementation, we have allocated a power budget of 100mW to each of the transmit and receive paths. This limits the maximum power dissipation of each individual block.

**Design Methodology**

Throughout this thesis, the focus will be on reducing power consumption and increasing system SNR with the aim of achieving a lower BER and/or a higher communication distance. Standard digital CMOS processes are utilized to realize a low cost implementation of this goal. Passive components can be either lumped (as in the VCO and the PA) or distributed (as in the LNA) and no preference is assumed. Either choice will be shown to be valid depending on the particular circumstances of each design.

Nevertheless, in order to carefully account for all parasitic loading and distributed effects, all interconnects and passive components are simulated using the Agilent Momentum 2.5D full-wave EM simulator.

Furthermore, all transistors use standard design kit models with layout parasitic extraction for extrinsic parasitic resistances and capacitances.

**Chapter Summary**

The design of mm-wave transceiver systems involves design considerations at all levels, from device modeling to circuit design and system architecture. With the wide bandwidth available at these frequencies, high data rate wireless applications are attracting increasing interest, and many transceivers have already been implemented.

The design requirements and bottlenecks of mm-wave transceiver design were briefly discussed in this chapter. From the system architecture point of view, low cost and low power consumption are two important requirements for applications targeting mobile devices. Accordingly, different low power, low cost mm-wave transceiver architectures were presented and their merits and drawbacks were investigated.

Furthermore, our proposed solution for realizing a low power, low complexity, high data rate short range link was introduced. The link budget for such a transceiver was calculated and the design requirements of the individual blocks were determined.
Chapter 3

LNA: High Gain-Bandwidth at Low Power

Introduction

In the previous chapter the link budget for constructing a mm-wave short range transceiver was calculated. Before investigating the transmitter and receiver sections of the system, in this chapter design and analysis of a wideband high gain low noise amplifier as a standalone block will be discussed. The same core will then be used for implementing the low noise amplifier of the receiving side of our 50GHz transceiver.

Performance of mm-wave and sub-terahertz communication systems relies on low noise figure and high gain and linearity of the receiver. The low noise amplifier, as the first block of a receiver, provides enough gain to amplify the signal and to overcome the noise of subsequent stages. A relatively flat gain response with as low variation as possible (e.g. less than 1dB) is required for the LNA in the frequency range of interest to allow equal signal amplification over the entire frequency band of operation. In addition to a high and flat gain, a low and smooth noise figure is also required for the LNA to maintain a constant SNR for the receiver across the operating bandwidth of the system.

The LNA should be well matched at both its input and its output to deliver maximum power to the following blocks. Furthermore, since the LNA must interface with the outside world, the source impedance seen by the LNA (nominally 50Ω) can deviate from its nominal value. For example, the user may wrap a hand around the antenna or put the cell phone in a pocket, resulting in such a change. Hence the LNA must remain stable for all source impedances at all frequencies. One may think that the LNA must operate properly only in the frequency band of interest and not necessarily at other frequencies, but if the amplifier begins to oscillate at any frequency, it becomes highly nonlinear and its gain is severely compressed.

In most applications, the LNA does not limit the linearity of the receiver. Because the gain accumulates through the RX chain, the later stages, i.e. the baseband amplifiers or filters, tend to limit the overall input IP3 or P1dB. Nevertheless, the linearity of the LNA becomes critical in wideband receivers that may sense a large number of strong interferers [16]. Fortunately, in contrast to the UWB spectrum, the number of systems operating at or around 60GHz is limited, which relaxes the LNA linearity requirement.

The stringent trade-offs among the aforementioned parameters make the design of low noise amplifiers challenging. In the following sections, the design and analysis of wideband low noise amplifiers are investigated and solutions to satisfy the required design criteria are discussed in more detail.

**General considerations on mm-wave LNAs**

The intrinsic gain of CMOS devices drops and their noise figure increases when they operate close to their transit frequency. Consequently, achieving a good FOM for an LNA at mm-wave is a nontrivial task. This necessitates the use of modern nano-scale technologies to achieve the desired specifications at mm-wave frequencies. As CMOS technology scales, the transistor \( I_D-V_{GS} \) characteristic in the saturation region becomes linear, while the transconductance, minimum noise figure, \( f_T \), and \( f_{\text{max}} \) improve. On the other hand, the supply voltage reduces, and the voltage headroom and linearity become limited [30].

Generally, in realizing the building blocks of an RF radio, differential architectures may be preferred over their single-ended counterparts. Differential circuits provide greater rejection of common-mode noise, better performance despite process-voltage-temperature (PVT) variations, and less susceptibility to cross talk and noise within the circuit layout [31]. They are also less susceptible to parasitic feedback loops that can cause stability problems or degradation of circuit performance [32].

On the other hand, single-ended low noise amplifiers are superior in terms of noise performance, power consumption and area. They can also be connected more easily to a single-ended antenna, eliminating the need for an area-consuming, lossy balun. These advantages make single-ended LNAs preferable to their differential counterparts.

Due to the limited gain available at mm-wave frequencies, most mm-wave low noise amplifiers adopt a cascode or common source structure. A cascode design provides good isolation between the input and output of the circuit and allows for separate input and output matching. In the cascode amplifier, the common-gate stage presents a low impedance at the drain node of the common source stage, leading to a negligible Miller effect on the input (CS) transistor.

Cascode devices provide substantially higher gain at low frequencies but experience a drop in maximum available gain and an increase in their noise figure sooner at higher frequencies because of the parasitic pole at the drain node of the common source stage, even if devices use a shared-junction layout [33].

Furthermore, in CMOS RF amplifiers for wireless applications, transistor sizes tend to be maximized to improve the gain and noise performance for a limited bias current. Therefore, in practical CMOS cascode amplifiers, the current gain of the common gate stage can be less than one, which reduces the overall power gain. These issues with the cascode architecture, together with the superior noise performance of common source stages, make common source amplifiers a better choice for realizing low noise amplifiers, provided that the stability issues related to this type of amplifier can be resolved.

A comparison in [34] for a narrowband 2.4 GHz LNA further supports the idea that cascaded common source stages can provide higher gain and better noise performance than their cascode counterparts. We have repeated the same experiment and observed a similar trend at mm-wave. The increased power consumption of cascaded common source stages can then be resolved by employing a current sharing architecture (See Table 4.1).

<table>
<thead>
<tr>
<th>Topologies</th>
<th>Power Gain (dB)</th>
<th>Noise Figure (dB)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Simple Cascode</td>
<td>13.5</td>
<td>4.1</td>
</tr>
<tr>
<td>Cascode with inductive load at drain of M1</td>
<td>16.4</td>
<td>4.1</td>
</tr>
<tr>
<td>Simple Current Sharing</td>
<td>19.7</td>
<td>3.4</td>
</tr>
<tr>
<td>Current Sharing with series inter-stage resonance</td>
<td>23.1</td>
<td>3.2</td>
</tr>
</tbody>
</table>

From another perspective, the behavior of passive devices is another factor that limits the performance of mm-wave circuits. Since technology scaling brings the metal stack closer to the substrate, passive performance usually worsens. In the absence of special process features such as an ultra-thick top metal, this degradation should be expected when moving from one technology node to the next one with a smaller gate length (Fig 3.1).

![Separation of different metal layers and their spacing from substrate for two different technology nodes](image)

**Fig. 3.1:** Separation of different metal layers and their spacing from substrate for two different technology nodes (Figure not to scale)

Furthermore, since the silicon substrate resistivity is relatively low, the high frequency signal leaks into the lossy substrate and increases the passive loss at mm-wave frequencies. At mm-waves, in addition to the electric loss, the magnetic loss, which is mostly attributed to eddy currents induced in the substrate, influences the performance of passive devices and should be taken into account [33].

During the past years, various techniques have been explored to improve the design of mm-wave low noise amplifiers. Good LNA performance is becoming more difficult to achieve, primarily because more stages are needed to reach the required gain and because manufacturing tolerances cause greater variations in the parameters of deep sub-micrometer devices [31]. In the following sections, after an overview of the works presented in the literature, we try to address these issues by introducing a power efficient, wideband, high gain low noise amplifier.

**An overview on the gain-bandwidth enhancement techniques**

The bandwidth requirement of RF amplifiers is continuously increasing, following the drive for higher speed systems. In fact, the specifications dictated by system requirements necessitate the design of high gain LNAs with a large operating bandwidth.

Achieving a high gain-bandwidth product has always been a major challenge for RF designers. Given the shortage of gain at mm-wave frequencies, the problem becomes more severe. The situation is even more pronounced for silicon-based integrated circuits due to the inferior parasitic characteristics of these technologies.

In order to realize gain in the range of 20dB or more, mm-wave low noise amplifiers employ multiple stages which result in higher dissipated power. On the other hand, cascading identical stages reduces the bandwidth from its original value. Consequently, increasing the amount of consumed power by cascading several stages does not necessarily lead to a higher gain-bandwidth product. Figure 3.2 illustrates the gain-bandwidth product of some of the state of the art amplifiers versus power dissipation.
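
The bandwidth shrinkage caused by cascading can be illustrated with a minimal Python sketch. It assumes that every stage is an identical, isolated single-pole amplifier, which is a simplification made only for this example; in that case the overall 3-dB bandwidth of $n$ stages is the single-stage bandwidth multiplied by $\sqrt{2^{1/n}-1}$.

```python
import math

def cascade_bandwidth_factor(n_stages):
    """3-dB bandwidth of n identical single-pole stages relative to one stage."""
    return math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)

# Bandwidth shrinkage for one to four cascaded stages
factors = [cascade_bandwidth_factor(n) for n in range(1, 5)]
for n, factor in enumerate(factors, start=1):
    print(f"{n} stage(s): bandwidth factor = {factor:.3f}")
```

Under this assumption three identical stages retain only about half of the single-stage bandwidth, which is why the inter-stage networks and stagger tuning discussed later are needed to preserve bandwidth in a multi-stage design.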

Before analyzing our proposed architecture and its ability to extend the gain-bandwidth product, it is worthwhile to briefly survey the broadbanding techniques in the literature [35-42].
Since an improvement in the bandwidth of the amplifier is often accompanied by a corresponding drop in its low-frequency voltage gain, the gain-bandwidth product (GBW) can serve as a first-order figure of merit for an amplifier topology in a given process technology [35].

The transimpedance and GBW of a linear single stage amplifier connected to a simple RC load are given by (3-1) and (3-2), respectively:

\[
Z(s) = \frac{V_{out}}{I_{in}} = \frac{R}{1 + sRC} \quad (3-1)
\]

\[
GBW = \frac{g_m}{2\pi C} \quad (3-2)
\]

The parasitic capacitance (associated partly with the driving stage and partly with the load) limits the bandwidth by reducing the output impedance of the amplifier as the frequency grows. Therefore, maintaining a uniform output impedance over a wider frequency range will increase the bandwidth. Bode-Fano analysis shows that any one-port passive network added in parallel to C can improve the GBW by at most a factor of two.

First-order shunt peaking has frequently been used as such a one-port network; it introduces a resonant peak at the output where the amplitude starts to roll off at high frequencies. In this technique an inductor is inserted in series with the load resistor (Fig. 3.3 (a)). For the shunt-peaked network the gain is simply the product of the transimpedance and the transconductance \(g_m\). Since \(g_m\) is approximately constant, the transimpedance becomes the determining factor in defining the transfer function [40]. For the shunt-peaked network:

\[
Z(s) = \frac{R + sL}{1 + sRC + s^2LC} \quad (3-3)
\]

In fact, the inductor introduces a zero in \(Z(s)\) that increases the impedance with frequency, compensating the decreasing impedance of C and thus extending the 3-dB bandwidth. An equivalent explanation for the increased bandwidth is the reduced rise time. Intuitively, the inductor in series with the resistive load delays current flow into this branch at higher frequencies, such that more current initially charges C, which reduces the rise time.

Substituting $\omega_0 = 1/RC$, the 3-dB bandwidth of the reference common source amplifier, and $m=R^2C/L$ into (3-3), and normalizing to the impedance seen at DC ($R$), gives (3-4) [40]:

$$Z_N(s) = \frac{1+\frac{s}{m\omega_0}}{1+\frac{s}{\omega_0}+\frac{s^2}{m\omega_0^2}} \quad (3-4)$$

For shunt peaking, $m=\sqrt{2}$ gives the maximum Bandwidth Enhancement Ratio (BWER) of 1.84 at the cost of 1.5 dB of peaking. A maximally flat gain is achieved for $m=1+\sqrt{2}$ but BWER is reduced to 1.72 [35, 40].
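
These figures can be checked numerically. The Python sketch below sweeps the normalized shunt-peaked response, written in terms of $x=\omega/\omega_0$ and $m=R^2C/L$, and reports the first 3-dB frequency together with the passband peaking. The helper names and the sweep resolution are arbitrary choices made for this illustration.

```python
import math

def shunt_peaked_mag(x, m):
    """|Z_N(j*x)| for the shunt-peaked load, with x = omega/omega_0 and m = R^2*C/L."""
    s = 1j * x
    return abs((1.0 + s / m) / (1.0 + s + s * s / m))

def sweep(m, x_max=5.0, steps=50000):
    """Return (BWER, peaking in dB) from a dense frequency sweep."""
    threshold = 1.0 / math.sqrt(2.0)
    peak = 1.0
    for i in range(1, steps + 1):
        x = x_max * i / steps
        mag = shunt_peaked_mag(x, m)
        peak = max(peak, mag)
        if mag < threshold:
            return x, 20.0 * math.log10(peak)
    raise ValueError("response did not fall by 3 dB within the sweep")

bwer_max, peaking_max = sweep(math.sqrt(2.0))
bwer_flat, peaking_flat = sweep(1.0 + math.sqrt(2.0))
print(f"m = sqrt(2):     BWER = {bwer_max:.2f}, peaking = {peaking_max:.2f} dB")
print(f"m = 1 + sqrt(2): BWER = {bwer_flat:.2f}, peaking = {peaking_flat:.2f} dB")
```

The sweep gives a BWER of roughly 1.85 with about 1.5 dB of peaking for $m=\sqrt{2}$, and about 1.72 with no peaking for $m=1+\sqrt{2}$, in line with the values quoted above.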

Although the increased impedance of the inductor results in bandwidth enhancement, it also leads to peaking in the response. Techniques that avoid the peaking while retaining the maximum BWER are therefore desirable. As illustrated in figure 3.3 (b), one solution is to add a capacitor in shunt with the inductor. The capacitor should be large enough to negate the peaking but small enough not to significantly change the gain response [40].

![Fig. 3.3: Simple (a) and bridged (b) shunt peaking](image)

Compared to the case without the parallel capacitor, this capacitor ($C_B$) introduces another pole and zero in the input-output transfer function, as can be seen from equation (3-5):

\[
Z_N(s) = \frac{1 + \left( \frac{1}{m} \right) \frac{s}{\omega_0} + \left( \frac{K_B}{m} \right) \frac{s^2}{\omega_0^2}}{1 + \frac{s}{\omega_0} + \left( \frac{K_B + 1}{m} \right) \frac{s^2}{\omega_0^2} + \left( \frac{K_B}{m} \right) \frac{s^3}{\omega_0^3}}
\]

(3-5)

In this equation $K_B = C_B/C$, $\omega_0 = 1/RC$ and $m = R^2C/L$. For $K_B = 0.3$ a BWER of 1.83 is achieved, almost identical to the value of 1.84 for simple shunt peaking, but without the 1.5 dB peaking associated with the latter (i.e. a flat gain response is obtained).

A subtle advantage of bridged-shunt peaking over simple shunt peaking is that the maximum bandwidth is achieved for a larger value of $m$, which means a smaller inductance with a smaller die area and a higher self-resonance frequency.

Nonetheless, the maximum BWER for a one-port network (which can be achieved by using a distributed LC ladder circuit for the load) is 2. To achieve a larger BWER, the general case of two-port loads, which includes T-coil peaking and combinations of shunt and series peaking, should be considered. It has been proven that the maximum achievable BWER with any two-port passive inter-stage network is 4.93, under the condition that the load capacitance is equal to the output capacitance of the transconductor cell, which is called the balanced case [37]. This is an improvement of approximately 2.5 times over the one-port case.

Applying proper matching networks between amplifier stages is the key step in improving the bandwidth of wideband amplifiers with this method [35]. The principle of operation of these techniques is to separate the intrinsic output resistance and capacitance of the transistor ($R_1$ and $C_1$) from those of the load ($R_2$ and $C_2$), which limit the bandwidth of the amplifier (see Fig. 3.4).

**Fig. 3.4:** Small-signal model of a single stage amplifier with the loading effect of the next amplifying stage

The gain-bandwidth product for the network shown in figure 3.4 is given by (3-6):

With the aim of maintaining a higher impedance over a wider frequency range, a passive network can be inserted between \(C_1\) and \(C_2\) to separate and isolate them. Consequently, \(C_1\) will be the only capacitor that affects GBW at the input port of the network.

Bode has shown that, for \(C_1 = C_2 = C/2\) it is possible to design the matching network between the two R-C sections on different sides of figure 3.4 in such a way that the GBW product at the output port is the same as that of the input. Thus, for a single stage amplifier with a two-port passive load network, the maximum gain bandwidth product will be [35]:

\[
GBW_{\text{max}} = \frac{2g_m}{\pi C}
\]

(3-7)

This is a fourfold improvement over the original RC-loaded amplifier. It can be achieved by using a constant-k LC-ladder filter terminated in its image impedance, which has a constant transfer function at frequencies below its cutoff frequency.
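
To put these bounds side by side, the following Python sketch evaluates (3-2), the one-port limit (twice the RC value, as stated above for the Bode-Fano bound) and (3-7) for one set of device values. The transconductance and capacitance are hypothetical numbers chosen only for illustration.

```python
import math

def gbw_rc(gm, c_total):
    """GBW of a transconductor with a simple RC load, as in (3-2)."""
    return gm / (2.0 * math.pi * c_total)

def gbw_one_port_max(gm, c_total):
    """Upper bound with a one-port load network: twice the RC value."""
    return gm / (math.pi * c_total)

def gbw_two_port_max(gm, c_total):
    """Upper bound with a two-port load and C1 = C2 = C/2, as in (3-7)."""
    return 2.0 * gm / (math.pi * c_total)

gm = 40e-3        # hypothetical transconductance, 40 mS
c_total = 50e-15  # hypothetical total capacitance, 50 fF

for name, func in (("RC load", gbw_rc),
                   ("one-port bound", gbw_one_port_max),
                   ("two-port bound", gbw_two_port_max)):
    print(f"{name}: {func(gm, c_total) / 1e9:.0f} GHz")
```

The ratios between the three results are 1 : 2 : 4 regardless of the device values, since all three expressions scale as $g_m/C$.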

Typically, it is computationally difficult to calculate the component values for the optimizing two port network directly. Even in the case of a third-order system, with only one additional inductor between the device and the load (see figure 3.5) the calculations are not straightforward. Graphical or numerical methods are used instead to obtain the value of the inductor that maximizes the bandwidth [35].

![Third order ladder network at the output of an amplifying stage](image)

**Fig. 3.5:** Third order ladder network at the output of an amplifying stage

To achieve a maximally flat frequency response at the output of the ladder network shown in figure 3.5, the component values should be equal to the corresponding third-order Butterworth filter elements:

\[
C_1 = \frac{1}{R_1(1-\delta)\omega_c}
\]  
(3-8)

\[
L_2 = \frac{2}{(1-\delta + \delta^2)\omega_c^2 C_1}
\]  
(3-9)

\[
C_2 = \frac{1}{R_2(1+\delta)\omega_c}
\]  
(3-10)

Here \(\omega_c\) is the 3-dB cutoff frequency of the network and \(\delta\) is a measure of the impedance transformation between \(R_1\) and \(R_2\), defined as [35]:

\[
\delta = \sqrt[3]{\frac{R_1 - R_2}{R_1 + R_2}}
\]  
(3-11)

From (3-8), the new amplifier’s bandwidth at the output of the ladder structure is:

\[
\omega_{c,new} = \frac{1}{(1-\delta)R_1 C_1}
\]  
(3-12)
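
The design equations can be exercised with a short Python sketch. It takes $\delta$ as the cube root of $(R_1-R_2)/(R_1+R_2)$, as in standard third-order Butterworth synthesis with unequal terminations, assumes $R_1 \geq R_2$, and then checks that the resulting ladder is 3 dB down at $\omega_c$. The resistor and capacitor values are hypothetical.

```python
import math

def butterworth_ladder(r1, r2, c1):
    """Element values of (3-8) to (3-12) for given R1, R2 and device capacitance C1.

    Assumes R1 >= R2 and delta = ((R1 - R2)/(R1 + R2))**(1/3).
    """
    delta = ((r1 - r2) / (r1 + r2)) ** (1.0 / 3.0)
    omega_c = 1.0 / ((1.0 - delta) * r1 * c1)                    # (3-12)
    l2 = 2.0 / ((1.0 - delta + delta ** 2) * omega_c ** 2 * c1)  # (3-9)
    c2 = 1.0 / (r2 * (1.0 + delta) * omega_c)                    # (3-10)
    return delta, omega_c, l2, c2

def transimpedance(omega, r1, c1, l2, r2, c2):
    """V_out/I_in of the ladder: R1 || C1 at the input, series L2, R2 || C2 at the output."""
    s = 1j * omega
    y1 = 1.0 / r1 + s * c1
    y2 = 1.0 / r2 + s * c2
    return 1.0 / (y1 * (1.0 + s * l2 * y2) + y2)

r1, r2, c1 = 200.0, 50.0, 20e-15  # hypothetical values
delta, omega_c, l2, c2 = butterworth_ladder(r1, r2, c1)

z_dc = abs(transimpedance(0.0, r1, c1, l2, r2, c2))
z_c = abs(transimpedance(omega_c, r1, c1, l2, r2, c2))

# Bandwidth before inserting L2: (R1 || R2) loaded by C1 + C2
omega_old = 1.0 / ((r1 * r2 / (r1 + r2)) * (c1 + c2))
bwer = omega_c / omega_old
print(f"delta = {delta:.3f}, L2 = {l2 * 1e12:.1f} pH, C2 = {c2 * 1e15:.2f} fF")
print(f"|Z(wc)|/|Z(0)| = {z_c / z_dc:.4f}, BWER = {bwer:.2f}")
```

For these values the response at $\omega_c$ is $1/\sqrt{2}$ of the DC value, as expected for a Butterworth network, and the bandwidth improves by a factor of about 1.7 over the plain RC case.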

Substituting (3-12) into (3-9) then gives the value of the inductor that should be added to the original network of figure 3.5. The value of \(C_2\) in the original amplifier may differ from that calculated in (3-10); explicit capacitance can then be added to accommodate this. If the BWER is defined as the ratio between the new 3-dB bandwidth and the one before insertion of the inductor, it can be shown to be equal to [35]:

\[ BWER = \frac{\omega_{\text{new}}}{\omega_{\text{old}}} = \frac{1}{1 - \delta} \frac{R_2}{R_1 + R_2} \frac{C_1 + C_2}{C_1} \quad (3-13) \]

It should be noted that equation (3-13) is valid under the condition of a maximally flat gain and shows the improvement achieved by inserting the inductor, compared to the original case of a simple R-C load, for different source and load capacitances. Equation (3-13) does not imply that a larger $C_2$ gives a larger bandwidth; rather, it states that the larger the load capacitor ($C_2$), the larger the relative (i.e. percentage) bandwidth improvement compared to the original case (before insertion of the inductor). It also shows that the bandwidth improvement depends on the ratio of the source and load resistors.

The normalized transimpedance of the series-peaked network is given by equation (3-14), with $K_c=C_1/(C_1+C_2)$:

$$Z_N(s) = \frac{1}{1 + \frac{s}{\omega_0} + \left(\frac{1-K_c}{m}\right)\frac{s^2}{\omega_0^2} + \left(\frac{K_c(1-K_c)}{m}\right)\frac{s^3}{\omega_0^3}} \quad (3-14)$$

As expected, for a fixed total capacitance, separating $C_1$ from $C_2$ creates another pole, which affects the BWER depending on the value of $K_c$. As the parasitic capacitance ratio increases, the BWER increases to a maximum of 2.52 for $K_c=0.3$ ($C_1=0.3C$ and $C_2=0.7C$). If the passband peaking that occurs for higher values of $K_c$ is acceptable, an even larger BWER can be obtained (See Fig. 3.6) [40].

Looking at the circuit in the time domain, without L the transistor charges $C=C_1+C_2$, but with L only $C_1$ is charged initially, because the added inductor delays current flow to the rest of the network. This reduces the rise time at the drain node and increases the bandwidth.

Combining the two aforementioned approaches (inductive peaking of the bridged-shunt approach and capacitive splitting of the series-peaked circuit) results in the bridged-shunt-series-peaked network of Fig. 3.7. Although two inductors are now required, a larger BWER is obtained than with the shunt-peaked and series-peaked counterparts [40].

**Fig. 3.6:** Ideal bandwidth improvement with series peaking versus $K_c = C_1/C$ [40].

![Diagram](image)

**Fig. 3.7:** A common-source amplifier with bridged-shunt-series peaking

Using \( m_1 = R^2(C_1 + C_2)/L_1 \), \( m_2 = R^2(C_1 + C_2)/L_2 \), and defining \( K_B, K_c, \) and \( \omega_0 \) as before, the normalized transimpedance of the bridged-shunt-series-peaked network is given as (3-15). A BWER of even 4 is possible with this approach [40].

\[
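Z_N(s) = \frac{1 + \left( \frac{1}{m_1} \right) \frac{s}{\omega_0} + \left( \frac{K_B}{m_1} \right) \frac{s^2}{\omega_0^2}}{1 + \frac{s}{\omega_0} + \left( \frac{1 + K_B}{m_1} + \frac{1 - K_c}{m_2} \right) \frac{s^2}{\omega_0^2} + \left( \frac{K_B}{m_1} + \frac{K_c (1 - K_c)}{m_2} \right) \frac{s^3}{\omega_0^3} + \left( \frac{(K_B + K_c) (1 - K_c)}{m_1 m_2} \right) \frac{s^4}{\omega_0^4} + \left( \frac{K_B K_c (1 - K_c)}{m_1 m_2} \right) \frac{s^5}{\omega_0^5}}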
\]

(3-15)
Equation (3-15) can be evaluated numerically to produce plots similar to figure 3.6 and to investigate the effect of the different network elements on the BWER.
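
A minimal Python sketch of such a numerical evaluation is given below. The polynomial coefficients are obtained from a direct nodal analysis of an assumed topology ($C_1$ and the $R$ plus $L_1 \parallel C_B$ branch at the drain, with $L_2$ in series between the drain and $C_2$), using $x=\omega/\omega_0$; the function names and the sweep resolution are illustrative choices, not taken from [40].

```python
import math

def normalized_mag(x, m1, m2, k_b, k_c):
    """|Z_N(j*x)| with x = omega/omega_0 for the bridged-shunt-series-peaked load.

    Assumed topology: C1 and the R + (L1 || C_B) branch at the drain,
    L2 in series between the drain and the load capacitance C2.
    """
    s = 1j * x
    num = 1.0 + s / m1 + (k_b / m1) * s**2
    den = (1.0 + s
           + ((1.0 + k_b) / m1 + (1.0 - k_c) / m2) * s**2
           + (k_b / m1 + k_c * (1.0 - k_c) / m2) * s**3
           + ((k_b + k_c) * (1.0 - k_c) / (m1 * m2)) * s**4
           + (k_b * k_c * (1.0 - k_c) / (m1 * m2)) * s**5)
    return abs(num / den)

def bwer(m1, m2, k_b, k_c, x_max=10.0, steps=100000):
    """First frequency (in units of omega_0) at which the response is 3 dB down."""
    threshold = 1.0 / math.sqrt(2.0)
    for i in range(1, steps + 1):
        x = x_max * i / steps
        if normalized_mag(x, m1, m2, k_b, k_c) < threshold:
            return x
    raise ValueError("no 3-dB point found below x_max")

LARGE = 1e12  # a very large m makes the corresponding inductor negligible

# Limiting cases: series peaking only (L1 -> 0) and shunt peaking only (L2 -> 0)
series_only = bwer(m1=LARGE, m2=1.5, k_b=0.0, k_c=0.25)
shunt_only = bwer(m1=1.0 + math.sqrt(2.0), m2=LARGE, k_b=0.0, k_c=0.5)
print(f"series only: {series_only:.2f}, shunt only: {shunt_only:.2f}")
```

The two limiting cases serve as a sanity check: with $L_1$ removed, $K_c=0.25$ and $m_2=1.5$ the response is a third-order Butterworth with a BWER of 2, and with $L_2$ removed the maximally flat shunt-peaking value of about 1.72 is recovered. Sweeping `m1`, `m2`, `k_b` and `k_c` then shows how each element affects the BWER.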

Bridged-shunt-series peaking gives a large BWER whenever \( K_c \) is larger than 0.3. However, for large load capacitances (i.e. \( K_c < 0.3 \)), the capacitive-splitting action of \( L_2 \) and the bridging action of \( C_B \) become ineffective in achieving a large BWER.

As presented in figure 3.8 relatively similar networks are employed in [36, 42] to increase the bandwidth. The amplifier realized with this network in [42] is called “triple resonance amplifier (TRA)”. The name arises from the fact that the frequency response of this network defines three distinct resonance frequencies.

The network is interpreted as a combination of a π-network composed of \( C_1 \), \( C_2 \), and \( L_2 \) and an inductive load composed of \( R_1 \) and \( L_1 \). A bandwidth improvement of 3.6 is achieved with a gain peak of 1.8 dB provided that \( C_1 \) and \( C_2 \) are equal (i.e. balanced condition).

![Diagram of TRA and simplified model](image)

**Fig. 3.8:** Triple-resonance amplifier (TRA) (a) and its simplified model (b) [42]

If the inductor \( L_2 \) is added in series with \( C_2 \) such that \( L_2 \) and \( C_2 \) resonate at \( \omega_1 \) (hence, acting as a short) they absorb all of the drain current. Consequently, rather than \( C_1 + C_2 \), \( I_D \) flows only through \( C_2 \), leading to a more gradual roll-off of gain.

\[
\omega_1 = \frac{1}{\sqrt{L_2C_2}}
\]  

(3-16)

For maximum flatness, the authors set the output voltage at this frequency (\( V_{out} = I_{in}/(C_2\omega_1) \)) equal to that at low frequencies (\( V_{out} = I_{in}R_1 \)), which gives:

\[
\omega_1 = \frac{1}{R_1 C_2} = \frac{2}{R_1 C}
\]

(3-17)

In the above calculations it is assumed that $C_1=C_2=C/2$. Furthermore, $L_2=2L_1$ because before the insertion of $L_2$, $L_1$ resonates with $C_1+C_2$ while after the insertion, $\omega_1$ is determined by the series resonance of $L_2$ and $C_2$. The series resonance of $L_2$ and $C_2$ not only forces all of $I_{in}$ to flow through $C_2$, but also reverses the sign of the impedance $Z_x$, thus making $V_x$ (fig. 3.9) negative for $\omega>\omega_1$. Consequently, as demonstrated in Fig. 3.9 (b), $I_1$ and $I_2$ must flow into node X and together with $I_{in}$, pass through $C_2$. The capacitive current $I_2$ multiplied by the impedance of $C_2$ creates a relatively constant output voltage as $\omega$ increases, while the inductive current $I_1$ introduces a roll-up in $V_{out}$. Accordingly, $|V_{out}/I_{in}|$ continues to rise until the $\pi$ network composed of $C_1$, $L_2$ and $C_2$ begins to resonate (Fig. 3.9 (c)), presenting an open circuit at node X and forcing all of $I_D$ to flow through $R_1$ and $L_1$. This resonance frequency is given by [42]:

$$\omega_2 = \frac{1}{\sqrt{L_2 \frac{C_1 C_2}{C_1 + C_2}}} = \sqrt{2}\,\omega_1$$

(3-18)

Since, at $\omega_2$, $C_1$ and $C_2$ carry equal and opposite currents, the output voltage can be given by:

$$|V_{out}| = |V_X| = |I_{in}| \sqrt{R_1^2 + \frac{L_2^2}{4} \omega_2^2} = \sqrt{\frac{3}{2}}\, R_1 |I_{in}|$$

(3-19)

This corresponds to a peaking of $\sqrt{3/2}$, or about 1.8dB, in the magnitude response of the amplifier. For $\omega>\omega_2$, the $\pi$ network becomes capacitive and $|V_{out}/I_D|$ begins to drop, returning to the mid-band value ($R_1$) when the impedance of the $\pi$ network resonates with $L_1$ (Fig. 3.9(d)). This third resonance frequency is given by (3-20) [42]:

$$\omega_3 = \sqrt[4]{6}\,\omega_1$$

(3-20)
The 3-dB bandwidth exceeds this value and is approximately equal to:

\[ \omega_{3-\text{dB}} \approx \sqrt{3} \omega_1 = \frac{2\sqrt{3}}{R_1 C} \]  

(3-21)

In other words, under a balanced load condition, the bandwidth extends by almost 3.5.
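
The balanced-load behaviour described above can be checked with a small Python sketch. It uses normalized element values that follow from the flatness condition stated earlier ($R_1 = 1$, $C_1 = C_2 = 1$, so that $\omega_1 = 1/(R_1C_2) = 1$, $L_2 = 1$ and $L_1 = L_2/2$); the function names are hypothetical and the model ignores all losses other than $R_1$.

```python
import math

def tra_mag(x):
    """|V_out/I_in| / R1 of the balanced TRA, with x = omega/omega_1.

    Normalised values: R1 = 1, C1 = C2 = 1, L2 = 1, L1 = L2/2.
    """
    s = 1j * x
    r1, c1, c2, l2 = 1.0, 1.0, 1.0, 1.0
    l1 = l2 / 2.0
    y_load = 1.0 / (r1 + s * l1)   # R1 in series with L1 at node X
    y_c1 = s * c1                  # C1 at node X
    k = 1.0 + s * s * l2 * c2      # V_X / V_out through the L2-C2 branch
    return abs(1.0 / ((y_load + y_c1) * k + s * c2))

def sweep(x_max=5.0, steps=50000):
    """Return (first 3-dB frequency in units of omega_1, peaking in dB)."""
    threshold = 1.0 / math.sqrt(2.0)
    peak = 1.0
    for i in range(1, steps + 1):
        x = x_max * i / steps
        mag = tra_mag(x)
        peak = max(peak, mag)
        if mag < threshold:
            return x, 20.0 * math.log10(peak)
    raise ValueError("response did not fall by 3 dB within the sweep")

x_3db, peaking_db = sweep()
# The reference bandwidth is omega_0 = 1/(R1*(C1 + C2)) = omega_1/2
bwer = 2.0 * x_3db
print(f"3-dB frequency = {x_3db:.2f} * omega_1, BWER = {bwer:.2f}, "
      f"peaking = {peaking_db:.2f} dB")
```

Under these assumptions the sweep gives a 3-dB frequency close to $\sqrt{3}\,\omega_1$, a bandwidth extension of roughly 3.5 and a peaking of about 1.8dB, in agreement with the approximate analysis.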
For the case that \( C_1 \) and \( C_2 \) are not equal (which is the most common situation), the authors in [36] propose the cascaded distributed amplifier (CDA) as the preferred choice (Fig. 3.11). Since \( C_2 \) is often larger than \( C_1 \), TRA technique is claimed to suffer more from the capacitive loss of the low-pass L section beyond the resonance of \( C_2 \) and \( L_2 \). This will decrease the amount of enhancement on the bandwidth. For instance, for the case that \( C_1=0.5C_2 \), TRA improves the bandwidth by a factor of only 2.3 instead of the expected value of 3.5 [36]. The CDA technique resolves this problem, simply by changing the node at which the bias inductor (\( L_2 \) in Fig. 3.11) is added to the circuit.

![Fig. 3.11: Cascaded distributed amplifier (CDA) (a) and its simplified model (b)](image)

Distributed amplification is another approach to achieve a wide operating bandwidth. In this solution, the gain stages are separated by transmission lines. Consequently, the gain contributions of several stages are added together, while the parasitic capacitors of these stages are isolated by the artificial transmission lines [36]. In the absence of losses, the gain-bandwidth product can be improved without limit by increasing the number of stages. In practice, the improvement is limited by the losses of the transmission lines. Furthermore, since distributed amplifiers are composed of several stages, it is difficult to meet low-power and small-area requirements with them. In addition, since the bias currents of all the stages flow through the same load, the circuit suffers from a severe trade-off between the voltage gain and the voltage headroom. Moreover, the design of distributed amplifiers requires careful iterative electromagnetic simulations and very accurate modeling of transistor parasitics [35, 40, 41]. These drawbacks make distributed amplifiers a less desirable choice.

In [40], at the cost of 2 dB peaking, a BWER of 5.59 is obtained employing an asymmetric T-coil configuration. The asymmetric T-coil is a remedy that leads to a high BWER for small values of \( K_c \). As shown in figure 3.12, this approach is based on the mutual coupling between the two inductors $L_1$ and $L_2$. However, when several stages are cascaded to provide a high gain, the large number of coils makes the layout of the circuit extremely complicated. Even after the several time-consuming EM simulations required to predict the behavior of the circuit, the amount of unwanted coupling effectively makes this approach an inferior choice in a multi-stage design.

**Fig. 3.12:** Asymmetric T-Coil approach employed in [40] for bandwidth enhancement with its equivalent small signal model

Cross coupled neutralization is another method employed to improve the gain-bandwidth product. The main drawback of gain boosting by cross coupled neutralization is the increased $Q$ of the input and output impedances, which makes it more difficult to match the cross coupled design over a wide band at low loss, especially for low power designs utilizing small cores [43].

Higher order networks can also be employed as the inter-stage network. They provide a wider bandwidth and a sharper transition from passband to stopband. However, they may cause practical issues, such as unreasonable component values, a large number of passive elements (hence a large die area), additional signal loss due to these passive components, and stability concerns. Typically these issues limit the order of the network to five (i.e., only three additional passive components) [35].

As stated earlier, achieving the desired gain with a single-stage amplifier is often hard. Consequently, several stages should be connected in cascade to provide the required gain. For cascaded amplifiers, the total gain is the product of the gain of each stage; however, the overall bandwidth is less than the bandwidth of each stage. This is due to accumulation of the gain drop in the pass band of each amplifier.
Considering a cascade of \( N \) similar single-pole amplifier stages with gain \( A_v \) and bandwidth \( \omega_0 \) with no mutual loading, the overall 3-dB bandwidth and the GBW are respectively given by (3-22) and (3-23) [35]:

\[
\omega_{overall} = \omega_0 \sqrt{2^{1/N} - 1} \tag{3-22}
\]

\[
GBW = A_v^N \omega_{overall} \tag{3-23}
\]

Comparing to the gain-bandwidth product of a single stage \( (A_v \omega_0) \), the gain-bandwidth improvement is equal to:

\[
\frac{GBW_{multi\text{-}stage}}{GBW_{one\text{-}stage}} = A_v^{N-1} \sqrt{2^{1/N} - 1} \tag{3-24}
\]

In practice, the overall bandwidth is even lower because each stage has a loading effect on its previous stage, which reduces the latter bandwidth and consequently the overall bandwidth.
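As a rough illustration of the bandwidth shrinkage in a cascade of identical single-pole stages, the standard relations $\omega_{overall}=\omega_0\sqrt{2^{1/N}-1}$ and $GBW = A_v^N\omega_{overall}$ can be evaluated directly. The per-stage gain of 2.5 (about 8 dB) used below is a hypothetical value chosen only for the example, and mutual loading between stages is ignored.

```python
import math

def cascade_bandwidth_factor(n_stages):
    """Overall 3-dB bandwidth of N identical single-pole stages, relative to one stage."""
    return math.sqrt(2 ** (1 / n_stages) - 1)

def gbw_improvement(stage_gain, n_stages):
    """Ratio of the N-stage GBW to the single-stage GBW (linear stage gain)."""
    return stage_gain ** (n_stages - 1) * cascade_bandwidth_factor(n_stages)

# Hypothetical per-stage voltage gain (linear), for illustration only.
stage_gain = 2.5
for n in range(1, 5):
    print(n, round(cascade_bandwidth_factor(n), 3), round(gbw_improvement(stage_gain, n), 2))
```

The bandwidth factor drops to roughly one half for three stages, which is why the inter-stage networks must provide bandwidth extension when several stages are cascaded.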

**Design examples from state of the art**

For a second order LC network, the gain-bandwidth product of each stage is set by the input stage transconductance \( g_m \) and the node capacitance \( C \). In a narrow band around the carrier, such a network can be interpreted as a first order bandpass filter characterized by a 3-dB bandwidth equal to \( 1/RC \) and a gain equal to \( g_mR \), where \( R \) is the equivalent parallel resistance at resonance and \( C \) is the total capacitance of the node [44].

As mentioned earlier, and as can be seen in figure 3.13 (a), an inductor typically tunes out the parasitic capacitance at each inter-stage gain node, realizing a second order LC network (\( C_{\text{bypass}} \) is large, creating a perfect coupling between the stages).
Following the same concept but with the aim of further enhancing the bandwidth, the authors in [44] use a bandpass filter of higher order to break the trade-off between the gain and the bandwidth. The idea is to resonate the two capacitors in figure 3.14 separately with parallel inductors and to realize the inter-stage coupling through the capacitor $C_c$. This topology and its magnetically coupled dual were previously introduced to increase the selectivity of bandpass filters in the design of IF stages for analog televisions [45].

![Diagram of amplifier stages and inter-stage matching network](image)

**Fig. 3.13:** Cascade of two gain stages in a typical amplifier (a) and the small signal equivalent model (b)

The transfer function of this inter-stage matching network is given by equation (3-25):

$$
G_{cr}(s) = -g_m \frac{s^3 k \omega_0 \sqrt{R_1 R_2}}{(Q(1+k)s^2 + s\omega_0 + Q\omega_0^2)(Q(1-k)s^2 + s\omega_0 + Q\omega_0^2)}
$$

(3-25)
Where:

\[ \omega_0 = \frac{1}{\sqrt{L_1(C_1 + C_c)}} = \frac{1}{\sqrt{L_2(C_2 + C_c)}}, \]

\[ Q = \frac{R_1}{\omega_0 L_1} = \frac{R_2}{\omega_0 L_2}, \]

And

\[ k = \frac{C_c}{\sqrt{(C_1 + C_c)(C_2 + C_c)}} \]

Equation (3-25) has two maxima of equal amplitude at the frequencies of the complex pole pairs. A higher \( k \) (i.e., a higher \( C_c \)) gives a larger separation between the pole pairs, and consequently a wider bandwidth, but at the cost of a higher in-band ripple. The gain-bandwidth product (GBW) of this structure is given by [44]:

\[ GBW = b \frac{g_m}{2 \sqrt{(C_1 + C_c)(C_2 + C_c)}}, \quad (3-26) \]

where \( b \) is a parameter dependent on the in-band ripple. As an example, an improvement of 2x is achieved at the cost of 0.6dB ripple for the case \( C_c \ll C_1 = C_2 = C \). The increase in the GBW product can be attributed to the fact that for the proposed matching network the GBW is proportional to \( 1/C \), while for the tuned alternative it is proportional to \( 1/2C \).
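The double-peaked response of the capacitively coupled resonators can be illustrated with a short nodal-analysis sketch. It does not reproduce the derivation in [44]. It assumes two identical parallel RLC tanks coupled by $C_c$, with $\omega_0$, $Q$ and $k$ defined as above, and it uses hypothetical values ($f_0 = 60$ GHz, $Q = 10$, $k = 0.3$) chosen only for illustration.

```python
import math

def coupled_tanks_gain(f, f0, q, k, r=1.0):
    """|V2/I1| for two identical parallel RLC tanks coupled by Cc (nodal analysis)."""
    w, w0 = 2 * math.pi * f, 2 * math.pi * f0
    s = 1j * w
    # Total admittance at each node, coupling capacitor included:
    # 1/R + 1/(sL) + s(C + Cc) = (1/R) * (1 + Q*(s/w0 + w0/s))
    y_node = (1 + q * (s / w0 + w0 / s)) / r
    # Coupling admittance s*Cc, with Cc = k*(C + Cc) and w0*R*(C + Cc) = Q
    y_c = k * q * s / (w0 * r)
    return abs(y_c / (y_node**2 - y_c**2))

def find_peaks(f0=60e9, q=10.0, k=0.3, points=4001):
    """Return (frequency, gain) for each local maximum between 0.5*f0 and 1.5*f0."""
    freqs = [f0 * (0.5 + i / (points - 1)) for i in range(points)]
    gains = [coupled_tanks_gain(f, f0, q, k) for f in freqs]
    return [(freqs[i], gains[i]) for i in range(1, points - 1)
            if gains[i - 1] < gains[i] > gains[i + 1]]

for f_peak, g_peak in find_peaks():
    print(f"peak near {f_peak / 1e9:.1f} GHz, normalized gain {g_peak:.3f}")
```

With these example values the two maxima fall close to $\omega_0/\sqrt{1+k}$ and $\omega_0/\sqrt{1-k}$ and have nearly equal amplitude. Raising $k$ moves them further apart and deepens the dip between them.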

Employing the aforementioned matching network, the authors in [44] have realized a 3-stage cascode mm-wave amplifier, the schematic of which is shown in figure 3.15. In this design, equal sizes were chosen for the input and common gate devices to allow a shared junction interdigitated layout, with the aim of minimizing the stray capacitance at the common mode which is responsible for gain and noise degradation. Devices were biased at a current density of 225\( \mu \)A/\( \mu \)m for maximum transit frequency. It should be noted that all the inductors in figure 3.15 are realized by transmission lines.

As demonstrated in this figure, the input matching network is made of a capacitively loaded stub and a series transmission line, which resonate out the input pad capacitor and realize a wideband input matching. The capacitor is a short at the operating frequency of
Although inductive degeneration reduces the achievable gain\(^1\) it is employed at the source of the first stage to realize the input match and to reduce the input device noise contribution.

---

\(^1\) In fact by employing inductive degeneration, the effective transconductance of the device changes by \(Q \cdot g_m\) (in which \(Q\) is the quality factor of the input matching network). With typical device sizes at mm-wave the \(Q\) is smaller than one. Consequently, the gain decreases compared to the simple common source case.
A lower number of elements at the input reduces the losses, improves the NF and mitigates stability issues at low frequencies.

Fig. 3.16: Schematic of the amplifier proposed in [46]

The LNA stages were biased at a current density of 100μA/μm. The LNA achieves a NF of 4.6dB with 23dB of gain, including the losses of the baluns, and consumes 16mW of power while covering the entire 60GHz ISM band with excellent immunity to ESD.

However, as stated in the previous section, the main drawback for cross coupling is the increased quality factor of the input impedance which makes wideband low loss matching realization much harder especially for low power designs with small cores. For example for a typical transistor size of 20μm the input Q with neutralization increases from 4 (for the case without neutralization) to as high as 10. Furthermore, when the amplifier is optimized for gain, sensitivity to process variation increases significantly [46]. Moreover, neutralization techniques are limited to differential amplifiers in general and cannot be used in single ended LNAs.

Another notable wideband, high gain mm-wave low noise amplifier is presented in [47]. As shown in figure 3.17, this work presents a four-stage common source CMOS LNA employing an asymmetric layout for the transistors.

The layout is asymmetric in the sense that the distance between gate poly and source contact is kept to a minimum while that of the gate and drain contact is increased. The reasoning can be understood looking at equation (3-27).

\[
f_{\text{max}} = \frac{f_T}{2 \sqrt{R_g \dfrac{g_m C_{gd}}{C_{gs} + C_{gd}} + (R_g + r_{ch} + R_s)\, g_{ds}}} \quad (3-27)
\]

From (3-27) it can be understood that decreasing the gate-drain capacitance \(C_{gd}\), the resistance of the gate \(R_g\), and the resistance of the source \(R_s\) increases \(f_{\text{max}}\) and consequently MAG. As a consequence, the distance between the gate poly and the drain contact was increased in [47] to make such a decrease in the value of \(C_{gd}\) possible.
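The sensitivity of \(f_{\text{max}}\) to \(C_{gd}\) and \(R_g\) can be illustrated with a small numerical sketch of the commonly used expression $f_{max} = f_T / \big(2\sqrt{R_g g_m C_{gd}/(C_{gs}+C_{gd}) + (R_g + r_{ch} + R_s) g_{ds}}\big)$. All device values below are hypothetical and are not taken from [47]. The transit frequency is approximated as $f_T \approx g_m / (2\pi (C_{gs}+C_{gd}))$, which is an additional assumption.

```python
import math

def f_max(g_m, c_gs, c_gd, r_g, r_ch, r_s, g_ds):
    """Estimate f_max (Hz) from small-signal parameters (SI units)."""
    # Assumed first-order transit frequency: f_T = g_m / (2*pi*(C_gs + C_gd))
    f_t = g_m / (2 * math.pi * (c_gs + c_gd))
    loss = r_g * g_m * c_gd / (c_gs + c_gd) + (r_g + r_ch + r_s) * g_ds
    return f_t / (2 * math.sqrt(loss))

# Hypothetical device values, for illustration only.
base = dict(g_m=30e-3, c_gs=20e-15, c_gd=8e-15, r_g=5.0, r_ch=3.0, r_s=2.0, g_ds=3e-3)

f_base = f_max(**base)
f_low_cgd = f_max(**{**base, "c_gd": 6e-15})   # wider gate-to-drain-contact spacing
f_low_rg = f_max(**{**base, "r_g": 3.0})       # lower gate resistance

print(f"baseline  : {f_base / 1e9:.0f} GHz")
print(f"lower C_gd: {f_low_cgd / 1e9:.0f} GHz")
print(f"lower R_g : {f_low_rg / 1e9:.0f} GHz")
```

Both modifications raise the estimated \(f_{\text{max}}\), consistent with the layout argument above.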

The current density of all the stages is set to 0.15mA/μm, which is the optimum current density for minimum noise figure.

![Fig. 3.17: The low noise amplifier proposed in [47]](image)

The LNA in [47] operates in different gain modes by changing the bias voltages. A maximum power gain of 24dB is achieved at 53GHz in the high gain mode. The measured 3-dB bandwidth is about 17GHz from 51 to 68GHz in the high gain mode and an 18dB variable gain from 4dB to 22dB is realized at 60GHz by adjusting the bias voltage. The LNA operates with a noise figure from 4 to 7.6dB while dissipating 30mW of power.

As can be seen through the above examples both lumped and distributed elements have been used to realize low noise amplifiers. It would be therefore beneficial to have a comparison between the lumped and distributed components that can potentially be used in the design of the circuit prior to introducing the proposed architecture.

**Lumped vs. distributed passive components**

Transmission lines or standard spiral inductors can be used to implement the inductances at mm-wave frequencies. The choice between transmission lines and spiral inductors should be made after considering the size, layout requirements, and electrical performance of each structure, and after simulating each approach with EM simulators such as High Frequency Structure Simulator (HFSS) or Agilent Momentum.

One general consideration, which is also useful for the design of blocks other than LNAs, is to carefully engineer current return paths to reduce the amount of current generated in the lossy substrate via the connection of the substrate to a uniformly distributed ground plane [31]. The inductance of the current return path adds directly to the load inductances of the amplifying stages. Hence, in order to make the load resonate at the desired frequency, current return paths should be carefully designed. Transmission lines have better defined current return paths compared to spiral inductors\(^1\). This makes modeling easier, more reliable and more accurate, which increases the chance of first-pass success [3].

In addition, even though inductors may offer space savings over transmission lines, they do not offer the same routing flexibility. Furthermore, as the number of inductors increases, the unwanted mutual coupling between them gets larger, causing unpredicted discrepancies between simulations and measurement results.

Moreover, in contrast to the case of transmission lines, an LNA implemented with spiral inductors requires several design iterations due to the lack of a scalable model\(^2\).

Designing in the mm-wave regime makes the choice of transmission lines even more appealing due to the reduced sizing of on-chip passives\(^3\).

Transmission lines play many roles: they transport signals between structures, perform impedance matching, and can be the best means of creating inductive or capacitive elements, especially when lumped components are impractical or too lossy to be fabricated in the semiconductor process due to parasitic lead inductances or poorly defined current return paths [31]. Furthermore, they are usually less sensitive to the modeling inaccuracies compared to their lumped counterparts [33].

The two primary forms of transmission lines used for mm-wave structures are microstrip (MS) transmission lines and coplanar waveguides (CPW). The choice of transmission line type usually depends on the process and the design specifications.

\(^1\) The problem of the ground current return path is associated with single ended designs and does not exist for a perfectly differential circuit.
\(^2\) Design kits usually include scalable models for inductors at frequencies well below 30GHz. Consequently, the models provided by the design kits are not valid at mm-wave. Even if such models were valid, the parasitic inductances of the routings can be comparable with the “true” inductance at mm-wave, requiring several time consuming EM simulations for accurate design.
\(^3\) Although the area of a typical design with transmission lines is still larger than the one with lumped components, at mm-wave the areas of the two implementations become closer.

There are no losses due to the substrate on a microstrip line, since the ground plane masks it and prevents the penetration of the EM field into the substrate [48]. This is particularly useful in scaled CMOS technologies where a highly doped, low resistivity substrate is often required for dense integration. Microstrip designs offer higher capacitive quality factors (defined as the ratio of electric energy stored to energy lost per cycle) than coplanar lines due to the placement of their ground shield above the lossy substrate. They can also be easily routed and usually result in a more compact layout compared to CPW\(^1\) [33]. However, the characteristic impedance (\(Z_0\)) of a microstrip line is determined by the effective dielectric constant (\(\varepsilon_{\text{eff}}\)), the line width (\(W\)) and the distance from the ground plane (\(h\)). As technology scales, the separation between the top and the bottom metal layers shrinks. This forces a smaller line width to keep \(Z_0\) constant and correspondingly leads to higher losses.
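The link between the ground-plane distance and the line width required for a given \(Z_0\) can be sketched with the widely used closed-form microstrip approximation (the form found in standard microwave textbooks). This is only a first-order illustration: it assumes a single homogeneous dielectric with a hypothetical \(\varepsilon_r = 4\), ignores metal thickness and the layered back-end stack of a real CMOS process, and the heights used are hypothetical.

```python
import math

def microstrip_z0(w, h, eps_r):
    """Closed-form approximation of microstrip Z0 (ohms) for width w and height h."""
    u = w / h
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 / math.sqrt(1 + 12 / u)
    if u <= 1:
        return 60 / math.sqrt(eps_eff) * math.log(8 / u + u / 4)
    return 120 * math.pi / (math.sqrt(eps_eff) * (u + 1.393 + 0.667 * math.log(u + 1.444)))

def width_for_z0(z0_target, h, eps_r, u_lo=0.05, u_hi=20.0):
    """Bisection on w/h: Z0 decreases as the line gets wider."""
    for _ in range(60):
        u_mid = 0.5 * (u_lo + u_hi)
        if microstrip_z0(u_mid * h, h, eps_r) > z0_target:
            u_lo = u_mid
        else:
            u_hi = u_mid
    return 0.5 * (u_lo + u_hi) * h

# Hypothetical dielectric heights of 4 um and 2 um, eps_r = 4 (assumed).
w_4um = width_for_z0(50.0, 4e-6, 4.0)
w_2um = width_for_z0(50.0, 2e-6, 4.0)
print(f"50-ohm width at h = 4 um: {w_4um * 1e6:.2f} um")
print(f"50-ohm width at h = 2 um: {w_2um * 1e6:.2f} um")
```

In this approximation \(Z_0\) depends only on \(W/h\), so halving the dielectric height halves the width needed for 50 \(\Omega\), which in turn raises the series resistance of the line.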

Furthermore, the benefit of complete shielding from substrate losses in microstrip transmission line usually comes at the price of an increased resistance in the ground plane; in fact, the lowest metal layer which serves as the ground plane is usually thinner (about five times or more) compared to the higher metal layers, thus causing the current flowing into the ground plane to face a higher resistance, increasing the total attenuation [49].

Coplanar designs render higher inductive quality factors (defined as the ratio of magnetic energy stored to energy lost per cycle) than microstrip designs. They also offer more flexibility. The signal line width (\(W\)) and its distance from the ground planes on the sides (\(G\)) can be independently adjusted to set the characteristic impedance and the series losses of the line. However, reducing the series losses by a wider signal line, typically forces the gap between the signal and the ground lines to be larger than the elevation above the substrate. This implies that the electric and magnetic fields penetrate into the substrate, increasing the shunt losses substantially (See Fig. 3.18).

![Fig. 3.18: Penetration of magnetic field to the substrate in a CPW transmission line [50]](image)

\(^1\) However, when several MS lines are put in close proximity, the unwanted coupling between them may deteriorate the performance and should be taken into account.

Furthermore, the benefits of a wider signal line are limited by the skin effect at mm-wave frequencies. In CPW, the current tends to crowd around the edges of the signal line, close to the ground planes. Therefore, increasing the width of the signal line beyond a certain point does not reduce the series losses anymore, because the additional metal introduced does not carry significant signal current [51].

The amount of isolation required between the lines can be another factor that influences the selection of the line type. Usually, if higher isolation is required between lines routed in a design, coplanar waveguide transmission lines are preferred since they have ground lines on different sides of the signal line. This effectively shields adjacent signal lines from each other.

Shields below the line (implemented as arrays of short strips below the line) may be added to improve the quality factor of the coplanar lines in a hybrid solution like a grounded coplanar waveguide (GCPW) [31]. However, since the mutual coupling between signal and shield conductors is positive and currents flowing in signal and shield are opposite, a GCPW structure leads to a reduction in total inductance. Moreover, the total capacitance is increased [49].

Emerging transmission line concepts also include elevated coplanar waveguides, where the signal is elevated above the ground lines (i.e. signal line on a top metal layer, with ground lines in a lower metal layer), in order to possibly reduce the insertion loss [31].

Microstrip and coplanar waveguide transmission lines are illustrated in Fig. 3.19.

While designing a transmission line in today’s scaled technologies, minimum/maximum feature density rules imposed by the fabrication process should be taken into account. Typically, a modern CMOS process necessitates a metal density between 20% and 70%. “Dummy” metal fills are often introduced in the layout to meet the target minimum density; but they can easily change the behavior of transmission lines at mm-wave frequencies.
Generally, due to the high-density mesh required to capture all the small details, it is very difficult to predict the influence of the dummies on the transmission line with EM simulations of reasonable time and memory requirements. The best approach is to place the dummy metal fills as far as possible from the signal line, so that they have minimal effect on the field distribution. One possible solution for the placement of the dummies is illustrated in figure 3.20, in which dummy metal layers are placed under the ground lines. The same approach is followed in the employed transmission line. In this design, the widths of the signal and ground lines are 12\(\mu\)m and 9\(\mu\)m, respectively, and the gap between the signal and ground lines is 5\(\mu\)m.

**Fig. 3.20:** Signal, ground and dummy metal placement in the employed CPW line

In our work, the entire LNA design is based on these 50\(\Omega\) coplanar (CPW) transmission lines. The value of \(Z_0\), losses and \(\varepsilon_{\text{eff}}\) of the line are extracted from a prototype fabricated in an earlier run. Since all the matching networks are realized using sections of the line, the modeling of such line is vital to the LNA design.

Due to the frequency domain nature of their modeling, TLine blocks directly imported from ADS are not well suited for time domain analyses (like PSS and transient), which further intensifies the need for a lumped model.

On the other hand, the lengths of the transmission lines have to be adjusted to obtain good input match and flat gain over the large operating bandwidth. Hence, a scalable modeling approach is a must. Consequently, having fixed the coplanar wave-guide line structure, the task is using EM simulations to calibrate a parameterized model suitable for sweep analysis and optimization during the schematic design phase.

A simple RLGC model, composed of a resistor in series with an inductor in one branch and a conductance in parallel with a capacitor in another branch, or more complicated structures like the one in Fig. 3.21, can be used to realize such a scalable model.
However, it is possible to cascade more sections in order to further mimic the distributed nature of the transmission line. Experiments in [49] show that, in the case of a working frequency of 60GHz, only 4 sections are required to reproduce the correct behavior of the line for a length of about 500μm (i.e. in all practical cases). The final model of the CPW transmission line used in this design is shown in Fig. 3.22.

The model employs lumped elements that are frequency independent. Nonetheless, it is able to correctly reproduce frequency dependence of the modeled characteristics in an extremely wide band.
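The effect of the number of cascaded sections can be illustrated with a simplified sketch. It is not the model of Fig. 3.22: it assumes a lossless 50 \(\Omega\) line with a hypothetical \(\varepsilon_{\text{eff}} = 4\), represents each section as a symmetric LC T-network, and compares the resulting \(S_{21}\) with that of an ideal line of the same length.

```python
import cmath
import math

C0 = 299792458.0  # speed of light in vacuum, m/s

def abcd_mul(m, n):
    """Multiply two ABCD matrices stored as (A, B, C, D) tuples."""
    a, b, c, d = m
    e, f, g, h = n
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

def ladder_s21(freq, length, n_sections, z0=50.0, eps_eff=4.0):
    """S21 (z0 reference) of a lossless line modeled by n LC T-sections."""
    v = C0 / math.sqrt(eps_eff)
    l_tot = z0 * length / v          # total series inductance of the line
    c_tot = length / (z0 * v)        # total shunt capacitance of the line
    w = 2 * math.pi * freq
    z_half = 1j * w * l_tot / (2 * n_sections)   # half of the section inductance
    y = 1j * w * c_tot / n_sections              # section shunt capacitance
    section = (1 + z_half * y, z_half * (2 + z_half * y), y, 1 + z_half * y)
    total = (1, 0, 0, 1)
    for _ in range(n_sections):
        total = abcd_mul(total, section)
    a, b, c, d = total
    return 2 / (a + b / z0 + c * z0 + d)

def s21_error(n_sections, freq=60e9, length=500e-6, eps_eff=4.0):
    """|S21(ladder) - S21(ideal line)| for the same length and frequency."""
    beta = 2 * math.pi * freq * math.sqrt(eps_eff) / C0
    ideal = cmath.exp(-1j * beta * length)
    return abs(ladder_s21(freq, length, n_sections, eps_eff=eps_eff) - ideal)

for n in (1, 2, 4, 8):
    print(n, f"{s21_error(n):.4f}")
```

Under these assumptions the error for a 500 μm line at 60 GHz drops quickly with the number of sections, in line with the observation in [49] that about four sections are enough.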

Figure 3.23 demonstrates the result of comparison between the model and the measured values for a 300μm piece of the transmission line. Looking at the figures of the characteristic impedance and the attenuation and phase constant of the line it becomes apparent how closely the model follows the measured data. Even if the S-parameters of the line are taken into account there is a very good agreement between the measured and simulated data. From figure 3.23 (f) the amount of signal attenuation for this 300μm transmission line is less than 0.4dB for the maximum frequency of operation.
The proposed LNA

The amplifiers in the signal path not only should have enough bandwidth but also should have minimum variations in their pass band and near constant group delay to avoid distortion in the signal. As a consequence, several inter-stage passive networks are usually added to the structure to enable the control of transfer function and frequency response behavior.

Due to the specifications dictated by the system requirements, the LNA in this work is targeting a gain of more than 25dB in a bandwidth greater than 20GHz. A single ended solution is adopted in this work to implement the low noise amplifier. Furthermore, transmission lines are used to solve the problem of the ground current return path. Considering the large number of inductances required in this design, such a solution provides further immunity against electromagnetic interference between the inductors.

To obtain an operating bandwidth of more than 20GHz, employing bandwidth enhancement techniques is a must. According to the discussions in the previous sections, depending on the ratio of the load capacitance of the stage and the driving capacitance of the succeeding stage two important observations can be derived:

1) A given bandwidth extension technique may not be optimum for all capacitance ratios.

2) A multi-stage amplifier may achieve superior performance in terms of gain and bandwidth using different bandwidth enhancement techniques between different stages.

Third order matching networks are employed in our design between different amplifying stages to achieve the goal of wide operating bandwidth.

Common source stages serve as the building blocks of our amplifier. To reduce the power consumption, these common source stages are stacked in pairs, one over the other, in a current re-use architecture. Simulations show that the current sharing strategy reduces the power consumption of the LNA by 25-30% compared to the case without current re-use, for the same performance and the same current density in all the stages\(^1\).

Consequently, to realize the amplifying stages, we have started with a simple current sharing architecture. The unit cell of a current sharing stage is shown in figure 3.24 (a). The bypass capacitor at the source of M2 (shown by “AC Short”) provides a path to ground for the high-frequency ac current flowing into the source of M2, which prevents the signal from coupling back to M1.

Next, in order to separate the load capacitance of each stage from the input capacitance of its succeeding stage, inductors L2 and L3 are added to the design according to figure 3.24 (b). Finally, with the aim of having more degrees of freedom in flattening the gain within the operating bandwidth, the ac short capacitors are replaced by smaller capacitors, as illustrated in figure 3.24 (c).

---

1 In fact the amount of power saving is not a fixed percentage and varies by changing the bias or supply voltages of cascaded stages.
Fig. 3.24: Simple current re-used stage (a), Separation of load capacitance of each stage from the source capacitance of the next stage (b), increasing the order of inter-stage networks to extend the bandwidth (c)

Two inter-stage matching networks can be observed in Fig. 3.24 (c); one between M_1 and M_2 and the other one between M_2 and the output node. The small signal models of these two networks are illustrated in figure 3.25. As can be seen in this figure, the capacitive parasitic components of MOS devices are absorbed into the matching networks structure.

Fig. 3.25: Small signal model of the matching networks connected to drain of odd (a) and even (b) stages of our current re-used amplifier

Since the capacitors connected to the source of the transistors in the even stages (shown by C_2 in figure 3.24 (c)) are not large enough to be treated as an ideal short to ground, the small signal models of the matching networks connected to the drain node of the odd and even stages of the LNA are somewhat different.

To identify the cause of the discrepancies between the two networks, we simulated two cascaded common source stages with the same dc operating conditions, once in a current re-used architecture and once in a simple cascaded common source structure (figure 3.26). If C_2 were very large (an ideal ground at the source of M_2), the transfer functions between the input of the first stage and the input of the second stage would be similar for both architectures in figure 3.26. However, if C_2 is replaced with a smaller capacitor (e.g. 400fF as in our case), the transfer functions of the two structures become somewhat different.

The effect of $C_2$ on the performance of the amplifier can be summarized as follows: Since it is not an AC short, instead of being shunted to ground, inductor $L_2$ at the drain node of $M_1$ sees a resistance of $1/(2g_{m2})$ at its upper node\(^1\); hence, its quality factor is reduced. Furthermore, the capacitance at the source node of $M_2$ generates a negative resistance seen through the gate of this transistor. $C_2$ also generates a notch at $f_{in}$ [52].

$$f_{in} = \frac{1}{2\pi \sqrt{L_{g2} C_{eq}}} \quad (3-28)$$

In which:

$$\frac{1}{C_{eq}} = \frac{1}{C_{gr1}} + \frac{1}{C_2} \quad (3-29)$$
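As a quick numerical illustration of (3-28) and (3-29), the sketch below evaluates the notch frequency for a series combination of two capacitors resonating with the gate inductor. Only the 400 fF value of $C_2$ comes from the text. The inductance and the second capacitance are hypothetical placeholders, and the helper name is illustrative.

```python
import math

def notch_frequency(l_g2, c_series, c_2):
    """Notch frequency (Hz) from (3-28), with C_eq the series combination of (3-29)."""
    c_eq = 1 / (1 / c_series + 1 / c_2)
    return 1 / (2 * math.pi * math.sqrt(l_g2 * c_eq))

# Hypothetical values: 100 pH gate inductor and 50 fF for the capacitance that
# appears in series with C2 in (3-29). C2 = 400 fF as stated in the text.
f_notch = notch_frequency(100e-12, 50e-15, 400e-15)
f_ac_short = notch_frequency(100e-12, 50e-15, 1e-6)   # C2 large enough to act as a short

print(f"notch with C2 = 400 fF : {f_notch / 1e9:.1f} GHz")
print(f"notch with C2 -> short : {f_ac_short / 1e9:.1f} GHz")
```

A finite $C_2$ lowers the equivalent capacitance and therefore shifts the notch upward relative to the ideal ac-short case.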

\(^1\) This is the Miller equivalent of the $1/g_{m2}$ resistance seen in fig. 3.24.
To verify the validity of the employed model, we added a negative resistance to gate of M2 and a resistance of $1/(2g_{m2})$ in series with L2 in figure 3.26 (b). As can be seen in figure 3.27 the same shape is then obtained for both the amplifying structures of figure 3.26.

![Figure 3.27](image)

**Fig. 3.27**: Voltage gain from the input to the output node of figure 3.26 after modifying the architecture in figure 3.26 (b) to resemble the current re-used architecture of figure 3.26 (a).

Effectively, the presence of $1/2g_{m2}$ resistance at the drain node of the first common source stage, helps in extending the bandwidth by flattening the gain through balancing the quality factors of the LC tanks on the two sides of the matching network. If the drain of M1 were directly connected to the supply through the inductor (as in the case of cascade of two common source stages in Fig. 3.26 (b)) the gain curve would have a transfer characteristic like the one shown by the dashed lines in Fig. 3.28.

![Figure 3.28](image)

**Fig. 3.28**: Comparison of two cascaded stages with current re-used and simple CS structure.
The presence of the $1/2g_{m2}$ resistance at the drain node of $M_1$, not only helps in flattening the gain but also increases the real part of the impedance seen at the gate of this transistor, therefore, improving the stability of the architecture.

Following the above analysis, and in order to simplify the calculations, only the simpler of the two matching networks (shown in figure 3.25 (b)) is analyzed in the following. The effect of $C_2$ at the source of the even stages is then taken into account by adding some negative/positive resistance to the simplified matching network of Fig. 3.25 (b).

For the matching network of Fig. 3.25 (b), the transfer function from the input current to the output voltage (the voltage at the gate node of $M_2$) can be calculated as follows:

$$T(s) = \frac{A}{B} \frac{R_2}{s^2R_2C_2L_2 + sL_2 + R_2} \quad (3-30)$$

In which:

$$A = s^3R_2C_2L_1L_2 + s^2(L_1L_2 + R_1R_2C_2L_2) + s(L_1R_2 + L_2R_1) + R_1R_2 \quad (3-31)$$

$$B = s^4R_2C_1C_2L_1L_2 + s^3(L_1L_2C_1 + R_1R_2C_1C_2L_2) + s^2(R_1C_1L_2 + R_2C_1L_1 + R_2C_2L_1 + R_2C_2L_2) + s(L_1 + L_2 + R_1R_2C_1 + R_1R_2C_2) + (R_1 + R_2) \quad (3-32)$$

For the simplified case that $R_1=R_2=R$, $C_1=C_2=C$, $L_1=L_2=L$\(^1\), $A$ and $B$ reduce to (3-33) and (3-34), respectively:

$$A = s^3(RCL^2) + s^2(L^2 + R^2CL) + 2sRL + R^2 \quad (3-33)$$

$$B = s^4RC^2L^2 + s^3(L^2C + R^2C^2L) + 4s^2RLC + s(2L + 2R^2C) + 2R \quad (3-34)$$

1 Although this assumption may not be valid for all the cases, this is a common assumption to simplify the derived equations and is called a “balanced condition”.

Consequently the transimpedance gain $T(s)$ can be written as (3-35):

$$T(s) = \frac{s^3 (RCL^2) + s^2 (L^2 + R^2 CL) + 2sRL + R^2}{s^4 RC^2 L^2 + s^3 (L^2 C + R^2 C^2 L) + 4s^2 RLC + s(2L + 2R^2 C) + 2R} \cdot \frac{R}{s^2 RCL + sL + R} \quad (3-35)$$

Three zeros and six poles characterize the frequency transfer function of the proposed matching network given by equation (3-35), determining a 60 dB/decade slope beyond the pass-band.

For any circuit, the values of $R$ and $C$ are determined by the input/output impedances of the MOS transistors. Consequently, the transimpedance characteristic can be plotted for different values of $L$ in a tool such as Matlab©, and the best value for the inductances is then selected based on the obtained curves.

Figure 3.29 illustrates the curve that is obtained in Matlab© for the simulated case of $C_1 = C_2 = C = 80\text{fF}$, $L_1 = L_2 = L = 120\text{pH}$ and $R_1 = R_2 = R = 15\Omega$.
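The same transimpedance can be evaluated with a few lines of Python as a cross-check. The sketch below is not the script used for Fig. 3.29. It assumes the balanced network implied by the structure of $T(s)$: the input current drives a node loaded by $C$, by a branch $R + sL$, and by an inductor $L$ in series with the parallel combination of $R$ and $C$, across which the output is taken. The polynomial form is obtained by expanding this assumed network, which gives a constant term $R^2$ in $A$ and an $s^4$ coefficient $RC^2L^2$ in $B$, as dimensional consistency requires.

```python
import math

def t_circuit(f, r=15.0, c=80e-15, l=120e-12):
    """Transimpedance (ohms) from direct nodal analysis of the assumed network."""
    s = 2j * math.pi * f
    z1 = r + s * l                  # R1 in series with L1
    z_out = r / (1 + s * r * c)     # R2 in parallel with C2 (output node)
    z2 = s * l + z_out              # L2 in series with the output impedance
    v_node = 1 / (s * c + 1 / z1 + 1 / z2)
    return v_node * z_out / z2

def t_polynomial(f, r=15.0, c=80e-15, l=120e-12):
    """Same transimpedance written as the ratio of polynomials A/B * R/D."""
    s = 2j * math.pi * f
    a = s**3 * r * c * l**2 + s**2 * (l**2 + r**2 * c * l) + 2 * s * r * l + r**2
    b = (s**4 * r * c**2 * l**2 + s**3 * (l**2 * c + r**2 * c**2 * l)
         + 4 * s**2 * r * l * c + s * (2 * l + 2 * r**2 * c) + 2 * r)
    return a / b * r / (s**2 * r * c * l + s * l + r)

for f_ghz in (1, 20, 40, 60, 80, 100):
    mag = abs(t_circuit(f_ghz * 1e9))
    print(f"{f_ghz:4d} GHz  |T| = {mag:6.2f} ohm")

# Asymptotic slope well above the pass-band, in dB per decade.
slope_db = 20 * math.log10(abs(t_circuit(1e15)) / abs(t_circuit(1e14)))
print(f"asymptotic slope: {slope_db:.1f} dB/decade")
```

The low-frequency value is $R/2$, and the asymptotic slope approaches 60 dB/decade, consistent with the three-pole excess of the transfer function.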

One important observation regarding the circuit shown in figure 3.26 (a) is the low impedance, and accordingly the low voltage gain, that the matching network establishes at the drain node of $M_1$ through the series resonances at the gate and source nodes of $M_2$. In other words, the amplifier in figure 3.26 (a) has a low voltage gain from its input to the drain of $M_1$ (the same observation is valid for all the cascaded common source stages).
Consequently, similar to the cascode configuration, the Miller effect on $M_1$ is reduced. Hence, the stability of the proposed amplifier is better than that of a simple common source stage (this will be discussed in more detail in the subsequent section). Nonetheless, assuming the quality factor of the inter-stage resonating circuit is greater than one, the voltage gain of the $1^{st}$ stage will eventually be established at the gate node of the second transistor ($M_2$) by the series resonance of $L_{g2} - C_{in2}$.

Another important feature of the employed inter-stage matching network is the current amplifying characteristic that it realizes when employed in the LNA. The current amplifying characteristic of the series inter-stage resonated amplifier can be understood by analyzing the circuit shown in Fig. 3.30. In this figure, $Z_{sub}$ represents the parasitic impedance from the drain node of transistor $M_1$ to ground through the silicon substrate. As a simple model, $Z_{sub}$ is composed of $g_{ds1}$ and the drain junction capacitance $C_{jd}$ of $M_1$ in series with the effective substrate resistance $R_{sub}$. From the small signal model shown in Fig. 3.30 (b), the current gain $i_{d2}/i_{d1}$ can be expressed as (3-36):

$$\frac{i_{d2}}{i_{d1}} = \frac{g_{m2}}{sC_{in2}} \frac{sL_{g2} \| Z_{sub}}{sL_{g1} \| Z_{sub} + sL_{g2} + 1/sC_{in2}} \quad (3-36)$$

![Fig. 3.30: Cascaded common source stages in a current re-use structure (a) and small signal equivalent circuit used for current amplification analysis](image)

1 It should be noted that due to the large sizes of transistors, there is usually current loss from the output of the first stage to the output of the second stage for cascode configuration.

2 Although $C_1$ and $C_2$ influence the gain shape and the stability of the architecture, for analyzing the current amplifying characteristic, replacing them with a short does not influence the validity of the argument but simplifies the calculations.
From Eq. (3-36), under the condition that $L_{g2}$ resonates with $C_{in2}$, $i_{d2}/i_{d1} = g_{m2}/sC_{in2}$. Consequently, the current gain from drain node of $M_1$ to that of $M_2$ will be proportional to the $\omega_T/\omega$ ratio. Hence, the series inter-stage resonance makes the circuit enclosed by the dashed box in Fig. 3.30 (a) operate like a current gain amplifier. Note that Eq. (3-36) is valid regardless of the size of $M_2$.
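
As a one-line worked step, assuming that $\omega_T \approx g_{m2}/C_{in2}$ (that is, that $C_{in2}$ is dominated by the gate capacitance of $M_2$; this approximation is an assumption and is not stated explicitly above), the magnitude of the current gain at resonance can be written as

$$\left|\frac{i_{d2}}{i_{d1}}\right| = \frac{g_{m2}}{\omega C_{in2}} \approx \frac{\omega_T}{\omega}$$

so the stage provides a current gain greater than one as long as the operating frequency stays below $\omega_T$.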

The peaking techniques described so far use passive filter networks to shape the gain response above the original 3-dB frequency. If the parameter values in the two-port matching network are chosen properly, a smooth gain can be obtained, as shown in figure 3.31. However, if the networks are designed to introduce larger peaking, and this peaking (equivalently, in-band ripple) is then compensated by the succeeding stages, a larger BWER can be achieved.

![Fig. 3.31: Frequency response of the two port load achieving AG-BWER of 4.84 [39]](image)

This is somewhat similar to the traditional stagger-tuning approach. When several narrowband amplifiers with different resonant frequencies are cascaded, the resulting multi-stage amplifier has an overall response that is broadband with adequate gain flatness. As can be deduced from figure 3.32, the advantage of this strategy becomes more pronounced if the stagger-tuned stages have inherently wideband transfer characteristics.

Hence, the idea is to separate the transfer characteristics of the two resonating paths in each network by more than the 0.84x shown in Fig. 3.31. This increases the in-band ripple. In other words, there will be two peaks and a valley in the transfer characteristic of each network. Another inter-stage network designed with a different transfer function (Fig. 3.32 (a)) or a different operating frequency (Fig. 3.32 (b)) then compensates this in-band ripple. In these figures, stage 1 and stage 2 represent the frequency transfer function of each common source stage in the current sharing architecture.

**Fig. 3.32:** Devised solution to obtain higher gain-bandwidth for two cascaded stages

Taking this observation into account, three of the unit cells shown in figure 3.24 are cascaded to construct the LNA. All the stages employ 50 fingers with 800nm width to realize 40μm devices for the amplifying stages. Width of transistors is chosen as a compromise between power consumption, gain and noise figure.

Typically, larger devices allow for smaller matching networks resulting in reduced losses and higher overall gain for the complete LNA\(^1\) \([53]\). The noise figure also improves\(^2\). On the other hand, a very large device would make the input matching network more challenging, since the 50 \(\Omega\) source impedance would need to be matched to an impedance with a larger capacitance and a smaller real part (much smaller than 50\(\Omega\)). This would translate into narrowband matching and higher sensitivity to modeling and process variations \([51]\). Furthermore, arbitrary increase in the size of the core transistors results in unacceptable power levels. Taking all these trade-offs into account, a width of 40μm is chosen for the transistors.

Due to the limited gain at mm-wave, no explicit inductive source degeneration was used in the initially designed low noise amplifier. However, the cascaded common source stages employed in the structure of the amplifier mandate careful stability simulations to ensure that the architecture remains stable.

The schematic of the proposed LNA is illustrated in Fig. 3.33. All the inductors in this design are implemented with CPW transmission lines for accurate control of the routing and to avoid unwanted magnetic coupling between the inductances. Furthermore, a T-type input matching network and inductive source degeneration are employed for wideband input matching. The small inductor added to the source of M₁ is not used for stability purposes. As mentioned earlier, the whole structure is stable even in the absence of this inductor. However, it is added to increase the real part of the impedance seen looking into the gate of the input device, which eases the realization of an input matching network with a bandwidth of more than 30GHz. If the ratio between this impedance and the 50Ω source impedance is very large, the matching network will have a large Q and consequently a small bandwidth, which contradicts the design requirements.

---

\(^1\) Small devices require large matching networks since their optimum noise impedance is very close to open circuit.

\(^2\) Since the sizes of transistors are increased at constant current density, their \(g_{m}\) increases for a larger device width. This reduces the main contributor to device input referred noise (4kT/\(g_{m}\)) at the gate of the first stage. Furthermore, the smaller the matching network, the smaller the losses associated with it. These effects lead to NF improvement for a larger device.

Increasing the number of reactances in the matching network helps in further broadening the matched bandwidth. However, networks consisting of more than four reactances are rare and above four reactances, the improvement is small [54].

In this design, we have used a simple inductive T network to realize the input match and to cancel out the pad parasitics.

To set the bias point of the transistors, it was noted that when biased at current densities between 0.15mA/μm and 0.4mA/μm, the minimum noise figure (\( NF_{\text{min}} \)), \( f_T \), and \( f_{\text{max}} \) of the transistors achieve their optimum performance. They also become almost insensitive to \( I_D \) and \( V_{GS} \) variations [30]. For example, a current density of 0.25mA/μm maximizes \( f_T \) of the transistors and a current density of 0.15mA/μm minimizes the noise figure.

In this design the bias current of all the stages is set to 0.15mA/μm to achieve the optimum noise performance. Voinigescu et al. have proven in [55] that this is the optimum current density to achieve the minimum noise figure regardless of the architecture or the employed technology node\(^1\). Therefore, the gate biases are adjusted to maintain the desired current density.

At mm-wave frequencies, the layout of the devices has a larger impact on device performance than at lower frequencies. As the process scales down, the sidewall capacitance between the contacts and the gate poly becomes critical, and this effect is more pronounced at mm-wave frequencies. Consequently, in laying out the transistors, the distances between the gate and the drain contact and between the gate and the source contact have been increased to lower these capacitances and obtain a better MAG.

The layout of the stand alone LNA is shown in figure 3.34. It occupies 1150\(\mu\)m x 730\(\mu\)m including the pads.

**Fig. 3.34:** Layout of the stand alone LNA

Simulated performance of the designed LNA is summarized in table 3.2. The S-parameters of the LNA together with its noise figure are shown in figure 3.35. The LNA achieves a very flat gain of 34dB over a bandwidth larger than 33GHz. The input match is better than -10dB from 40GHz up to the end of the operating band. The reverse isolation is better than -100dB over the whole band. The in-band noise figure is lower than 6.8dB, with a minimum of 4.2dB close to 50GHz. The noise curve is also very smooth. This is a further advantage because amplifying the small received radio signals with a good signal to noise ratio requires a flat and low noise figure in addition to a flat and high forward gain. The measured performance of the LNA and further discussion on the obtained values will be presented in chapter five.

\(^1\) In fact, it is stated that the optimum current density for \(NF_{\text{min}}\) is 0.15mA/μm. Usually the impedance for power match is different from that required for minimum noise figure. Consequently, higher power consumption may improve NF. However, simulation shows that for our case the NF of the amplifier is very close to \(NF_{\text{min}}\). Consequently, the value of 0.15mA/μm will be the optimum value to achieve the best noise performance. It should be noted that although a constant current density biasing scheme is used, larger device sizes still tend to give better NF due to lower losses in the matching network (if the input capacitance associated with them is tolerable).

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Gain</td>
<td>33.9dB</td>
</tr>
<tr>
<td>Noise Figure</td>
<td>4.2-6.8dB</td>
</tr>
<tr>
<td>Bandwidth</td>
<td>34.4-67.7GHz</td>
</tr>
<tr>
<td>Power Consumption</td>
<td>20.6mW</td>
</tr>
<tr>
<td>Input reflection Coefficient</td>
<td>&lt; -10dB (from 40GHz)</td>
</tr>
<tr>
<td>Reverse Isolation</td>
<td>&lt; -100dB</td>
</tr>
<tr>
<td>OP-1dB</td>
<td>+3dBm</td>
</tr>
</tbody>
</table>

During post layout simulations, it was noted that higher order parasitics (the \(g_{ds}\), \(C_{gs}\) and \(C_{gd}\) of the transistors and the shield/substrate to ground capacitances of the MOM capacitors) affect the position of poles and zeros and consequently the gain response of the amplifier. As a result, after choosing the circuit element values, the initial amplifier design was optimized to maximize bandwidth, gain and gain flatness.

![Post layout simulation result of the standalone LNA](image)

**Fig. 3.35:** Post layout simulation result of the standalone LNA
As is evident from the gain response of figure 3.35, a large image rejection ratio can be expected due to the steep out-of-band attenuation that the inter-stage networks provide at both low and high frequencies.

To further illustrate the quality of the input match, figure 3.36 shows the input impedance of the low noise amplifier. As can be seen, the resistive part is close to 50 Ω over more than 30 GHz of bandwidth, while the imaginary part is close to zero.

![Input impedance of the LNA](image)

**Stability Analysis of the amplifier**

A parameter often used to investigate the stability of circuits is the “Stern stability factor”, defined as (3-37) [16]:

\[ K = \frac{1 + |\Delta|^2 - |S_{11}|^2 - |S_{22}|^2}{2|S_{21}||S_{12}|} \]  \hspace{1cm} (3-37)

where \(\Delta = S_{11}S_{22} - S_{12}S_{21}\). If \(K>1\) and \(|\Delta|<1\), then the circuit is unconditionally stable (i.e. it does not oscillate for any combination of source and load impedances). In modern RF design, the load impedance of the LNA is relatively well-controlled, making \(K\) a pessimistic measure of stability.
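
The check in (3-37) can be sketched numerically as follows. This is a minimal illustration, not the flow used in this work: the function names and the S-parameter values are hypothetical, and in practice the test would be repeated at every frequency point of the simulated or measured S-parameters.

```python
def stability_factors(s11, s12, s21, s22):
    """Return (K, |Delta|) from (3-37) for complex S-parameters at one frequency."""
    delta = s11 * s22 - s12 * s21
    k = (1 + abs(delta) ** 2 - abs(s11) ** 2 - abs(s22) ** 2) / (
        2 * abs(s21) * abs(s12)
    )
    return k, abs(delta)


def is_unconditionally_stable(s11, s12, s21, s22):
    """Unconditional stability requires K > 1 together with |Delta| < 1."""
    k, delta_mag = stability_factors(s11, s12, s21, s22)
    return k > 1 and delta_mag < 1


# Made-up S-parameters (linear magnitude, not dB) for illustration only.
k, delta_mag = stability_factors(0.3, 0.01, 4.0, 0.4)
print(f"K = {k:.3f}, |Delta| = {delta_mag:.3f}")
```

Because \(|S_{12}|\) appears in the denominator, a very good reverse isolation drives \(K\) to large values, which is why the check is most informative near frequencies where the isolation degrades.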

Furthermore, it is well known that if an LNA is stable, the real part of the impedance seen looking into the input of each of its constituent stages is positive [56]. Consequently, to analyze the stability of the proposed amplifier, its input impedance can be calculated.
The small signal model shown in Fig. 3.25 (a) can be used to calculate the impedance at drain node of M₁. In other words, after defining the parameter values for the elements of the inter-stage matching network of figure 3.24 and prior to schematic level implementation, this circuit can be simulated to check whether this structure is stable or not.

Two general observations help in creating a stable design. First, for all the frequencies at which the imaginary part of the drain load is capacitive, \( \text{Real}\{Z_{\text{in}}\} > 0 \) and the structure is stable. Second, for frequencies at which the imaginary part of the drain load is inductive, a critical quality factor can be obtained for the load, below which the structure is stable. In other words, for inductive loads the stability of the architecture depends on the resistance in series with that inductor.

Following the above procedure, after devising the parameter values for the constituent elements of the matching network, the impedance can be plotted versus frequency using Cadence, Mathematica or Matlab. For the problematic case of an inductive load a critical Q can be derived to guarantee the stability of the architecture. If required, some resistances can then be added in parallel to the tank to obtain the required quality factor.

In order to determine the stability boundary (the critical Q) another experiment should be done. The configuration illustrated in figure 3.38 is used for this purpose. Different loads (inductive, capacitive) were examined to investigate the stability criteria of the amplifier.

![Fig. 3.37: Simplified view of one common source stage with its load](image)

The important consideration regarding an inductive load is the presence of the parasitic capacitance at the drain node of the common source stage M₁ (shown by \( C_d \) in Fig. 3.39). Consequently, for an inductive load \( L \), the effective load impedance at the drain node of M₁, which should be used in the calculation of the critical Q, is given by (3-38):

\[ Z_{\text{load-effective}} = \frac{j\omega L}{1 - \omega^2 LC_d} \]  \hspace{1cm} (3-38)

With the dimensions of our circuit, the critical quality factor obtained from the above experiment for an inductive load is 0.25. Hence, defining the quality factor of the network in figure 3.38 as (3-39), the amount of resistance that should be added to the tank to arrive at a stable solution can then be determined.

\[ Q = \frac{\text{Imag} \{ Z_T \}}{\text{Real} \{ Z_T \}} \]  \hspace{1cm} (3-39)

This explains why the value of the capacitor at the source node of the even common source stages should not be very large. The values of \( C_2, C_5 \) and \( C_8 \) affect not only the gain flatness but also the stability of the architecture, and cannot be arbitrarily large.

As can be seen from Fig. 3.39, an optimized value exists for the bypass capacitor \( C_2 \) in this topology, to ensure stability and to allow wideband operation with good gain flatness.
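
The critical-Q criterion of (3-38) and (3-39) can be illustrated with a short sketch. It assumes a simplified load in which the inductor \( L \), the drain parasitic \( C_d \) and an added resistor \( R_p \) are all in parallel; this topology and the element values (100pH, 20fF, 50GHz) are assumptions chosen for illustration and are not taken from the actual circuit. Only the critical Q of 0.25 comes from the text.

```python
import math


def tank_impedance(freq_hz, l_h, c_d_f, r_p_ohm):
    """Impedance of L, C_d and R_p in parallel (hypothetical load model)."""
    w = 2 * math.pi * freq_hz
    admittance = 1 / r_p_ohm + 1 / (1j * w * l_h) + 1j * w * c_d_f
    return 1 / admittance


def quality_factor(z):
    """Q as defined in (3-39): Imag{Z_T} / Real{Z_T}."""
    return z.imag / z.real


def parallel_resistance_for_q(freq_hz, l_h, c_d_f, q_target):
    """Parallel resistance that sets Q to q_target.

    Valid below the L-C_d self-resonance, where the load is still inductive.
    """
    w = 2 * math.pi * freq_hz
    # Effective inductive reactance, i.e. the magnitude of (3-38).
    x_eff = w * l_h / (1 - w ** 2 * l_h * c_d_f)
    return q_target * x_eff


# Assumed example values; only q_crit = 0.25 is quoted in the text.
f0, l_load, c_d, q_crit = 50e9, 100e-12, 20e-15, 0.25
r_p = parallel_resistance_for_q(f0, l_load, c_d, q_crit)
q_check = quality_factor(tank_impedance(f0, l_load, c_d, r_p))
print(f"R_p = {r_p:.2f} ohm gives Q = {q_check:.3f}")
```

For this parallel model the quality factor reduces to \( R_p \) divided by the effective reactance of (3-38), so any \( R_p \) below the returned value keeps Q under the critical limit.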

In addition to the above analyses, during post layout simulations, stability analyses were also run in Cadence to make sure that the structure remains stable at all frequencies. Fig. 3.40 shows the simulated \( K \) and \( \Delta \) for the presented LNA.

Chapter Summary

This chapter was dedicated to design and implementation of a wideband, high gain single ended low noise amplifier. A modified version of this amplifier (introduced in chapter 4) will then be integrated in the receiver chain of our 50GHz data link.

Following the topic of bandwidth enhancement techniques, several wideband high gain low noise amplifiers from the state of the art were introduced. Different possibilities to implement the LNA were investigated, and the pros and cons of each solution were discussed. The architecture of the LNA was then introduced.

The LNA is composed of three current sharing stages to reduce power consumption. In each of the current sharing stages, the load capacitance of each MOS transistor is separated from the input capacitance of its following stage by adding inductances between the two. Furthermore, the order of the inter-stage networks is increased to obtain a higher number of poles and zeros and consequently to increase the overall operating bandwidth of the stage. In addition, a wideband stagger tuning technique is employed among the inter-stage networks to further extend the bandwidth. Through a tailored combination of the aforementioned techniques the LNA achieves the highest gain-bandwidth product reported so far for a mm-wave low noise amplifier.
Stability simulations and analyses were done to make sure that the designed amplifier remains stable for all the frequencies of operation and for different source and load impedances.

Post layout simulations of the LNA demonstrate a noise figure of less than 6.8dB, a gain of more than 33dB over 33GHz of bandwidth, corresponding to a GBW of 1645GHz, and a power consumption of 20.6mW. The LNA is fabricated in the bulk CMOS 28nm technology node by STMicroelectronics.
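
As a quick cross-check of the quoted gain-bandwidth product, the sketch below assumes that GBW is the linear voltage gain multiplied by the bandwidth; this definition is an assumption, although it reproduces the quoted figure closely. The function name is hypothetical and the inputs are the simulated values of table 3.2.

```python
def gain_bandwidth_ghz(gain_db, f_low_ghz, f_high_ghz):
    """GBW in GHz, taken as linear voltage gain times bandwidth (assumed definition)."""
    voltage_gain = 10 ** (gain_db / 20)
    return voltage_gain * (f_high_ghz - f_low_ghz)


# Simulated values from table 3.2: 33.9dB gain over 34.4-67.7GHz.
gbw = gain_bandwidth_ghz(33.9, 34.4, 67.7)
print(f"GBW = {gbw:.0f} GHz")
```

The result lands within about one percent of the 1645GHz quoted above; the small residual is presumably due to rounding of the tabulated gain and band edges.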
Chapter 4

Transceiver Building Blocks

Introduction

After introducing the adopted solution for implementing the short range mm-wave transceiver in chapter two and following the chapter on the design and analysis of wideband low noise amplifiers, this chapter discusses the building blocks of the proposed transceiver in more detail.

As mentioned in chapter two, on-off keying modulation with non-coherent detection is employed in the transceiver system to simplify its implementation. OOK modulation not only provides the benefit of a much simpler design but also reduces the area and power consumption requirements. Furthermore, the large available bandwidth at mm-wave frequencies resolves the spectral efficiency issue related to this modulation scheme.

For the transmitter of such a system to work properly, three different tasks are required to be fulfilled. First a voltage controlled oscillator should generate the required 50GHz carrier. Then, the generated signal should be modulated by the input data and finally, it should be amplified to an acceptable level to be transmitted over the communication channel. In our design the latter two tasks are done by the power amplifier which simultaneously serves as the modulator and the amplifier (Fig. 4.1).
Consequently, in exploring transmitter building blocks, we start with the 50GHz VCO. A short discussion on the design and performance of the modulating power amplifier then follows the argument on the VCO.

Thanks to the adopted demodulation scheme the architecture of receiver is relatively simple and is composed of few circuit blocks, namely: low noise amplifier, envelope detector, and a chain of limiting amplifiers (Fig. 4.2).

A short discussion on the design of the single ended input differential output low noise amplifier that is employed in this design will start the section on the receiver building blocks.

In contrast to conventional RF and mm-wave receivers, in which the low noise amplifier is followed by a down conversion mixer, in this work the output of the LNA is fed to an envelope detector to demodulate the received signal. The recovered data is then amplified up to rail to rail by the subsequent limiting amplifier stages.

In the following sections the design of all the aforementioned blocks is discussed in more detail.

**Voltage Controlled Oscillator**

Voltage controlled oscillators are among the most discussed circuits of nanometer CMOS design as they remain the bottleneck of high frequency mm-wave transceivers. Signal generation at these frequencies is a challenge in solid-state electronics due to the limited cut-off frequency and breakdown voltage of active devices as well as the low quality factor of passive components caused by Ohmic and substrate losses. Traditionally, compound semiconductors were used to implement fundamental oscillators at mm-wave frequencies. However CMOS transistors are nowadays employed to generate signals in the same frequency range using fundamental and push-push oscillators [57].
In order to design a VCO, different parameters such as phase noise, tuning range, output swing and power consumption should be taken into account. Direct trade-offs among these parameters make the design of voltage controlled oscillators challenging. The trade-offs become more severe as the frequency of oscillation increases.

Although realization of mm-wave VCOs has been possible in CMOS technology for some years, benefits in terms of FOM cannot be expected from technology scaling. In particular, the higher quality factor of varactors at nodes with a smaller minimum feature size comes at the cost of a reduced tuning range. Furthermore, although power saving is achieved in front-end high frequency blocks (because of the increased maximum oscillation frequency and maximum available gain), the output voltage swing decreases due to the limited supply voltage. This makes achieving a low phase noise VCO powered from a low supply voltage extremely challenging [58]. Moreover, the limited voltage headroom further limits the $C_{\text{var-max}}/C_{\text{var-min}}$ ratio because the varactors cannot be used over their whole range of variation.

To utilize the whole tuning range of the varactors, a top-biased VCO can be used. However, when entering the triode region, each transistor in the top-biased architecture provides a direct resistive path to ground. Since the center tap of the tank inductor is also at ac ground, the tank Q deteriorates heavily. Thus, the top-biased topology suffers severely if its core transistors enter the deep triode region. Furthermore, due to the modulation of the output common mode level (and hence the varactors) by the noise current of $I_{\text{bias}}$, this topology suffers from high phase noise.

Another possibility is to implement the VCO with a tail-biased topology. However, since the common mode voltage at the tank is equal to $V_{\text{DD}}$, the tail-biased topology has a limited tuning range. This is because the capacitance range corresponding to negative $V_{\text{GS}}$ (for $V_{\text{cont}} > V_{\text{DD}}$) remains unused.

To overcome aforementioned issues, an NMOS-PMOS cross coupled pair is employed to construct the VCO. In this architecture the bias current is re-used by the PMOS devices, providing a higher transconductance and leading to faster switching of the cross coupled differential pair. But a greater benefit of this topology is the doubled voltage swing that it provides for a given bias current and inductor design. This is because the current in each branch swings between $+I_{\text{ss}}$ and $-I_{\text{ss}}$ in this architecture while in the NMOS/PMOS only topologies it swings between $I_{\text{ss}}$ and zero.
Furthermore, since the voltage at the common mode of the tank can be set to $V_{DD}/2$, the whole tuning range of the varactor can be used.

A simplified view of the traditional NMOS-PMOS cross coupled VCO is illustrated in figure 4.3. Due to the small available headroom for the bias current ($I_{SS}$), its noise current given by $4kT \gamma g_m$ tends to be large. This noise current modulates the output common mode level and accordingly the capacitance of the varactors, generating frequency and phase noise. A change of $\Delta I$ in $I_{SS}$ results in a change of $\Delta I / 2g_{m1,4}$ in the voltage across each varactor and consequently a frequency change $K_{VCO}$ times larger.
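
The noise mechanism described above can be put into numbers with a small sketch. The function name and the example values (10μA current change, 20mS transconductance, a VCO gain of 1GHz/V) are hypothetical; only the relations ΔV = ΔI/(2g_m) and Δω = K_VCO·ΔV come from the text, with K_VCO expressed in rad/s/V.

```python
import math


def frequency_shift_hz(delta_i_a, gm_s, kvco_rad_per_s_per_v):
    """Frequency shift caused by a change delta_i in the tail current.

    delta_v = delta_i / (2 * gm) is the resulting change in varactor voltage,
    and K_VCO (in rad/s/V) converts it into an angular frequency shift.
    """
    delta_v = delta_i_a / (2 * gm_s)
    delta_omega = kvco_rad_per_s_per_v * delta_v
    return delta_omega / (2 * math.pi)


# Assumed illustrative values, not taken from the design.
kvco = 2 * math.pi * 1e9  # 1 GHz/V expressed in rad/s/V
delta_f = frequency_shift_hz(10e-6, 20e-3, kvco)
print(f"delta_f = {delta_f / 1e3:.0f} kHz")
```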

Moreover, taking into account the available 1V supply and the threshold voltages of transistors, it is impossible to stack three transistors on top of each other in the employed 28nm technology node.

To solve these problems, the bias current source is removed in our design. We have then separated the DC voltages of the gate of the NMOS core transistors from the voltages at their drains through a decoupling capacitor. As demonstrated in figure 4.4, a tunable voltage source (prepared by a DAC) provides the bias of the NMOS pair.

Removing the bias current source increases the supply sensitivity of the oscillator\(^1\). However, it leads to a higher output swing because the voltage headroom required by the current source is no longer needed. Furthermore, this current source is one of the basic contributors of flicker noise ($1/f^3$); hence, additional circuitry is commonly required to reduce its effect [58]. The absence of the current source eliminates the need for such circuits and further simplifies the designed architecture. In addition, employing a tunable control voltage helps to partially compensate the variation of the bias current due to process, voltage and temperature (PVT) variations.

---

\(^1\) Supply sensitivity is a measure of the dependence of the oscillation frequency of an oscillator on variations in the supply voltage. Such dependence translates supply noise into frequency (and phase) noise.

An accumulation MOS (AMOS) varactor is used for frequency tuning. Due to its superior performance in terms of quality factor and tuning range, a thin oxide varactor is selected over the thick oxide one to realize the tunable tank capacitance. To achieve the maximum quality factor, the minimum length (which is twice the minimum feature size) has been chosen for the varactor, while a 1μm width is chosen as a compromise between tuning range and quality factor.

To cover the positive and negative ranges of the varactor symmetrically, the core transistors are sized properly so that the common mode voltage at the tank equals $V_{DD}/2$.

Aggressively scaled CMOS technologies suffer from high process tolerances. To have a reasonable margin to compensate frequency variation due to PVT, 10% tuning range was presumed for the 50GHz oscillator. This tuning range, along with the center frequency, and the inductor set the value for the minimum and maximum tank capacitance [48]:

\[ \Delta C \geq 4 \frac{(f_{\text{max}} - f_{\text{min}})}{L \pi^2 (f_{\text{max}} + f_{\text{min}})^3} \]  \hspace{1cm} (4-2)
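
As a brief sketch of where (4-2) comes from, assume an ideal LC tank with \( f = 1/(2\pi\sqrt{LC}) \); the intermediate steps below are a reconstruction and are not given in the text. Writing the capacitance at the two band edges and subtracting gives

\[ \Delta C = C_{\text{max}} - C_{\text{min}} = \frac{1}{4\pi^2 L}\left(\frac{1}{f_{\text{min}}^2} - \frac{1}{f_{\text{max}}^2}\right) = \frac{(f_{\text{max}} - f_{\text{min}})(f_{\text{max}} + f_{\text{min}})}{4\pi^2 L\, f_{\text{min}}^2 f_{\text{max}}^2} \]

Approximating \( f_{\text{min}} f_{\text{max}} \approx \left( (f_{\text{max}} + f_{\text{min}})/2 \right)^2 \), which holds for a small fractional tuning range, this becomes

\[ \Delta C \approx \frac{4 (f_{\text{max}} - f_{\text{min}})}{\pi^2 L (f_{\text{max}} + f_{\text{min}})^3} \]

which is the bound stated in (4-2).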

Assuming 100pH for the tank inductor (as a reasonable number), the aforementioned values for the center frequency and the tuning range require a \( \Delta C \) of 200fF. At mm-wave frequencies the quality factor of the varactors dominates the quality factor of the tank. Since a larger varactor has a lower quality factor, such a large varactor is avoided in this design. Instead, a relatively small varactor (with a nominal value of 22fF) has been chosen for the tank to obtain the maximum possible Q. We have then introduced a capacitive bank that switches fixed MOM capacitors in and out of the tank.

By employing discrete tuning for the VCO the gain characteristic of the VCO \( (K_{\text{VCO}}) \) becomes relatively linear and does not significantly change across the tuning range.

One important issue with discrete tuning is the “on” resistance, \( R_{\text{on}} \) of switches that control the unit capacitor. This resistance degrades the quality factor of the tank. As illustrated in figure 4.5 to lower the effect of this issue the switch \( (S_1) \) is put between the two MOM capacitors such that with differential switching at these nodes only half of \( R_{\text{on}} \) appears in series with each unit capacitor. This allows a twofold reduction in switch width for a given resistance [16].

![Fig. 4.5: Implementation of the capacitive bank](image)

In this work, to avoid the large phase noise contributed by the biasing switches \( S_2 \) & \( S_3 \), we have provided the bias to the drain and source of \( S_1 \) through two large resistors (figure 4.6). The resistor-biased version exhibits an enhanced Q; thus, a better trade-off between tuning range and phase noise can be achieved. The value of the resistors is chosen to be large to avoid loading the tank. Noise simulations verify the effectiveness of such a biasing scheme in reducing the phase noise.

---

\( K_{\text{VCO}} \) is the slope of the frequency versus voltage characteristic of the VCO and is specified in rad/s/V.

![Diagram](image)

**Fig. 4.6:** Replacing the biasing switches with resistors

As the switches in the capacitive bank turn on, the loss of the parallel branches degrades the quality factor of the tank. Consequently, a larger $g_m$ is required to start and maintain the oscillation. Increasing the bias voltage of the NMOS core transistors helps to provide the required larger $g_m$. This results in power saving: a higher current is used at lower oscillation frequencies, where it is needed due to the larger tank capacitance (and consequently larger losses), while a smaller current can be used at higher frequencies. In this way, the power consumption of the circuit is managed without significant degradation in the phase noise performance.

Table 4.1 shows the design parameters of the 50GHz VCO. In realizing the core transistors, both NFET and PFET gate pitches are stretched to enhance carrier mobility and to reduce parasitic capacitances.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Tank Inductor</td>
<td>60pH</td>
</tr>
<tr>
<td>Varactor</td>
<td>22fF</td>
</tr>
<tr>
<td>Core Transistors</td>
<td>25μm</td>
</tr>
<tr>
<td>Switched Capacitor</td>
<td>$4^* \times 2^{**} \times 26fF$</td>
</tr>
<tr>
<td>Decoupling Capacitor</td>
<td>100fF</td>
</tr>
</tbody>
</table>

* Number of rows in the capacitive bank

** Number of series capacitors in each branch

Symmetric spiral inductors excited by differential waveforms exhibit a higher quality factor than their single-ended counterparts. Consequently, a symmetric differential inductor is used in this design to realize the tank inductance. The two topmost metal layers are placed in parallel to lower the series resistance of the single turn spiral, whose outer diameter is 30μm.

Modern fabrication processes require the metal density to be as uniform as possible. In contrast, on-chip inductors are often made of only a few metal layers, resulting in a low metal density over a large area. One way to compensate for this is to place part of the circuitry in the area underneath the inductor to increase the metal density. Conventionally, the real estate underneath the inductor was not utilized, due to the concern that components under the inductor would degrade its quality factor through eddy current loss. However, if the devices placed inside and/or around the inductor are small, the induced eddy current loops are localized in small regions, which keeps the losses to a minimum [17].

Accordingly, we have used part of the area underneath the spiral to put the core circuit and to layout the digital bias control circuitry, reducing the area to just 120μm x 60μm (figure 4.7).

![Fig. 4.7: Layout of the 50GHz VCO](image)

In designing layout of the VCO, the series resistance of $V_{\text{tune}}$ line has been minimized, because it adds thermal noise which directly converts into phase noise [17].

An important practical concern at mm-wave frequencies is the routing parasitics, which can result in large discrepancies between simulation and measurement results [59]. To avoid such deviations, parasitics of interconnects were carefully extracted through EM simulations and the model illustrated in figure 4.8 was employed for circuit simulations.
The VCO is simulated using Cadence Spectre with a 1.0V supply voltage. The two outputs are connected to the driver stage of the PA through two decoupling capacitors. The total current flowing through the VCO is less than 5mA under all operating conditions.

Simulation results of the tuning range of the VCO versus the bias voltage variations are shown in Fig. 4.9. As expected, a conventional switched capacitor bank results in non-uniform frequency steps that widen at higher frequencies. The VCO can be tuned from 47.4 to 52.7GHz. To avoid blind zones in discrete tuning, every two consecutive tuning characteristics overlap.

As shown in figure 4.10, for the center frequency of the VCO the phase noise at 10MHz offset is -115.3dBc/Hz. The phase noise remains less than -113dBc/Hz for all the frequencies within the tuning range of the VCO.
Table 4.2 compares the performance of the VCO with the state of the art.

**Table 4.2:** Comparison of the designed VCO's performance with the state of the art

<table>
<thead>
<tr>
<th>Ref</th>
<th>Technology (CMOS)</th>
<th>Center Frequency (GHz)</th>
<th>Tuning Range (%)</th>
<th>Phase noise @ 10 MHz offset (dBc/Hz)</th>
<th>DC Power (mW)</th>
<th>FOM*</th>
</tr>
</thead>
<tbody>
<tr>
<td>[59]</td>
<td>65nm</td>
<td>56.0</td>
<td>17.0</td>
<td>-119.0</td>
<td>15</td>
<td>182.2</td>
</tr>
<tr>
<td>[48]</td>
<td>65nm</td>
<td>47.5</td>
<td>22.9</td>
<td>-119.3**</td>
<td>16</td>
<td>179.5</td>
</tr>
<tr>
<td>[60]</td>
<td>130nm</td>
<td>50.3</td>
<td>6.8</td>
<td>-127.8</td>
<td>35</td>
<td>186.4</td>
</tr>
<tr>
<td>[61]</td>
<td>65nm</td>
<td>54.0</td>
<td>11.5</td>
<td>-118.0</td>
<td>7.2</td>
<td>184.0</td>
</tr>
<tr>
<td>This Work</td>
<td>28nm</td>
<td>50.0</td>
<td>10.5</td>
<td>-116.1</td>
<td>3.9</td>
<td>184.1</td>
</tr>
</tbody>
</table>

* $FOM = PN(\Delta f) - 20\log \left( \frac{f_0}{\Delta f} \right) + 10\log \left( \frac{P_{DC}}{1mW} \right)$

** Extrapolated from 1 MHz offset
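As a quick check of the figure of merit defined under Table 4.2, the following minimal sketch evaluates the formula for the "This Work" row. The function name `vco_fom` is ours and purely illustrative. Note that the formula as written yields a negative number in dBc/Hz, so the FOM column of the table appears to list its magnitude; small differences in the last digit are due to rounding.

```python
import math

def vco_fom(pn_dbc_hz, f0_hz, offset_hz, p_dc_mw):
    """FOM = PN(df) - 20*log10(f0/df) + 10*log10(P_DC / 1 mW), in dBc/Hz."""
    return (pn_dbc_hz
            - 20.0 * math.log10(f0_hz / offset_hz)
            + 10.0 * math.log10(p_dc_mw / 1.0))

# "This Work" row of Table 4.2: 50 GHz, -116.1 dBc/Hz at 10 MHz offset, 3.9 mW
fom = vco_fom(-116.1, 50e9, 10e6, 3.9)
print(f"FOM = {fom:.1f} dBc/Hz (magnitude {abs(fom):.1f})")
```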

**Power Amplifier**

The output of the VCO should be modulated by the input data and be amplified prior to being transmitted over the 50GHz channel. These tasks are fulfilled by the power amplifier stage.

Generally, while delivering adequate output power with a high efficiency for long battery life, PAs must be designed with adequate linearity for the specific modulation scheme that is employed. These goals are even more challenging considering the low supply voltages of modern deep sub micrometer CMOS technologies and the large dynamic ranges required. Reduced supply voltage also limits the output voltage swing of the amplifier, putting an upper bound on the gain [31].

Most stand-alone PAs have been designed as a cascade of single ended stages. There are two main reasons for this choice: the antenna is typically single ended, and single ended RF circuits are much simpler to test than their differential counterparts [16]. Nevertheless, single-ended PAs suffer from two drawbacks. First, they waste half of the transmitter voltage gain because they sense only one output of the up-converter (in our case the VCO). Second, they pull very large transient currents from the supply to the ground, causing ripple in the frequency response or instability through the supply and/or ground bond wire inductances [16]. As a result, a differential PA is used in our design to ameliorate these issues.

On the other hand, CMOS technology is known for its fairly high parasitic capacitances. When both the input and output are tuned, these parasitic capacitances, including the gate-drain capacitance which connects the output to the input of the transistor, can make the transistor unstable. One way to alleviate this problem is to employ neutralization. Neutralization is a fairly wideband technique that makes the transistor unconditionally stable over a wide frequency range. It allows the designer to easily match the input and output network for maximum gain, output power or efficiency without compromising the stability of the circuit. Neutralization requires a 180-degree shifted voltage, which is readily available thanks to the employed differential approach [62].
Our power amplifier uses a neutralized common source structure as illustrated in figure 4.11. On-off keying is achieved by modulating the gate voltage of the tail transistor of the PA\(^1\). The tail transistor is a large MOS operating in deep triode region.

Switching the PA is advantageous in terms of power saving for the PA and in terms of phase noise for the VCO\(^2\).

As shown in figure 4.11, a transformer is used to connect the output of the PA to the antenna. It also provides a path to connect the supply voltage to the PA.

Figure 4.12 illustrates the gain, PAE and output power of the PA versus the input power. The peak PAE is 24.8%. The gain and output power at the peak PAE are 11.8dB and 13.7dBm, respectively. The input 1-dB compression point of the PA is -6dBm.

As stated earlier, to deliver high voltage and high current to the antenna, a transformer is employed between the PA and the fixed load of the antenna. As the supply voltages shrink, a higher impedance transformation ratio is required. This transformation is never without power loss. Typically, the higher the impedance transformation ratio, the higher the power loss, resulting in low power efficiency. Consequently, a 1:1 transformation ratio is employed in this design.

---

\(^1\) Since the on-off switching function of the PA (realized by the tail modulating transistor) is a single-ended behavior, our differential PA still partially suffers from the issues related to single ended structures.

\(^2\) Although small, the \(V_{DS}\) of the bottom transistor lowers the output swing of the VCO, leading to reduced output power and increased phase noise. Moreover, the switch transistor operates in triode region, which further degrades the phase noise of the oscillator.

The use of a 1:1 transformer rather than lumped LC networks, results in higher power efficiency, mainly because the energy stored in the transformation network is lower compared to the lumped LC case [62].

The secondary of the transformer is connected to the transmit antenna through bond wires, which present the same characteristic impedance to the antenna (i.e. the bond wires serve as matched transmission lines).

![Gain, Pout & PAE of the PA vs. input power](image)

**Fig. 4.12:** Gain, Pout & PAE of the PA vs. input power

Since the common source transistors of the PA are chosen to be wide to carry a large current, their input capacitance is very large, presenting a substantial load to the VCO and making its design challenging. This issue is dealt with by interposing a driver stage between the VCO and the PA, at the cost of a lower compression point and higher power consumption. The schematic of the driver stage is shown in figure 4.13.

The driver need not be as wideband as the PA because it only needs to cope with the 5GHz bandwidth required to accommodate the output signal of the VCO\(^1\).

A variable resistor is used between \(V_{DD}\) and the primary inductors of the driver’s transformer to allow variable driver gain by changing its supply voltage.

\(^1\) This is a consequence of the 10% tuning range of the 50GHz VCO.
Current sources are used neither in the oscillator nor in the power amplifier, to ensure that the full supply voltage is dropped across the active devices and to further increase the output power and efficiency.

Consuming 47.5mW of power, the power amplifier delivers 14.4dBm of power to the antenna with a power added efficiency of 27.8%. The small signal gain of the PA is 20dB and its 3-dB bandwidth is 14GHz. Design parameter values of the PA are summarized in table 4.3.

**Table 4.3:** Design parameter values for the PA and its driver

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Driver</strong></td>
<td></td>
</tr>
<tr>
<td>M₁-M₂</td>
<td>20μm</td>
</tr>
<tr>
<td>M₃-M₄</td>
<td>14μm</td>
</tr>
<tr>
<td><strong>PA</strong></td>
<td></td>
</tr>
<tr>
<td>M₁-M₂</td>
<td>200μm</td>
</tr>
<tr>
<td>M₃-M₄</td>
<td>125μm</td>
</tr>
<tr>
<td>M₅</td>
<td>400μm</td>
</tr>
</tbody>
</table>
Single ended input differential output Low noise amplifier

The core of the LNA integrated in the RX chain is similar to the standalone LNA described in chapter 3. However, some modifications are necessary to make the LNA suitable for the receiver.

The first difference between the two designs arises from the input matching network. The single ended LNA is designed to be connected to a source impedance of $50\Omega$. However, the impedance of the on chip antenna for the receiver is $110\Omega$. Consequently, the input matching network of the LNA should be changed to match the characteristic impedance of the antenna.

The output matching network of the LNA integrated in the RX is also different from that mentioned in chapter three. While the output of the standalone LNA is connected to the $50\Omega$ impedance of the measurement instrument through an open drain buffer, the output of the receiver’s LNA should be connected to the envelope detector.

In order to feed the differential inputs of the envelope detector, the output of the LNA should be converted to a differential signal. Such single ended to differential conversion is done through a balun. The addition of the balun will change the impedance seen by the LNA at its output. As a consequence, the output matching network should also be modified.

The RX low noise amplifier is composed of three cascaded current re-use stages plus an output common source stage added to compensate the losses of the balun. The added output transistor lowers the loading to the last current re-use stage too. All the stages except the last one employ 50 fingers with 800nm width to realize $40\mu m$ devices for the common source stages. The size of the last stage is halved to obtain a flat gain for the LNA. In fact, since the inductive load at the drain node of the last common source stage (i.e. the impedance seen through the balun) is large, a smaller transistor is used to lower the parasitic capacitances, so that the transfer function of the last stage achieves a relatively similar response to the other stages.

The output common source stage provides 6dB of additional gain, at the cost of more power, to compensate for the almost 6dB of loss contributed by the balun. Furthermore, since it separates the balun from the inductive load of the last current re-use stage, it allows the same core circuitry to be kept. Had this stage not been added, we would have been forced to reduce the size of the last stage's transistor to cope with the higher inductance of the balun. In addition, the gain of the new design would have been 6dB lower than that of the standalone LNA because of the balun losses.

Active baluns are smaller and could also be used in this design, but their performance in terms of noise, linearity and balance is usually not satisfactory, and they further complicate the design. Hence, a passive solution was adopted as the more reliable and straightforward option. The balun is realized by two one-turn spiral inductors placed on top of each other and is made of the two topmost metal layers to obtain the maximum quality factor.

Fig. 4.14 illustrates the schematic of the RX LNA. As in the standalone LNA, a small degeneration inductor is added to the source of $M_1$ to increase the real part of the impedance seen at the gate of this input device, which eases the realization of a very wideband input matching network.

The gate biases of all the current re-use stages are adjusted to have a bias current of 0.15mA/μm for optimum noise performance.

Two separate pads are used to provide the supply and bias for the last amplifying stage. This will present the opportunity to realize a variable gain for the LNA.

![Fig. 4.14: Schematic of the LNA integrated in the RX](image)

The layout of the RX LNA is shown in figure 4.15. The LNA occupies 1260μm x 770μm including the pads and the output balun.
Performance of the single ended input, differential output LNA is summarized in table 4.4. The increased power consumption compared to the single ended version is related to the additional common source stage used in this design.

**Table 4.4:** Simulated performance of the LNA employed in the RX

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Differential Gain</td>
<td>34.1dB</td>
</tr>
<tr>
<td>Noise Figure</td>
<td>6.4-8.6dB</td>
</tr>
<tr>
<td>Bandwidth</td>
<td>37.3-66.4GHz</td>
</tr>
<tr>
<td>Power Consumption</td>
<td>24.6mW</td>
</tr>
<tr>
<td>Input reflection Coefficient</td>
<td>&lt; -10dB</td>
</tr>
<tr>
<td>Reverse Isolation</td>
<td>&lt; -80dB</td>
</tr>
</tbody>
</table>

The higher noise figure in this design can be attributed partly to the new input matching elements and partly to the losses of the GSSG pad routing\(^1\).

The LNA's S-parameters together with its noise figure are shown in figure 4.16. It achieves a flat differential gain of 34.1dB over a bandwidth of around 30GHz. The input match is better than -10dB over the whole operating range. The reverse isolation is better than -80dB. In band, the noise figure is lower than 8.6dB with a minimum of 6.4dB at the center frequency. The noise curve is also smooth. Together with the flat gain response of the LNA, this helps achieve a good and relatively constant signal to noise ratio over the whole operating bandwidth of the receiver.

\(^1\) The source impedance is also higher for the RX LNA compared to the standalone one, which increases the NF.

![Post layout simulation results of the LNA integrated in the RX chain](image)

**Fig. 4.16:** Post layout simulation results of the LNA integrated in the RX chain

### Baseband Circuits

The block diagram of the baseband section of our transceiver system is demonstrated in figure 4.17. It is composed of an envelope detector, a chain of limiting amplifiers together with a two-stage output buffer and a feedback amplifier which serves to cancel the offset of the amplifying chain. A differential structure is employed for the baseband to achieve higher power supply rejection ratio (PSRR) and higher common mode rejection ratio (CMRR). A dummy envelope detector provides the second input for the first differential baseband amplifier.

![Block diagram of the baseband section](image)

**Fig. 4.17:** Block diagram of the baseband section

Looking at the formula for the power of a periodic signal, given by Eq. (4-3), two functions are required to implement power detection at the receiver: squaring and integration. The latter is implemented through a low pass RC filter connected to the output of the envelope detector, while the intrinsic square law I-V characteristic of MOS transistors can be employed to implement the former.

\[ P = \frac{1}{T} \int_{t_0}^{t_0+T} x(t)^2 \, dt \]  \hspace{1cm} (4-3)

A simplified common source MOS envelope detector is depicted in Fig. 4.18. The output swing \( A_{out} \) for this circuit is given by [1]¹:

\[ A_{out} = \frac{1}{4} \mu_n C_{ox} (W / L)_{1,2} R A_{in}^2 \]  \hspace{1cm} (4-4)

The output is proportional to the square of the input which increases its magnitude significantly. Furthermore, the undesired 50GHz ripple is filtered out by the RC network at the output of the envelope detector while the recovered data passes through it.
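The squaring and low-pass filtering steps described above can be illustrated numerically. The following is a minimal behavioural sketch, not a model of the actual circuit: it squares an ideal OOK-modulated 50GHz carrier and passes it through a discrete-time first-order RC low-pass filter. The bit rate, the 5GHz filter corner, the sampling density and the function name `square_law_ook_demod` are all assumptions chosen for illustration.

```python
import math

def square_law_ook_demod(bits, f_carrier=50e9, bit_rate=5e9,
                         samples_per_cycle=16, f_cutoff=5e9, amplitude=1.0):
    """Square an OOK carrier, low-pass it, and slice at the end of each bit."""
    fs = samples_per_cycle * f_carrier          # simulation sampling rate
    dt = 1.0 / fs
    n_per_bit = int(round(fs / bit_rate))       # samples in one bit period
    rc = 1.0 / (2.0 * math.pi * f_cutoff)       # RC time constant (assumed corner)
    alpha = dt / (rc + dt)                      # first-order IIR coefficient
    y = 0.0                                     # low-pass filter state
    decided, levels = [], []
    for k, bit in enumerate(bits):
        for n in range(n_per_bit):
            t = (k * n_per_bit + n) * dt
            x = amplitude * bit * math.cos(2.0 * math.pi * f_carrier * t)
            y += alpha * (x * x - y)            # squaring followed by RC filter
        levels.append(y)
        # The filtered "on" level settles near A^2/2, so slice at A^2/4
        decided.append(1 if y > amplitude ** 2 / 4.0 else 0)
    return decided, levels

tx_bits = [1, 0, 1, 1, 0, 0, 1]
rx_bits, rx_levels = square_law_ook_demod(tx_bits)
print(rx_bits)
```

In this sketch the component at twice the carrier frequency is attenuated by the filter, while the baseband level follows the transmitted bits.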

![Fig. 4.18: Simplified schematic of a common source envelope detector](image)

A common source structure provides wide operating bandwidth but low gain. To increase the gain of the ED, a cascode architecture can be used at the cost of reduced bandwidth.

¹ This equation is valid only if both the input devices remain in saturation all the time.
According to figure 4.19 two possibilities exist to implement the cascode envelope detector. Simulation results show that for a low power design the architecture of figure 4.19 (a) achieves higher gain for the same load and power consumption compared to the architecture in figure 4.19 (b).

We have also adjusted the load resistor and power consumption of the two architectures so that both circuits achieve the same gain. It was seen that, for the same gain, the architecture of Fig. 4.19 (a) achieves wider bandwidth while consuming lower power. The performance comparison of the two structures is summarized in Table 4.5. Based on the simulation results, the circuit shown in figure 4.19 (a) is selected to implement the ED.

The power consumption of the envelope detector is very small (around 1% of the overall receiver). Consequently, regardless of power, different sizes and bias conditions were tested for the core transistors to optimize the design for gain and bandwidth. Based on the simulation results, 40μm common source devices were selected for the input stage, while the size of the cascode device was selected to be 16μm. This is the optimum size for the cascode device to present low parasitic capacitance to the output node, achieving a wide bandwidth, and a relatively large resistance in parallel with the output load, preventing a drop in gain.

Table 4.5: Performance comparison of the two architectures shown in Fig. 4.19.

<table>
<thead>
<tr>
<th>Selected Architecture</th>
<th>Gain (dB)</th>
<th>Bandwidth (GHz)</th>
<th>Power Consumption (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>4.19 (a)</td>
<td>12.0</td>
<td>4.80</td>
<td>0.6</td>
</tr>
<tr>
<td>4.19 (b)</td>
<td>12.0</td>
<td>3.95</td>
<td>1.0</td>
</tr>
</tbody>
</table>

A single ended ED provides superior performance in terms of gain and bandwidth. It also has the advantage of being directly connected to the LNA, obviating the need for a single ended to differential (balun) circuit. However, simulations show a poor PSRR for the single ended envelope detector, raising serious stability issues for the whole baseband chain. As a consequence, a differential solution is chosen to avoid any stability issue.

The complete structure of the envelope detector used in this design is shown in figure 4.20. It provides tunable differential gain in a bandwidth of more than 15GHz. A cascode current mirror is employed to set the bias voltages of both the envelope detector and the dummy branch.

The core transistors of the envelope detector operate in the sub-threshold region for higher second harmonic power. Squaring of the input signal relies on the second order coefficient in the expansion of the I-V characteristic of MOS devices in sub-threshold. The differential currents are then summed in the load of the push-push detector. Consequently, while the baseband signal combines additively (because it is squared), the 50GHz signal is cancelled, obviating the need for a large carrier suppression filter. Nevertheless, in our work, the limiting amplifier chain, because of its limited bandwidth, also serves as a suppression filter that helps remove any remaining unwanted higher frequency signal.

When a 1 is received the current through the active PMOS load ($M_4$) is increased. Consequently, the output voltage level drops. This way, the level difference at the output of the envelope detector, distinguishes between ones and zeros and the received data is demodulated.

Analytically, the principle of operation of the ED is as follows. Supposing that the transistors are biased in the sub-threshold region, the drain currents of the real and the dummy branches can be expressed by the following equations, respectively:

\[
I_{D1} = I_0 \exp\left( \frac{V_g + V_{in} - V_{th}}{\zeta V_T} \right) \tag{4-5}
\]

\[
I_{D2} = I_0 \exp\left( \frac{V_g - V_{in} - V_{th}}{\zeta V_T} \right) \tag{4-6}
\]

\[
I_{D5} = I_{D6} = I_0 \exp\left( \frac{V_g - V_{th}}{\zeta V_T} \right) \tag{4-7}
\]

where $V_g$, $V_{th}$ and $V_{in}$ represent the gate voltage, threshold voltage and the input voltage, respectively. The dimensions of these four transistors are the same. Using the $e^x$ expansion, the differential output voltage can be approximated by equation (4-8).

\[
V_{out} = R_L (I_{D1} + I_{D2}) - R_L (I_{D5} + I_{D6}) \approx 2R_L I_0 \frac{V_{in}^2}{\zeta V_T} \tag{4-8}
\]

where $R_L$ is the load resistance of the detector, determined by the PMOS active load. Substituting $V_{in} = A(t) \cos\omega_0 t$ in (4-8), the differential output voltage of the envelope detector will be given by:

\[ V_{out} = \frac{2R_L I_0}{\zeta V_T} A(t)^2 \cos^2 \omega_0 t \tag{4-9} \]

Which can be re-written, using $\cos^2 \omega_0 t = (1 + \cos 2\omega_0 t)/2$, as:

\[ V_{out} = \frac{R_L I_0}{\zeta V_T} A(t)^2 (\cos 2\omega_0 t + 1) \tag{4-10} \]

The high frequency component ($\cos 2\omega_0 t$) is then suppressed by the low-pass filter at the output of the detector, leaving only the low-frequency term proportional to $A(t)^2$.
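The square-law behaviour obtained from the exponential expansion can be checked numerically. The sketch below compares the exact current difference between the real and dummy branches, computed from the exponential sub-threshold model of (4-5) to (4-7), with its second-order approximation. The bias current is normalised to 1, and the slope factor $\zeta = 1.5$ and thermal voltage $V_T = 25.9$mV are assumed values, not taken from the design. The prefactor is written here in terms of the bias current of one branch, so it may differ in form from the compact constant used in (4-8); the point of the sketch is only the proportionality to $V_{in}^2$.

```python
import math

def detector_current_difference(v_in, i_bias=1.0, zeta=1.5, v_t=0.0259):
    """Return (exact, second_order) values of (I_D1 + I_D2) - (I_D5 + I_D6).

    i_bias stands for I_0 * exp((V_g - V_th) / (zeta * V_T)), i.e. the
    quiescent current of one branch (normalised to 1 by default).
    """
    x = v_in / (zeta * v_t)
    exact = i_bias * (math.exp(x) + math.exp(-x) - 2.0)
    second_order = i_bias * x * x        # leading term of the e^x expansion
    return exact, second_order

for v_in in (0.002, 0.005, 0.010):       # assumed small input amplitudes in volts
    exact, approx = detector_current_difference(v_in)
    print(f"V_in = {v_in * 1e3:.0f} mV: exact = {exact:.6f}, approx = {approx:.6f}")
```

For small inputs the two values agree closely, and doubling the input amplitude quadruples the detected current, as expected from a square-law detector.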

PMOS active loads (M4-M8) are employed to trade the bandwidth of the envelope detector for the gain. By changing the gate bias of the PMOS load the gain of envelope detector varies from -1 to 10dB and its bandwidth changes from 19.0 to 5.0GHz. Consequently, as can be seen from Fig. 4.21, thanks to the PMOS active load the ED simultaneously serves as a normal envelope detector and an equalizer.

**Fig. 4.21:** Variable gain of the ED obtained by changing the bias voltage of the PMOS active load for a 100mV input signal

---

\(^1\) Since on-off keying is employed as the modulation scheme, the demodulator only needs to distinguish between ones and zeros. Consequently, for the particular case of OOK modulation, \( A(t) \) and \( A^2(t) \) can be interpreted equivalently.
The drawback of the proposed envelope detector is the dependence of its gain on the input amplitude.

The output of the ED is fed to a chain of baseband amplifiers to be further boosted. To avoid inter-stage capacitors and additional biasing components the envelope detector is dc-coupled to the baseband amplifiers. This prevents high pass filtering through the decoupling capacitors and bias resistors. Such biasing strategy makes the demodulator useful also for signals with significant low-frequency energy and at low data rates, and minimizes the circuit losses. The disadvantage of this direct coupling is higher sensitivity to device variations.

The limiting amplifiers (LAs) are power efficient and easier to design due to their lower operating frequency compared to the LNA.

The amplification chain is composed of a cascade of five differential common source amplifiers with cross coupled neutralization and resistive loads. Since the envelope detector has a single ended output, a dummy envelope detector is used to bias the second input of the first baseband amplifier.

The employed architecture allows for wide input common-mode range given by (4-11): 

\[ V_{sat,M1} + V_{gs1,2} \leq V_{CM, in} \leq V_{dd} - R \frac{I_{ss}}{2} + V_{th} \]  \hspace{1cm} (4-11)

Furthermore through this architecture we directly set the common mode for the next stages as:

\[ V_{out,CM} = V_{dd} - R \frac{I_{ss}}{2} \]  \hspace{1cm} (4-12)
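To make (4-11) and (4-12) concrete, the following sketch evaluates the input common-mode window and the output common-mode level for one stage. All numeric values (supply, load resistor, tail current, threshold and overdrive voltages) are hypothetical and chosen only for illustration; they are not the design values of the thesis. The final check verifies that the output common mode of one stage falls inside the input window of an identical following stage, which is what allows the stages to be dc-coupled.

```python
def common_mode_levels(v_dd, r_load, i_ss, v_th, v_sat_tail, v_gs_pair):
    """Evaluate (4-11) and (4-12) for a resistively loaded differential pair."""
    v_cm_min = v_sat_tail + v_gs_pair            # lower bound of (4-11)
    v_out_cm = v_dd - r_load * i_ss / 2.0        # (4-12)
    v_cm_max = v_out_cm + v_th                   # upper bound of (4-11)
    return v_cm_min, v_cm_max, v_out_cm

# Hypothetical example values (assumptions, not taken from the design)
v_min, v_max, v_out = common_mode_levels(
    v_dd=1.0, r_load=200.0, i_ss=2e-3, v_th=0.35,
    v_sat_tail=0.10, v_gs_pair=0.45)

print(f"Input CM window: {v_min:.2f} V to {v_max:.2f} V")
print(f"Output CM level: {v_out:.2f} V")
print("Can drive an identical stage:", v_min <= v_out <= v_max)
```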

Although the limiting amplifiers operate at lower frequencies, the high data rate (10Gb/s) specified by the system requirements still dictates a wide bandwidth; consequently careful design is required to ensure sufficient gain in such a large operating bandwidth.

Cross coupled capacitive neutralization is employed in the cascaded baseband differential amplifiers to realize the required gain, and spiral inductors are avoided to save area. By
consuming 10.7mW of power, the baseband amplifying chain achieves a gain of more than 40dB in a bandwidth larger than 8.0GHz. Figure 4.22 illustrates a single baseband amplifier stage.
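A rough feeling for what the 40dB, 8GHz target demands from each of the five stages can be obtained from the textbook bandwidth-shrinkage relation for a cascade of $N$ identical first-order stages, $BW_{total} = BW_{stage}\sqrt{2^{1/N} - 1}$. The sketch below applies it under that simplifying assumption; the real stages, with neutralization and inter-stage loading, are not first-order and non-interacting, so the result is only an order-of-magnitude guide. The function name is ours.

```python
import math

def per_stage_requirements(total_gain_db, total_bw_hz, n_stages):
    """Per-stage gain and -3 dB bandwidth for N identical first-order stages."""
    gain_per_stage_db = total_gain_db / n_stages
    shrink = math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)   # cascade shrinkage factor
    bw_per_stage_hz = total_bw_hz / shrink
    return gain_per_stage_db, bw_per_stage_hz

gain_db, bw_hz = per_stage_requirements(40.0, 8.0e9, 5)
print(f"Per stage: {gain_db:.1f} dB gain, {bw_hz / 1e9:.1f} GHz bandwidth")
```

Under these assumptions each stage has to provide a bandwidth well above the overall 8GHz, which illustrates why careful design is needed even though the baseband operates far below the carrier frequency.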

![Schematic of a single baseband amplifying stage](image)

**Fig. 4.22**: Schematic of a single baseband amplifying stage

The signal at the output of the baseband amplifiers has a large swing. Since this large swing signal should usually feed the 50Ω load, high current buffers are required. High current capability results in large width and consequently large input capacitance for the output buffers. This large capacitance loads the baseband amplifiers and creates a bandwidth bottleneck. To solve this problem, two $f_t$ doublers are employed in this design as the buffer stages to decrease the loading capacitance for the baseband amplifier. Such an architecture provides the same transconductance but half the $C_{gs}$ for the input device. The architectures of the buffer amplifier stages are shown in Fig. 4.23.

Unbalanced input from the preceding stages, mismatches in the threshold voltages or bias of the transistors, or mismatches caused by the layout of the circuit lead to an equivalent offset voltage at the input of the amplifying chain. The offset voltage causes pulse distortion, reduces the sensitivity of the receiver and, in the case of a large gain for the receive chain, may saturate the receiver. Accordingly, offset cancellation is required. For this purpose, an amplifier stage in a feedback loop is employed to guarantee the proper operation of the receiver and prevent the output bit stream from becoming saturated. It is a differential common source stage, the output of which is fed back to the input of the second amplifying stage. It should be noted that the feedback amplifier should have a narrow bandwidth to feed back just the DC offset but not the output bit stream; hence, a low pass RC filter is employed at the input of the feedback amplifier to filter out the higher frequency components at the output of the baseband amplifier. The schematic of the feedback amplifier is shown in figure 4.24.

![Schematic of the feedback amplifier](image)

**Fig. 4.23:** Schematic of the first (a) and second (b) buffer amplifier stages

The simplified schematic shown in Fig. 4.25 can be used to investigate the effect of offset, before and after the insertion of the feedback amplifier.

![Simplified schematic](image)

**Fig. 4.24:** Schematic of the feedback amplifier

**Fig. 4.25:** Block diagram of the baseband section to investigate the role of the feedback amplifier on offset cancellation.

Following the notation in figure 4.25, the output offset in the absence of a feedback loop is given by:

\[ V_{os,\text{out}} = G_m R A V_{os,\text{in}} \]  \hspace{1cm} (4-13)

However, when the loop is closed by the feedback amplifier, the output offset reduces to:

\[ V_{os,\text{out}} = \frac{G_m R A V_{os,\text{in}}}{1 + G_{mF} R A} + \frac{G_{mF} R A V_{os,F}}{1 + G_{mF} R A} \approx \frac{G_m}{G_{mF}} V_{os,\text{in}} + V_{os,F} \]  \hspace{1cm} (4-14)

\( R_F \) and \( C_F \) in figure 4.25 determine a pole at the cut-off frequency given by Eq. (4-15). As stated earlier they should be chosen large enough to extract just the DC components of the output signal and to prevent the wanted signal from being fed back to the input.

\[ f_c = \frac{A G_{mF} / 2 + 1}{2\pi R_F C_F} \]  \hspace{1cm} (4-15)
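The effect of closing the offset cancellation loop can be illustrated with a short numerical sketch. It assumes that the forward path has a DC gain of $G_m R A$ and that the loop gain is $G_{mF} R A$, so that the input offset is divided by $1 + G_{mF} R A$ while the offset of the feedback amplifier itself appears almost unattenuated at the output. All component values below are hypothetical and only serve to show the orders of magnitude.

```python
def output_offset(g_m, g_mf, r, a, v_os_in, v_os_f=0.0, closed_loop=True):
    """Output-referred offset of the chain, with or without the feedback loop."""
    forward_gain = g_m * r * a
    if not closed_loop:
        return forward_gain * v_os_in                # open loop: plain gain
    loop_gain = g_mf * r * a
    return (forward_gain * v_os_in + loop_gain * v_os_f) / (1.0 + loop_gain)

# Hypothetical values: G_m = 10 mS, G_mF = 2 mS, R = 200 ohm, A = 50, 5 mV offset
open_loop = output_offset(10e-3, 2e-3, 200.0, 50.0, 5e-3, closed_loop=False)
closed = output_offset(10e-3, 2e-3, 200.0, 50.0, 5e-3)
approx = (10e-3 / 2e-3) * 5e-3                       # (G_m / G_mF) * V_os,in

print(f"Open loop:   {open_loop * 1e3:.1f} mV")
print(f"Closed loop: {closed * 1e3:.1f} mV (approximation {approx * 1e3:.1f} mV)")
```

With these assumed numbers the input-referred 5mV offset would drive the open-loop output by hundreds of millivolts, while the closed loop keeps it to a few tens of millivolts.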
It should be noted that at the cost of higher complexity in the layout, it is feasible to employ DC offset cancellation for every single stage along the baseband amplifier chain to achieve larger DC offset suppression.

The baseband circuitry is realized in 28nm CMOS technology node and occupies an area of 0.01mm$^2$ including the output buffers. Layout of baseband section is shown in figure 4.26.

All the proposed baseband structures are subject to supply coupling. Therefore, careful attention was paid in the layout to ensure proper bypass around the local power lines. Furthermore, differential configuration is adopted throughout the whole baseband chain to provide common mode noise rejection. A differential structure makes the overall design less sensitive to the bonding inductances of the supply pads. It will also provide better PSRR. These advantages come at the cost of larger power consumption.

![Fig. 4.26: Layout of the baseband section](image)

Performances of the blocks in the baseband chain are summarized in table 4.6.

**Table 4.6: Simulated performance of the blocks in the baseband section**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>ED</th>
<th>LA</th>
</tr>
</thead>
<tbody>
<tr>
<td>Power Consumption</td>
<td>1.6mW</td>
<td>25mW</td>
</tr>
<tr>
<td>Gain</td>
<td>-6.0dB</td>
<td>40dB</td>
</tr>
<tr>
<td>Bandwidth</td>
<td>15.5GHz</td>
<td>8.0GHz</td>
</tr>
</tbody>
</table>
Chapter Summary

This chapter was dedicated to the building blocks of our 50GHz short range data link. The transmitter section of the link is composed of an mm-wave VCO that generates the 50GHz carrier and a wideband power amplifier that simultaneously modulates the carrier with the input data and amplifies it for transmission over the communication channel. The power amplifier employs capacitive cross-coupled neutralization to realize a bandwidth of more than 20GHz with reasonable output power.

Design and implementation of receiver building blocks were also discussed in this chapter. A non-coherent on-off keying demodulation scheme is adopted for the receiver. As a consequence, there is no requirement for a down conversion mixer and/or a local oscillator which makes the design much simpler, more compact and more power efficient.

A low noise amplifier, an envelope detector and a chain of baseband amplifiers comprise the receiver. The LNA adopts current re-use architecture and employs third order inter-stage matching network to achieve the required design specifications. A balun is inserted at the output of the LNA to convert its single ended output to the required differential input for the envelope detector.

The envelope detector employs a cascode configuration with PMOS active load to obtain a tunable gain. The output of the ED is directly connected to the limiting amplifier chain. Five differential capacitive cross-coupled stages comprise the baseband amplifiers which provide rail to rail amplification for the output signal of the ED.

The output of the chain of baseband amplifiers is fed to the bit interpreters through two \( f_T \) doubler buffer stages. Furthermore, a feedback loop guarantees the proper operation of the system against DC offset and prevents the output bit stream from becoming saturated.
Chapter 5

Measurement Results

Introduction

In the previous chapters, the design of the building blocks of an mm-wave high data rate transceiver was discussed.

We started the first chapter by investigating possible applications and the challenges likely to be encountered at mm-wave frequencies.

Next, in chapter two, after a short literature survey on mm-wave short range links, the proposed transceiver was presented together with its system level calculations.

Subsequently, in chapter three, the design of a low noise amplifier for achieving a high gain-bandwidth product was discussed. The same core, with some minor modifications, is used in the architecture of our mm-wave receiver to provide the low noise amplification.

Chapter four was dedicated to the design of the building blocks of the proposed transceiver. Circuit level analyses, layout and simulated performance of all the blocks were presented.

The proposed wireless intra-connect is realized in a 28nm bulk CMOS technology node. In this chapter, the measurement setup for investigating the performance of the link and its constituent building blocks is described, and the measured results are presented and compared with the state of the art.

DC Measurements

The first thing to do in measuring the performance of an integrated circuit is determining the process corner at which it is operating. Process corners represent the extremes of parameter variations within which a circuit etched onto the wafer must function correctly.
For this purpose, the I-V characteristic of a 40μm diode connected MOS transistor was simulated in different process corners and the results were compared with the measured values. As illustrated in Fig. 5.1, the simulated curve in the SS corner matches the measured results, which reveals that the chip is operating in the SS corner. Knowing that, all the blocks were re-simulated in the SS corner and at room temperature to anticipate their measured performance. In the following sections, the measurement set-up and results are presented.

![Simulated vs. measured I-V characteristic of a diode connected MOS in SS corner](image)

**Fig. 5.1:** Simulated vs. measured I-V characteristic of a diode connected MOS in SS corner

### Low Noise Amplifier Measurement

The chip micrograph of the fabricated low noise amplifier is illustrated in figure 5.2. The chip size is 0.84mm² including the pads.

![Chip micrograph of the low noise amplifier](image)

**Fig. 5.2:** Chip micrograph of the low noise amplifier

The setup used for investigating the S-parameters of the LNA is shown in figure 5.3. Since the pad capacitances were absorbed into the input/output matching networks of the LNA, no de-embedding is required for the pads during the measurements.

Two different chips have been measured to properly characterize the LNA. Figure 5.4 shows the measured S-parameters of these two chips. The two sets of results are in very close agreement, which confirms the robustness of the design and the validity of the measurement. The LNA consumes 25.3mW of power. The buffer, which is used for measurement purposes only, consumes 19mW.

Different techniques exist to characterize the noise performance of the LNA [63]. The most straightforward method is to use a noise figure meter, as shown in figure 5.5. Since the input noise and signal-to-noise ratio of the noise source are known to the noise analyzer, the noise figure of the DUT can be calculated internally.

The procedure by which the noise figure is internally calculated in the instrument is known as the Y-factor method. To use the Y-factor method, an Excess Noise Ratio (ENR) source is needed. By turning the noise source on and off (by switching its DC bias voltage), one can measure the change in the output noise power density with a spectrum analyzer. The NF is then calculated according to (5-1):

\[
Noise\ Figure\ (NF) = 10 \log_{10} \left( \frac{10^{ENR/10}}{10^{Y/10} - 1} \right) \quad (5-1)
\]

In (5-1), ENR is the excess noise ratio of the source in dB, which can be found from the table printed on the ENR head. Y is the difference, in dB, between the output noise power densities measured with the noise source on and off.

An ENR noise head provides a noise source at two noise temperatures: a hot temperature \( T = T_H \) (when a DC voltage is applied) and a cold temperature \( T = 290\,\text{K} \). The excess noise is generated by biasing a noisy diode. With \( T_n \) denoting the noise temperature of the DUT, the ratio of the output noise power in the hot state to that in the cold state is given by (5-2):

\[
Y = \frac{T_H/290 + T_n/290}{1 + T_n/290} \quad (5-2)
\]

This is the Y factor, from which the method gets its name. It is determined by turning the DC supply voltage of the noise source on and off and measuring the resulting difference in the output noise density on a spectrum analyzer.
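As an illustration of the Y-factor calculation, the sketch below evaluates the noise figure from ENR and Y (both in dB) using the standard relation F = ENR/(Y - 1) in linear units, which follows from (5-2) together with the usual definition ENR = (T_H - 290)/290. That ENR definition, the function names and the numerical values are assumptions introduced for the example; they are not taken from the measurements in this work. The result is cross-checked against the direct definition F = 1 + T_n/290.

```python
import math

T0 = 290.0  # reference (cold) temperature in kelvin

def y_factor_db(t_hot, t_n):
    """Y factor in dB from (5-2): hot-to-cold ratio of the output noise power."""
    y_lin = (t_hot / T0 + t_n / T0) / (1.0 + t_n / T0)
    return 10.0 * math.log10(y_lin)

def enr_db(t_hot):
    """Excess noise ratio in dB, assuming the usual definition (T_H - 290) / 290."""
    return 10.0 * math.log10((t_hot - T0) / T0)

def noise_figure_db(enr, y):
    """Noise figure in dB from ENR (dB) and Y (dB): F = ENR_lin / (Y_lin - 1)."""
    enr_lin = 10.0 ** (enr / 10.0)
    y_lin = 10.0 ** (y / 10.0)
    return 10.0 * math.log10(enr_lin / (y_lin - 1.0))

# Hypothetical values for illustration only (not measured in this work).
t_hot = 9460.0  # hot temperature of an assumed noise source with about 15 dB ENR, in K
t_n = 600.0     # assumed noise temperature of the DUT, in K

nf = noise_figure_db(enr_db(t_hot), y_factor_db(t_hot, t_n))
nf_direct = 10.0 * math.log10(1.0 + t_n / T0)  # definition of noise figure
print(f"NF from Y factor: {nf:.3f} dB, NF from T_n: {nf_direct:.3f} dB")
```

The two printed values coincide, which shows that the Y-factor expression recovers the noise figure of the DUT regardless of the particular hot temperature of the source.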

To measure the NF of the LNA we have used the noise figure measurement utility of an Agilent N9030A PXA in a down-conversion architecture, using a Noisecom NC5215 50-75GHz noise source, a Millitech AMP-15-02100 amplifier, a Millitech MXP-15-RF0FN balanced mixer, an Agilent E8257D PSG signal generator as the LO and a Hittite HMC-C004 amplifier. Since the noise source that we have used only generates noise above 50GHz, the noise figure of the LNA can only be measured from 50GHz up to the end of the operating band. On the other hand, the frequency range of the Agilent N9030A signal analyzer available for the measurements spans from 3Hz to 50GHz. Hence, a balanced mixer is used to down-convert the spectrum to frequencies within the operating range of the signal analyzer.

The measurement setup used in our experiment is shown in figure 5.6. An RF amplifier before the mixer and an IF amplifier after the mixer further amplify the DUT’s output so that it lies above the noise floor of the signal analyzer, which is necessary for a reliable and accurate noise measurement. After de-embedding the losses of the connection cables and the noise of the mixer and the amplifiers, the signal analyzer reports the NF of the LNA.

The measured noise figure of the LNA on two different chips is shown in figure 5.7. The good agreement between the two results confirms the validity of our measurements.

---

1. A Noisecom NC5222 33-50GHz noise source has been ordered but has not been delivered yet.

The simulated performance of the LNA is compared with the measurement results in Fig. 5.8. As can be seen, the simulated NF and input reflection coefficient (S11) of the LNA match the measurements well. Regarding the forward gain of the LNA, there is a slight difference between simulated and measured values at the lower end of the frequency band. This is because the model used for the transmission lines was generated from a fit to the measured parameters of a prototype previously fabricated in the TT corner. The characteristics of transmission lines in the SS corner differ from those in the TT corner. However, since no prototype fabricated in the SS corner is available, we are not able to reflect this change in the line model. Furthermore, the structure relies on a large number of resonances within each stage and/or stagger tuning between the stages to achieve the large operating bandwidth. Due to PVT variation, some shift in the frequency response of the stages is likely, which contributes to this difference.
The performance of the fabricated LNA is compared with the state of the art in table 5.1. As can be seen, although the LNA is operating in the SS corner, it achieves the largest GBW product together with one of the lowest NFs. The NF is relatively flat across the whole band which, together with the flat gain of the LNA, leads to a constant SNR for the whole receiver. Furthermore, the power consumption is in line with other works.

Table 5.1: Performance comparison of the fabricated LNA versus state of the art

<table>
<thead>
<tr>
<th>Ref</th>
<th>Tech. (CMOS)</th>
<th>S21 (dB)</th>
<th>BW (GHz)</th>
<th>NF (dB)</th>
<th>OP1-dB (dBm)</th>
<th>Pdc (mW)</th>
<th>Area</th>
<th>GBW (dB.GHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>[64]</td>
<td>130nm</td>
<td>14.7</td>
<td>7.0</td>
<td>5.7-6.8</td>
<td>-</td>
<td>64.8</td>
<td>350um x 300um</td>
<td>102.9</td>
</tr>
<tr>
<td>[65]</td>
<td>90nm</td>
<td>15.0</td>
<td>6.0</td>
<td>4.4-5.0</td>
<td>-3.0</td>
<td>4.0</td>
<td>440um x 320um</td>
<td>90.0</td>
</tr>
<tr>
<td>[66]</td>
<td>90nm LP</td>
<td>17.0</td>
<td>17.0</td>
<td>4.4-8.0</td>
<td>-1</td>
<td>19.2</td>
<td>1000um x 590um</td>
<td>289.0</td>
</tr>
<tr>
<td>[47]</td>
<td>65nm</td>
<td>24.0</td>
<td>17.0</td>
<td>4.0-7.6</td>
<td>+2.1</td>
<td>30.0</td>
<td>-</td>
<td>408.0</td>
</tr>
<tr>
<td>[44]</td>
<td>65nm</td>
<td>28.0</td>
<td>13.0</td>
<td>5.2-7.3</td>
<td>-</td>
<td>18.0</td>
<td>-</td>
<td>364.0</td>
</tr>
<tr>
<td>[67]</td>
<td>65nm</td>
<td>17.0</td>
<td>14.0</td>
<td>6.5-8.1</td>
<td>+6.0</td>
<td>5.0</td>
<td>170um x 320um</td>
<td>238.0</td>
</tr>
<tr>
<td>[43]</td>
<td>65nm</td>
<td>26.0</td>
<td>9.0</td>
<td>4.0-5.5</td>
<td>-3.5</td>
<td>8.0</td>
<td>350um x 140um</td>
<td>234.0</td>
</tr>
<tr>
<td>[68]</td>
<td>65nm</td>
<td>17.5</td>
<td>7.0</td>
<td>5.3-6.5</td>
<td>-</td>
<td>18.0</td>
<td>703um x 727um</td>
<td>122.5</td>
</tr>
<tr>
<td>[4]</td>
<td>40nm</td>
<td>18.0</td>
<td>11.0</td>
<td>7.0-8.0</td>
<td>-</td>
<td>14.3</td>
<td>200um x 240um</td>
<td>198.0</td>
</tr>
<tr>
<td>This Work</td>
<td>28nm</td>
<td>22.3</td>
<td>30.0</td>
<td>4.1-6.2</td>
<td>-</td>
<td>25.3</td>
<td>1150um x 730um</td>
<td>669.0</td>
</tr>
</tbody>
</table>

*: Estimated from the Figure  **: Excluding the pads
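The GBW column of Table 5.1 is the product of the gain in dB and the bandwidth in GHz. The short sketch below reproduces that figure for a few rows of the table and, for comparison, also evaluates the gain-bandwidth product with the gain converted to a linear voltage ratio. The function names are illustrative, and the use of a 20 log voltage-gain conversion for the linear figure is an assumption made for the example.

```python
def gbw_db_ghz(gain_db, bw_ghz):
    """GBW as tabulated in Table 5.1: gain in dB multiplied by bandwidth in GHz."""
    return gain_db * bw_ghz

def gbw_linear_ghz(gain_db, bw_ghz):
    """GBW with the gain converted to a linear voltage ratio (assumed 20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0) * bw_ghz

# (S21 in dB, BW in GHz) copied from Table 5.1
rows = {
    "[47]": (24.0, 17.0),
    "[44]": (28.0, 13.0),
    "This Work": (22.3, 30.0),
}

for ref, (gain_db, bw_ghz) in rows.items():
    print(f"{ref}: {gbw_db_ghz(gain_db, bw_ghz):.1f} dB.GHz, "
          f"{gbw_linear_ghz(gain_db, bw_ghz):.0f} GHz (linear)")
```

For the amplifier of this work the linear figure evaluates to about 391GHz, the value quoted in the abstract, while the dB-based figure gives the 669 listed in the table.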

The improvement in GBW obtained in this work can be better appreciated by looking at figure 5.9. The fact that this result is obtained even though performance degrades in the SS corner further demonstrates the potential of the proposed method for increasing the GBW of amplifiers.

**Fig. 5.9:** Comparison of the GBW of the fabricated amplifier vs. state of the art

### Transceiver Measurements

The proposed mm-wave front-end is a complete On-Off Keying (OOK) transceiver for wireless communication at multi-Gbps speed over a distance of several centimeters with a 50GHz carrier.

Figures 5.10 and 5.11 show the chip micrographs of the transmitter and receiver, respectively. The TX is formed by a 50GHz continuous-wave oscillator and a power amplifier directly switched on and off by the input bitstream. The RX performs direct power detection, employing an LNA, an envelope detector and a 5-stage limiting amplifier.

The prototypes have been assembled on custom-designed boards featuring the monopole isotropic patch antenna, which is connected to the chips through bond wires as illustrated in figure 5.12.

Figure 5.13 shows the structure of the antenna, which is realized on Rogers RT/Duroid 5880. This structure has been chosen for its simplicity, ease of fabrication, and intrinsic broadband operation.

The setup used for the measurements is shown in Fig. 5.14. First, the input pattern is generated by an Anritsu MP1763B pulse generator, which works from 50MHz up to 12.5GHz. The input data is fed to the modulator. After amplification of the modulated data, the signal is transmitted to the receiver via the TX antenna. The transmitted signal is received by the RX antenna, amplified by the LNA and then demodulated by the envelope detector. It is then further amplified by the baseband amplifiers and sent to the BER tester to investigate the link performance.

Power consumptions of different building blocks in the transceiver system are summarized in table 5.2.

**Table 5.2: Power consumption of different blocks of transceiver**

<table>
<thead>
<tr>
<th>Transmitter Section</th>
<th>Power consumption</th>
<th>Receiver Section</th>
<th>Power Consumption</th>
</tr>
</thead>
<tbody>
<tr>
<td>VCO</td>
<td>9.2mW</td>
<td>LNA</td>
<td>27.5mW</td>
</tr>
<tr>
<td>PA</td>
<td>45.9mW</td>
<td>Envelope detector</td>
<td>1.1mW</td>
</tr>
<tr>
<td>-</td>
<td>-</td>
<td>Limiting Amplifier</td>
<td>24mW</td>
</tr>
<tr>
<td>-</td>
<td>-</td>
<td>Output Buffers</td>
<td>27.5mW</td>
</tr>
<tr>
<td>Total</td>
<td>55.1mW</td>
<td>Total</td>
<td>80.1mW</td>
</tr>
</tbody>
</table>
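The totals in Table 5.2 can be cross-checked with a few lines of arithmetic. The sketch below simply sums the per-block values copied from the table; the variable names are illustrative.

```python
# Per-block power consumption in mW, copied from Table 5.2
tx_mw = {"VCO": 9.2, "PA": 45.9}
rx_mw = {
    "LNA": 27.5,
    "Envelope detector": 1.1,
    "Limiting amplifier": 24.0,
    "Output buffers": 27.5,
}

tx_total = sum(tx_mw.values())
rx_total = sum(rx_mw.values())
trx_total = tx_total + rx_total

print(f"TX: {tx_total:.1f} mW, RX: {rx_total:.1f} mW, TRX: {trx_total:.1f} mW")
```

The sums agree with the 55.1mW and 80.1mW totals of the table, and their combination gives the 135.2mW quoted for the complete transceiver in the comparison with the state of the art.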
**Fig. 5.14:** Measurement set-up for the mm-wave OOK transceiver (a); close-up of the transceiver front-end (b)

Figure 5.15 shows some of the eye diagrams at the output of the receiver for different data rates and communication distances. As can be seen, the eyes are wide open and symmetric.

The performance of the transceiver system was investigated under different data rates, bit patterns and distances. Figure 5.16 illustrates the BER performance of the designed short range link versus the distance. Error-free operation (BER<$10^{-12}$) is achieved for 5Gbps data up to 14cm for a PRBS of $2^7-1$.

---

1 TX and RX are covered by plastic caps to protect the bare dice. Apertures are carved into the caps in order to avoid shadowing of the signal.

Fig. 5.15: Eye diagrams at the output of the receiver for a 2Gbps pattern at 15cm (a), 3Gbps pattern at 16cm (b), 4Gbps pattern at 5cm (c)

Fig. 5.16: BER performance of the TRX vs. distance
Performance of the designed mm-wave short range link is compared with state of the art in Table 5.3.

<table>
<thead>
<tr>
<th>Ref.</th>
<th>Modulation Scheme</th>
<th>Tech. (CMOS)</th>
<th>$F_{\text{carrier}}$ (GHz)</th>
<th>Data rate/distance</th>
<th>Antenna Gain (TX+RX)</th>
<th>BER</th>
<th>Power Consumption</th>
</tr>
</thead>
<tbody>
<tr>
<td>[4]</td>
<td>ASK</td>
<td>40nm</td>
<td>56GHz</td>
<td>11.0Gbps/1.4cm</td>
<td>8dBi</td>
<td>$10^{-12}$</td>
<td>70mW</td>
</tr>
<tr>
<td>[69]</td>
<td>BPSK</td>
<td>65nm</td>
<td>84GHz</td>
<td>2.5Gbps/100cm</td>
<td>48dBi</td>
<td>$10^{-12}$</td>
<td>327mW</td>
</tr>
<tr>
<td>[25]</td>
<td>BPSK</td>
<td>65nm</td>
<td>60GHz</td>
<td>1.8Gbps/274cm</td>
<td>4dBi</td>
<td>$10^{-3}$</td>
<td>292mW</td>
</tr>
<tr>
<td>[70]</td>
<td>QPSK</td>
<td>65nm</td>
<td>60GHz</td>
<td>2.6Gbps/4.0cm</td>
<td>0dBi</td>
<td>$10^{-12}$</td>
<td>1348mW</td>
</tr>
<tr>
<td>[71]</td>
<td>SC</td>
<td>90nm</td>
<td>60GHz</td>
<td>1.5Gbps/100cm</td>
<td>13dBi</td>
<td>$10^{-12}$</td>
<td>1772mW</td>
</tr>
<tr>
<td>[72]</td>
<td>OOK</td>
<td>90nm</td>
<td>60GHz</td>
<td>2.2Gbps/7.5cm</td>
<td>10dBi</td>
<td>$10^{-5}$</td>
<td>98mW</td>
</tr>
<tr>
<td><em>This work</em></td>
<td>OOK</td>
<td>28nm</td>
<td>50GHz</td>
<td>5.0Gbps/14.0cm</td>
<td>0dBi</td>
<td>$10^{-12}$</td>
<td>135.2mW</td>
</tr>
</tbody>
</table>

* The data is not yet definitive. Measurements are still ongoing for higher data rates.

**Conclusion**

In this chapter, the measurement results and the setups used to obtain them were presented. The measured values were compared with those obtained from the simulations. The good match between the two validates the accuracy of the simulations, and the repeatability of the measurements confirms the robustness of the design.

CONCLUSION

System level analysis of a mm-wave wideband high data rate transceiver, together with the circuit level implementation of its building blocks, is presented in this dissertation.

Thanks to the large available bandwidth at mm-wave, less spectrally efficient modulation schemes like OOK can be employed to ease the implementation of the transceiver and to minimize the required power and die area.

After a brief introduction to the motivations, applications and challenges of mm-wave design in chapter one, system level analyses of the short range link were carried out in chapter two. The presence of the nonlinear envelope detector in the receiving chain complicates link budget calculations and increases the noise figure of the receiver by 6dB. Taking this into account, the design requirements were calculated to achieve error-free operation (BER<$10^{-12}$) over a communication distance of 1 to 10cm.

To realize 10Gbps communication at least 20GHz of bandwidth is required for the RF front ends. Such a wide bandwidth is realized by employing capacitive cross coupled neutralization for the PA and by employing third order inter-stage networks for the LNA. Tailored design of the inter-stage matching networks, wideband stagger tuning, common source amplifying stages and the current sharing structure of the LNA have led to the highest GBW reported so far for a mm-wave amplifier while consuming a relatively low power.

Implementation of the baseband circuitry of the 50GHz data link was also discussed. The design of a 50GHz VCO for carrier generation and of a wideband 50GHz PA for signal amplification was also presented as part of the transmitter circuitry.

A theme pursued throughout this work was to replace spiral inductors in single-ended blocks with transmission lines, which provide better EM confinement and a better ground current return path, or to employ differential structures to achieve better CMRR and PSRR.

Prototypes of all the blocks have been realized in a 28nm bulk CMOS technology node and successfully tested. Measurement results demonstrate the validity of the proposed approach for realizing a wireless link capable of Gbps data transfer over a distance of several centimeters.

In a future amplifier design, the capacitive cross coupled neutralization technique employed in the PA can be combined with the techniques employed in the design of the LNA (such as higher order inter-stage networks and stagger tuning) to increase the GBW of the amplifier even beyond the values achieved in this work. Furthermore, effort can be put into optimizing the output power of the PA to increase the communication distance to more than 20cm.
References:


