
Title: VLSI
Description: A full book on VLSI that should be very useful. Author: Rabaey



DIGITAL INTEGRATED CIRCUITS
A DESIGN PERSPECTIVE
2ND EDITION
Jan M
...
1 A Historical Perspective
1
...
3 Quality Metrics of a Digital Design
1
...
1
Cost of an Integrated Circuit
1
...
2
Functionality and Robustness
1
...
3
Performance
1
...
4
Power and Energy Consumption
1
...
fm Page 43 Monday, September 6, 1999 1:50 PM

CHAPTER

3

THE DEVICES
• Qualitative understanding of MOS devices
• Simple component models for manual analysis
• Detailed component models for SPICE
• Impact of process variations

3
...
3
...
2

The Diode

3
...
4

The Actual MOS Transistor— Some
Secondary Effects

3
...
5

SPICE Models for the MOS Transistor

3
...
1
3
...
2

Static Behavior

3
...
2
...
5

Perspective: Technology Scaling

3
...
4

The Actual Diode— Secondary Effects

3
...
5
3
...
3
...
3
...

3
...

This surely holds for digital circuit design as well
...
The role of the
semiconductor devices has been appreciated for a long time in the world of digital integrated circuits
...

Giving the reader the necessary knowledge and understanding of these components
is the prime motivation for the next two chapters
...
We refer
the reader to the many excellent textbooks on semiconductor devices for that purpose,
some of which are referenced in the To Probe Further section at the end of the chapters
...

Another important function of this chapter is the introduction of
models
...
Such an
approach is similar to considering the molecular structure of concrete when constructing a
bridge
...
A range of models can be conceived for each component presenting a
trade-off between accuracy and complexity
...
It has limited accuracy but helps us to understand the operation of the circuit
and its dominant parameters
...
In this
chapter, we present both first-order models for manual analysis as well as higher-order
models for simulation for each component of interest
...

They should be aware, however, that these are only nominal values, and that the actual
parameter values vary with operating temperature, over manufacturing runs, or even over
a single wafer
...


3
...
Each MOS transistor implicitly contains a number of
reverse-biased diodes that directly influence the behavior of the device
...
Diodes are also used to protect the
input devices of an IC against static charges
...
Rather than being comprehensive, we choose to focus on those aspects that prove to be influential in the design of digital
MOS circuits, that is, the operation in reverse-biased mode
...
2
...
Figure 3
...
It consists of two homogeneous regions of p- and n-type material, separated by a region of transition from one type of doping to another,
which is assumed thin
...
The p-type material is doped with acceptor impurities (such as boron), which results in the presence of
holes as the dominant or majority carriers
...
Aluminum contacts provide access to the p- and n-terminals of the
device
...
1c
...
1b)
...
The electron concentration
changes from a high value in the n-type material to a very small value in the p-type
material
...
This gradient causes electrons to diffuse from n to p and holes to diffuse from p to n

[Figure 3...: (a) Cross-section of pn-junction in an IC process (Al contacts A and B, SiO2, p and n regions); (b) One-dimensional representation; (c) Diode symbol]

Footnote 1: We refer the interested reader to the web-site of the textbook for a comprehensive description of the
diode operation
...
Consequently, the p-type material is negatively charged in the vicinity of the pn-boundary
...
The region at the junction, where the
majority carriers have been removed, leaving the fixed acceptor and donor ions, is called
the depletion or space-charge region
...
This field counteracts the diffusion of holes
and electrons, as it causes electrons to drift from p to n and holes to drift from n to p
...

The above analysis is summarized in Figure 3
...
[Figure 3...: (a) Current flow (hole and electron diffusion, hole and electron drift); (b) Charge density ρ versus distance x; electrical field E versus x; (d) Electrostatic potential V versus x, with the built-in potential φ0 marked]

In the device shown, the p material is more heavily doped than the n, or NA > ND, with NA and ND the acceptor and donor concentrations, respectively
...
Figure 3
...
This potential has the value
$$\phi_0 = \phi_T \ln\left(\frac{N_A N_D}{n_i^2}\right)$$

(3
...
2)

where φT is the thermal voltage

The quantity ni is the intrinsic carrier concentration in a pure sample of the semiconductor
and equals approximately 1
...

Example 3
...

Calculate the built-in potential at 300 K
...
(3
...
25 × 10 20
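As a hedged illustration of the built-in potential expression, the sketch below evaluates φ0 = φT ln(NA ND / ni²) at 300 K. The doping levels (NA = 10^15 cm^-3, ND = 10^16 cm^-3) and the intrinsic concentration (ni = 1.5 × 10^10 cm^-3) are assumed values chosen for illustration, since the numbers of the example are not fully reproduced in this extract. The function names are hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k: float) -> float:
    """Thermal voltage phi_T = kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def built_in_potential(n_a: float, n_d: float, n_i: float,
                       temp_k: float = 300.0) -> float:
    """Built-in potential phi_0 = phi_T * ln(N_A * N_D / n_i^2) in volts.

    All three concentrations must use the same unit (for example cm^-3).
    """
    return thermal_voltage(temp_k) * math.log(n_a * n_d / n_i ** 2)


if __name__ == "__main__":
    # Assumed illustrative values, not taken from the text.
    phi_0 = built_in_potential(n_a=1e15, n_d=1e16, n_i=1.5e10)
    print(f"phi_0 = {phi_0 * 1e3:.0f} mV")
```

Because the doping product enters through a logarithm, a tenfold change in either doping level shifts φ0 by only about 60 mV at room temperature.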

3
...
2

Static Behavior

The Ideal Diode Equation
Assume now that a forward voltage VD is applied to the junction or, in other words, that
the potential of the p-region is raised with respect to the n-zone
...
Consequently, the flow of mobile carriers across the junction
increases as the diffusion current dominates the drift component
...
3 Minority carrier concentrations in the neutral region near an abrupt pn-junction under forward-bias
conditions
...
the depletion region and are injected into the neutral n- and p-regions, where they become
minority carriers, as is illustrated in Figure 3
...
Under the assumption that no voltage gradient exists over the neutral regions, which is approximately the case for most modern
devices, these minority carriers will diffuse through the region as a result of the concentration gradient until they get recombined with a majority carrier
...

On the other hand, when a reverse voltage VD is applied to the junction or when the
potential of the p-region is lowered with respect to the n-region, the potential barrier is
raised
...
A current flows from the n-region to the p-region
...
4)
...
The diode
thus acts as a one-way conductor
...
4 Minority carrier concentration in the neutral regions near the pn-junction under reverse-bias
conditions
...
This is illustrated in Figure 3
...
The exponential behavior for positive-bias voltages is
even more apparent in Figure 3
...
The
current increases by a factor of 10 for every extra 60 mV (= 2
...
At
small voltage levels (VD < 0
...

The behavior of the diode for both forward- and reverse-bias conditions is best
described by the well-known ideal diode equation, which relates the current through the
diode ID to the diode bias voltage VD
$$I_D = I_S\left(e^{V_D/\phi_T} - 1\right)$$

(3
...
(3
...
5
...
(3
...

IS represents a constant value, called the saturation current of the diode
...
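The exponential character of the ideal diode equation can be checked numerically. The sketch below is a minimal illustration, assuming a saturation current of 10^-14 A and a thermal voltage of 26 mV (both illustrative values, not taken from the text); `diode_current` is a hypothetical helper name.

```python
import math


def diode_current(v_d: float, i_s: float, phi_t: float = 0.026) -> float:
    """Ideal diode equation: I_D = I_S * (exp(V_D / phi_T) - 1)."""
    return i_s * (math.exp(v_d / phi_t) - 1.0)


if __name__ == "__main__":
    i_s = 1e-14    # assumed saturation current, A
    phi_t = 0.026  # assumed thermal voltage, V

    # Raising V_D by phi_T * ln(10) multiplies the forward current by ten.
    decade_step = phi_t * math.log(10.0)
    ratio = diode_current(0.6 + decade_step, i_s) / diode_current(0.6, i_s)
    print(f"step per decade = {decade_step * 1e3:.1f} mV, ratio = {ratio:.3f}")

    # Under reverse bias the current levels off at -I_S.
    print(f"I_D at -1 V = {diode_current(-1.0, i_s):.3e} A")
```

The decade step φT ln(10) is where the roughly 60 mV per decade figure quoted for the forward characteristic comes from.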
[Figure 3...: diode current ID as a function of the bias voltage VD (V). (a) On a linear scale, with ID in mA; a second panel shows ID in A on a scale from 10^-15 to 10^-5, annotated with the slope per decade of current and the deviation due to recombination]


regions
...
It is worth mentioning that in actual
devices, the reverse currents are substantially larger than the saturation current IS
...
The electric field present sweeps these carriers out of the region, causing an additional current
component
...
Actual device measurements are, therefore, necessary to determine realistic
values for the reverse diode leakage currents
...
A first model, shown in Figure 3
...
(3
...
While this model yields accurate results, it has
the disadvantage of being strongly nonlinear
...
An often-used, simplified model is derived by
inspecting the diode current plot of Figure 3
...
For a “fully conducting” diode, the voltage
drop over the diode VD lies in a narrow range, approximately between 0
...
8 V
...
6



Diode models
...

first degree, it is reasonable to assume that a conducting diode has a fixed voltage drop
VDon over it
...
7 V is typically
assumed
...
6b, where a conducting diode is replaced
by a fixed voltage source
...
2 Analysis of Diode Network
Consider the simple network of Figure 3
...
5
× 10–16 A
...
224 mA, and VD =
0
...
The simplified model with VDon = 0
...
7 V, ID =
0
...
It hence makes considerable sense to use this model when determining a first-order solution of a diode network
...
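The comparison between the ideal diode equation and the fixed-voltage-drop model can be sketched for a diode in series with a source and a resistor. The circuit values below (VS = 3 V, RS = 10 kΩ, IS = 0.5 × 10^-16 A, φT = 26 mV) are assumptions for illustration, since the example's figure and numbers are only partly visible in this extract. The solver uses plain bisection, and all function names are hypothetical.

```python
import math


def diode_current(v_d: float, i_s: float, phi_t: float) -> float:
    """Ideal diode equation."""
    return i_s * (math.exp(v_d / phi_t) - 1.0)


def solve_series_diode(v_s: float, r_s: float, i_s: float,
                       phi_t: float = 0.026, tol: float = 1e-12):
    """Solve (V_S - V_D) / R_S = I_S * (exp(V_D / phi_T) - 1) by bisection.

    Returns (V_D, I_D). Assumes V_S > 0 so the diode is forward biased.
    """
    lo, hi = 0.0, v_s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The resistor current falls with V_D while the diode current rises,
        # so their difference changes sign exactly once.
        if (v_s - mid) / r_s > diode_current(mid, i_s, phi_t):
            lo = mid
        else:
            hi = mid
    v_d = 0.5 * (lo + hi)
    return v_d, (v_s - v_d) / r_s


def fixed_drop_current(v_s: float, r_s: float, v_don: float = 0.7) -> float:
    """First-order model: a conducting diode is a fixed voltage source."""
    return (v_s - v_don) / r_s


if __name__ == "__main__":
    v_d, i_d = solve_series_diode(v_s=3.0, r_s=10e3, i_s=0.5e-16)
    print(f"nonlinear model: V_D = {v_d:.3f} V, I_D = {i_d * 1e3:.3f} mA")
    print(f"fixed-drop model: I_D = {fixed_drop_current(3.0, 10e3) * 1e3:.3f} mA")
```

With these assumed values the two models differ by only a few percent in current, which is why the fixed-drop model is attractive for a first-order solution.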
7

3
...
3

A simple diode circuit
...
Just as important in the design of digital circuits is the response of the device to
changes in its bias conditions
...
Because the operation mode of the diode
is a function of the amount of charge present in both the neutral and the space-charge
regions, its dynamic behavior is strongly determined by how fast charge can be moved
around
...
In fact, all diodes in an
operational MOS digital integrated circuit are reverse-biased and are supposed to remain
so under all circumstances
...

A signal over(under) shooting the supply rail is an example of such
...

Hence, we will devote our attention solely to what governs the dynamic response of
the diode under reverse-biasing conditions, the depletion-region charge
...
The corresponding charge distribution
under zero-bias conditions was plotted in Figure 3
...
This picture can be easily extended
to incorporate the effects of biasing
...
This corresponds to a reduced depletion-region width
...
These observations are confirmed by the well-known depletion-region
expressions given below (a derivation of these expressions, which are valid for abrupt
junctions, is either simple or can be found in any textbook on devices such as [Howe97])
...

1
...

$$Q_j = A_D \sqrt{\left(2\varepsilon_{si}\, q\, \frac{N_A N_D}{N_A + N_D}\right)(\phi_0 - V_D)}$$

(3
...
Depletion-region width
...
5)

3
...

$$E_j = \sqrt{\left(\frac{2q}{\varepsilon_{si}}\, \frac{N_A N_D}{N_A + N_D}\right)(\phi_0 - V_D)}$$

(3
...
7
times the permittivity of a vacuum, or 1
...
The ratio of the n- versus p-side
of the depletion-region width is determined by the doping-level ratios: W2/(−W1) = NA/ND
...
Because the space-charge region
contains few mobile carriers, it acts as an insulator with a dielectric constant εsi of the
semiconductor material
...
A small change
in the voltage applied to the junction dVD causes a change in the space charge dQj
...
7)

where Cj0 is the capacitance under zero-bias conditions and is only a function of the physical parameters of the device
...
[Figure 3...: junction capacitance Cj (fF) as a function of the applied voltage VD (V), with the zero-bias value Cj0 and a linear-junction curve marked]

$$C_{j0} = A_D \sqrt{\left(\frac{\varepsilon_{si}\, q}{2}\, \frac{N_A N_D}{N_A + N_D}\right)\phi_0^{-1}}$$

(3
...
(3
...
Typically, the AD factor is
omitted, and Cj and Cj0 are expressed as a capacitance/unit area
...
8 for a typical silicon diode found in MOS circuits
...
Note also that the capacitance decreases with an increasing reverse
bias: a reverse bias of 5 V reduces the capacitance by more than a factor of two
...
3 Junction Capacitance
Consider the following silicon junction diode: Cj0 = 2 × 10^-3 F/m^2, AD = 0
...
64 V
...
5 V results in a junction capacitance of 0
...
9
fF/µm2), or, for the total diode, a capacitance of 0
...


Equation (3
...
This is often not the
case in actual integrated-circuit pn-junctions, where the transition from n to p material can
be gradual
...
An analysis of the
linearly-graded junction shows that the junction capacitance equation of Eq
...
7) still
holds, but with a variation in order of the denominator
...
...
9)

where m is called the grading coefficient and equals 1/2 for the abrupt junction and 1/3 for
the linear or graded junction
...
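The voltage dependence of the junction capacitance can be illustrated with the standard form Cj = Cj0 / (1 − VD/φ0)^m, where m is the grading coefficient described above. The expression itself is not reproduced in this extract, so the sketch below should be read as an assumption-based illustration. Cj0 = 2 × 10^-3 F/m² is taken from the junction-capacitance example, φ0 = 0.64 V is assumed, and the function name is hypothetical.

```python
def junction_capacitance(v_d: float, c_j0: float, phi_0: float,
                         m: float = 0.5) -> float:
    """Depletion capacitance C_j = C_j0 / (1 - V_D / phi_0)^m.

    m = 1/2 models an abrupt junction, m = 1/3 a linearly graded one.
    Only valid below the built-in potential (V_D < phi_0).
    """
    if v_d >= phi_0:
        raise ValueError("model is only valid for V_D < phi_0")
    return c_j0 / (1.0 - v_d / phi_0) ** m


if __name__ == "__main__":
    c_j0 = 2e-3   # zero-bias capacitance per unit area, F/m^2
    phi_0 = 0.64  # built-in potential, V (assumed)
    for v_d in (0.0, -2.5, -5.0):
        abrupt = junction_capacitance(v_d, c_j0, phi_0, m=0.5)
        graded = junction_capacitance(v_d, c_j0, phi_0, m=1.0 / 3.0)
        # 1 F/m^2 equals 1000 fF/um^2, so multiply by 1e3 for fF/um^2.
        print(f"V_D = {v_d:5.1f} V: abrupt {abrupt * 1e3:.2f} fF/um^2, "
              f"graded {graded * 1e3:.2f} fF/um^2")
```

The graded junction shows a weaker voltage dependence than the abrupt one because of its smaller exponent.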
8
...
8 raises awareness to the fact that the junction capacitance is a voltage-dependent
parameter whose value varies widely between bias points
...
Under those circumstances, it is more
attractive to replace the voltage-dependent, nonlinear capacitance Cj by an equivalent, linear capacitance Ceq
...
10)

Combining Eq
...
4) (extended to accommodate the grading coefficient) and Eq
...
10) yields the value of Keq
...
11)

Example 3
...
3 is switched between 0 and −2
...
Compute the average junction
capacitance (m = 0
...

For the defined voltage range and for φ0 = 0
...
622
...
24 fF/µm2
...
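The equivalent linear capacitance can be sketched as the average of Cj over the voltage swing, Ceq = ΔQj/ΔVD = Keq Cj0. The closed form used below follows from integrating Cj0 / (1 − V/φ0)^m between the two voltage levels; it is written out here as an assumption, since the corresponding equations are not reproduced in this extract. The numerical integration is included only as a cross-check, and the values φ0 = 0.64 V and a swing from −2.5 V to 0 V are assumed.

```python
def k_eq(v_high: float, v_low: float, phi_0: float, m: float = 0.5) -> float:
    """Closed-form ratio C_eq / C_j0 for a swing between v_low and v_high."""
    return (-(phi_0 ** m) / ((v_high - v_low) * (1.0 - m))
            * ((phi_0 - v_high) ** (1.0 - m) - (phi_0 - v_low) ** (1.0 - m)))


def k_eq_numeric(v_high: float, v_low: float, phi_0: float,
                 m: float = 0.5, steps: int = 10000) -> float:
    """Midpoint-rule average of 1 / (1 - V / phi_0)^m over the swing."""
    dv = (v_high - v_low) / steps
    total = 0.0
    for i in range(steps):
        v = v_low + (i + 0.5) * dv
        total += 1.0 / (1.0 - v / phi_0) ** m
    return total / steps


if __name__ == "__main__":
    c_j0 = 2e-3  # F/m^2, from the junction-capacitance example
    k = k_eq(v_high=0.0, v_low=-2.5, phi_0=0.64, m=0.5)
    print(f"K_eq = {k:.3f} (numeric check {k_eq_numeric(0.0, -2.5, 0.64):.3f})")
    print(f"C_eq = {k * c_j0 * 1e3:.2f} fF/um^2")
```

Replacing Cj by Keq Cj0 gives the same total charge transfer over the swing, which is what matters when estimating switching delay by hand.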
2
...
Not
all applied bias voltage appears directly across the junction, as there is always some voltage drop over the neutral regions
...
This effect can be modeled by adding a resistor in series with the n- and p-region diode contacts
...
When the reverse bias
exceeds a certain level, called the breakdown voltage, the reverse current shows a dramatic increase as shown in Figure 3
...

This increase is caused by avalanche breakdown
...
Consequently, carriers crossing
the depletion region are accelerated to high velocity
...
[Figure 3...: diode current ID (A) versus bias voltage VD (V)]

reach a high-enough energy level that electron-hole pairs are created on collision with
immobile silicon atoms
...
The value of Ecrit is approximately 2 × 10^5 V/cm for impurity concentrations of the order of 10^16 cm^−3
...
Observe that avalanche
breakdown is not the only breakdown mechanism encountered in diodes
...
Discussion of this phenomenon is beyond the scope of this text

...
The thermal voltage φT, which appears in the exponent of the current equation, is
linearly dependent upon the temperature
...

2
...
Theoretically, the saturation current approximately doubles every 5 °C
...

This dual dependence has a significant impact on the operation of a digital circuit
...
For instance,
for a forward bias of 0
...
Secondly, integrated circuits rely heavily on reverse-biased diodes as
isolators
...

3
...
5

The SPICE Diode Model

In the preceding sections, we have presented a model for manual analysis of a diode circuit
...
10

SPICE diode model
...
While different circuit simulators have
been developed over the last decades, the SPICE program, developed at the University of
California at Berkeley, is definitely the most successful [Nagel75]
...
The accuracy of the simulation
depends directly upon the quality of this model
...
Creating accurate and computation-efficient SPICE models has been a long process and is by no means finished
...

The standard SPICE model for a diode is simple, as shown in Figure 3
...
The
steady-state characteristic of the diode is modeled by the nonlinear current source ID,
which is a modified version of the ideal diode equation
$$I_D = I_S\left(e^{V_D/n\phi_T} - 1\right)$$

(3
...
It equals 1 for most common diodes but can be somewhat higher than 1 for others
...
For
higher current levels, this resistance causes the internal diode voltage VD to differ from the externally applied voltage, hence causing the current to be lower than what would be expected
from the ideal diode equation
...
Only the former was discussed
in this chapter, as the latter is only an issue under forward-biasing conditions
...
13)

A listing of the parameters used in the diode model is given in Table 3
...
Besides the
parameter name, symbol, and SPICE name, the table contains also the default value used
by SPICE in case the parameter is left undefined
...
Other parameters are available to govern second-order effects such as break-

...
To be concise, we chose to limit the listing to the
parameters of direct interest to this text
...
g
...

Table 3
...


Parameter Name      | Symbol | SPICE Name | Units | Default Value
Saturation current  | IS     | IS         | A     | 1
...
5
Junction potential  | φ0     | VJ         | V     | 1

3
...
Its major asset from a digital perspective is that the device performs very well as a switch, and introduces few parasitic
effects
...

Following the approach we took for the diode, we restrict ourselves in this section to
a general overview of the transistor and its parameters
...
The discussion concludes with an enumeration of
some second-order effects and the introduction of the SPICE MOS transistor models
...
3
...
The voltage applied to the gate terminal determines if and how much current flows between the source and the drain ports
...
Its function is secondary as it only serves to
modulate the device characteristics and parameters
...
When a
voltage is applied to the gate that is larger than a given value called the threshold voltage
VT, a conducting channel is formed between drain and source
...
The conductivity of the
channel is modulated by the gate voltage— the larger the voltage difference between gate
and source, the smaller the resistance of the conducting channel and the larger the current
...

Two types of MOSFET devices can be identified
...
The current is carried by
electrons moving through an n-type channel between source and drain
...
MOS
devices can also be made by using an n-type substrate and p+ drain and source regions
...
The device
is called a p-channel MOS, or PMOS transistor
...
The cross-section of a contemporary dual-well CMOS
process was presented in Chapter 2, and is repeated here for convenience (Figure 3
...

[Figure 3...: cross-section of a dual-well CMOS process, with labels gate-oxide, TiSi2, AlCu, SiO2, tungsten, poly, p-well, n-well, n+, p+, and p- epi]


Circuit symbols for the various MOS transistors are shown in Figure 3
...
As mentioned earlier, the transistor is a four-port device with gate, source, drain, and body terminals (Figures a and c)
...
If the fourth terminal is not shown, it is
assumed that the body is connected to the appropriate supply
...
12 Circuit symbols for
MOS transistors
...
3
...
2

The MOS Transistor under Static Conditions

In the derivation of the static model of the MOS transistor, we concentrate on the NMOS
device
...

The Threshold Voltage
Consider first the case where VGS = 0 and drain, source, and bulk are connected to ground
...
Under the mentioned conditions, both junctions have a 0 V bias and can
be considered off, which results in an extremely high resistance between drain and source
...
13
...
The positive gate voltage causes positive charge to
accumulate on the gate electrode and negative charge on the substrate side
...
Hence, a depletion region is formed
below the gate
...
Consequently, similar expressions hold for the width and the space charge per unit
area
...
(3
...
(3
...

Wd =

2ε si φ
-----------qN A

(3
...
15)

and
Qd =

with NA the substrate doping and φ the voltage across the depletion layer (i
...
, the potential
at the oxide-silicon boundary)
...
[Figure 3...: NMOS transistor with gate voltage VGS applied, showing the n+ source (S) and drain (D) regions, gate (G), n-channel, depletion region, and p-substrate body (B)]

This point marks the onset of a phenomenon known as strong inversion and occurs at a voltage
equal to twice the Fermi potential (Eq
...
16)) (φF ≈ −0
...
16)

Further increases in the gate voltage produce no further changes in the depletion-layer width, but result in additional electrons in the thin inversion layer directly under the
oxide
...

Hence, a continuous n-type channel is formed between the source and drain regions, the
conductivity of which is modulated by the gate-source voltage
...
17)

This picture changes somewhat in case a substrate bias voltage VSB is applied (VSB is normally positive for n-channel devices)
...
The charge stored in the depletion
region now is expressed by Eq
...
18)
$$Q_B = \sqrt{2 q N_A \varepsilon_{si}\,(-2\phi_F + V_{SB})}$$

(3
...

VT is a function of several components, most of which are material constants such as the
difference in work-function between gate and substrate material, the oxide thickness, the
Fermi voltage, the charge of impurities trapped at the surface between channel and gate
oxide, and the dosage of ions implanted for threshold adjustment
...

as well
...
The threshold voltage
under different body-biasing conditions can then be determined in the following manner,
$$V_T = V_{T0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|}\right)$$

(3
...
Observe that the threshold voltage has a positive value for a typical
NMOS device, while it is negative for a normal PMOS transistor
...

The effect of the well bias on the
threshold voltage of an NMOS
transistor is plotted for typical
values of |–2φF| = 0
...
4 V0
...
A negative bias on the
well or substrate causes the
threshold to increase from 0
...
85 V
...
6 V
in an NMOS
...


...


Example 3
...
4 V, while the body-effect coefficient
equals -0
...
Compute the threshold voltage for VSB = -2
...
2φF = 0
...

Using Eq
...
19), we obtain VT(-2
...
4 - 0
...
5+0
...
5 - 0
...
5) V = -0
...
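The body-effect expression for the threshold voltage can be evaluated with a few lines of code. This is a minimal sketch: the PMOS values used in the demonstration (VT0 = −0.4 V, γ = −0.4 V^0.5, VSB = −2.5 V, 2φF = 0.6 V) are assumptions chosen to resemble the example above, whose numbers are only partly visible in this extract, and `threshold_voltage` is a hypothetical helper name.

```python
import math


def threshold_voltage(v_t0: float, gamma: float, v_sb: float,
                      phi_f: float) -> float:
    """V_T = V_T0 + gamma * (sqrt(|-2 phi_F + V_SB|) - sqrt(|-2 phi_F|)).

    phi_f is the Fermi potential: negative for an NMOS device (p-type
    body) and positive for a PMOS device (n-type body).
    """
    return v_t0 + gamma * (math.sqrt(abs(-2.0 * phi_f + v_sb))
                           - math.sqrt(abs(-2.0 * phi_f)))


if __name__ == "__main__":
    # Assumed PMOS example values.
    v_t = threshold_voltage(v_t0=-0.4, gamma=-0.4, v_sb=-2.5, phi_f=0.3)
    print(f"PMOS V_T at V_SB = -2.5 V: {v_t:.2f} V")

    # NMOS counterpart with mirrored signs (also assumed values).
    v_t_n = threshold_voltage(v_t0=0.4, gamma=0.4, v_sb=2.5, phi_f=-0.3)
    print(f"NMOS V_T at V_SB = +2.5 V: {v_t_n:.2f} V")
```

In both cases the body bias increases the magnitude of the threshold voltage, so the device becomes harder to turn on.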
The voltage difference causes a current ID to flow from drain to source (Figure
3
...
Using a simple analysis, a first-order expression of the current as a function of VGS
and VDS can be obtained
...
15 NMOS transistor with bias
voltages
...
Under the assumption that this voltage exceeds the threshold
voltage all along the channel, the induced channel charge per unit area at point x can be
computed
...
...
20)

Cox stands for the capacitance per unit area presented by the gate oxide, and equals
$$C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}$$

(3
...
97 × εo = 3
...
The latter is 10 nm (= 100 Å) or smaller for contemporary processes
...
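As a quick numerical illustration of Cox = εox / tox, the sketch below assumes an oxide permittivity of 3.97 times the vacuum permittivity and uses the 10 nm oxide thickness quoted above. The constant and function names are hypothetical.

```python
EPS_0 = 8.854e-12       # permittivity of vacuum, F/m
EPS_OX = 3.97 * EPS_0   # assumed permittivity of SiO2, F/m


def gate_oxide_capacitance(t_ox: float) -> float:
    """Gate capacitance per unit area, C_ox = eps_ox / t_ox, in F/m^2."""
    return EPS_OX / t_ox


if __name__ == "__main__":
    c_ox = gate_oxide_capacitance(10e-9)  # 10 nm oxide
    # 1 F/m^2 equals 1000 fF/um^2.
    print(f"C_ox = {c_ox:.2e} F/m^2 = {c_ox * 1e3:.2f} fF/um^2")
```

Halving the oxide thickness doubles Cox, and with it the channel charge induced for a given gate overdrive.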

µm
The current is given as the product of the drift velocity of the carriers υn and the
available charge
...
W is the width of the channel in a direction perpendicular to the current flow
...
22)

The electron velocity is related to the electric field through a parameter called the mobility
µn (expressed in m2/V⋅s)
...
In general, an empirical value is used
...
23)

Combining Eq
...
20) − Eq
...
23) yields
$$I_D\,dx = \mu_n C_{ox} W (V_{GS} - V - V_T)\,dV$$

(3
...

$$I_D = k'_n \frac{W}{L}\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right] = k_n\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right]$$

(3
...
26)

The product of the process transconductance k'n and the (W/L) ratio of an (NMOS) transistor is called the gain factor k n of the device
...
(3
...
The operation region where Eq
...
25) holds is hence called the resistive or linear
region
...

NOTICE: The W and L parameters in Eq
...
25) represent the effective channel width
and length of the transistor
...
In the remainder of the text, W and L will
...
The following expressions relate the two parameters, with ∆W and ∆L
parameters of the manufacturing process:
$$W = W_d - \Delta W$$
$$L = L_d - \Delta L$$

(3
...
This happens
when VGS − V(x) < VT
...
This is illustrated in Figure 3
...
[Figure 3.16: NMOS transistor under pinch-off conditions]
...
No channel exists in the vicinity of the drain region
...


(3
...
(3
...
The voltage difference over the induced channel (from the pinch-off point
to the source) remains fixed at VGS − VT, and consequently, the current remains constant
(or saturates)
...
(3
...
It is worth observing that, to a first degree, the current is no longer a function
of VDS
...

$$I_D = \frac{k'_n}{2}\,\frac{W}{L}\,(V_{GS} - V_T)^2$$

(3
...
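The resistive-region and saturation expressions for the long-channel device can be combined into one piecewise function. The sketch below ignores channel-length modulation and body effect, and the gain factor and threshold voltage used in the demonstration are assumed values; `id_long_channel` is a hypothetical name.

```python
def id_long_channel(v_gs: float, v_ds: float, v_t: float, k_n: float) -> float:
    """First-order long-channel NMOS drain current.

    k_n is the gain factor k'_n * W / L in A/V^2.
    """
    v_gt = v_gs - v_t
    if v_gt <= 0.0:
        return 0.0  # no channel (subthreshold conduction is ignored here)
    if v_ds < v_gt:
        # Resistive (linear) region
        return k_n * (v_gt * v_ds - v_ds ** 2 / 2.0)
    # Saturation: the channel is pinched off at the drain
    return 0.5 * k_n * v_gt ** 2


if __name__ == "__main__":
    k_n, v_t = 100e-6, 0.5  # assumed gain factor and threshold voltage
    for v_ds in (0.5, 1.0, 1.5, 2.0, 2.5):
        i_d = id_long_channel(2.0, v_ds, v_t, k_n)
        print(f"V_DS = {v_ds:.1f} V -> I_D = {i_d * 1e6:.1f} uA")
```

Substituting VDS = VGS − VT into the resistive-region expression gives exactly the saturation value, so the two branches meet without a jump at the pinch-off point.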
This is not entirely correct
...
As can be observed from Eq
...
29), the current increases when the length
factor L is decreased
...
(3
...

$$I_D = I_D'\,(1 + \lambda V_{DS})$$

(3
...
Analytical expressions for λ have proven to be complex and
inaccurate
...
In shorter transistors,
the drain-junction depletion region presents a larger fraction of the channel, and the channel-modulation effect is more pronounced
...

Velocity Saturation
The behavior of transistors with very short channel lengths (called short-channel devices)
deviates considerably from the resistive and saturated models presented in the previous
paragraphs
...
Eq
...
23)
states that the velocity of the carriers is proportional to the electrical field, independent of
the value of that field
...
However, at high
field strengths, the carriers fail to follow this linear model
...
This is illustrated in Figure
3
...


[Figure 3...: Velocity-saturation effect. Carrier velocity υn (m/s) versus electrical field: constant mobility (slope = µ) at low fields, constant velocity υsat = 10^5 m/s at high fields]
...
5
× 106 V/m (or 1
...

This means that in an NMOS device with a channel length of 1 µm, only a couple of volts
between drain and source are needed to reach the saturation point
...
Holes in n-type silicon saturate at the same velocity, although a higher electrical field is needed to achieve saturation
...


...
We will illustrate this with a first-order derivation of the device characteristics under velocity-saturating conditions [Ko89]
...
17, can be roughly approximated by the following expression:
$$\upsilon = \begin{cases} \dfrac{\mu_n \xi}{1 + \xi/\xi_c} & \text{for } \xi \le \xi_c \\ \upsilon_{sat} & \text{for } \xi \ge \xi_c \end{cases}$$

(3
...
Re-evaluation of Eq
...
20) and Eq
...
22) in light of the revised velocity formula leads to a modified
expression of the drain current in the resistive region:
$$I_D = \kappa(V_{DS})\,\mu_n C_{ox} \frac{W}{L}\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right]$$

(3
...
33)

with

κ is a measure of the degree of velocity saturation, since VDS/L can be interpreted as the
average field in the channel
...
(3
...
For short-channel devices, κ is smaller than 1, which
means that the delivered current is smaller than what would be normally expected
...

The saturation drain voltage VDSAT can be calculated by equating the current at the drain to
the current given by Eq
...
32) for VDS = VDSAT
...
(3
...

$$I_{DSAT} = \upsilon_{sat} C_{ox} W (V_{GT} - V_{DSAT}) = \kappa(V_{DSAT})\,\mu_n C_{ox} \frac{W}{L}\left[V_{GT} V_{DSAT} - \frac{V_{DSAT}^2}{2}\right]$$

(3
...
After some algebra, we obtain
$$V_{DSAT} = \kappa(V_{GT})\,V_{GT}$$

(3
...
This leads to some interesting observations:
• For a short-channel device and for large enough values of VGT, κ(VGT) is substantially smaller than 1, hence VDSAT < VGT
...
[Figure 3...: ID versus VDS for VGS = VDD, comparing a long-channel device, which saturates at VGS - VT, with a short-channel device, which saturates at VDSAT]

Short-channel devices therefore experience an extended saturation region, and tend to operate more often in saturation conditions than their long-channel counterparts, as is illustrated in Figure 3
...

• The saturation current IDSAT displays a linear dependence with respect to the gate-source voltage VGS, which is in contrast with the squared dependence in the long-channel device
...
On the other hand, reducing the operating voltage does not
have such a significant effect in submicron devices as it would have in a long-channel transistor
...
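The two observations above can be made concrete with a small calculation of VDSAT = κ(VGT) VGT. The sketch assumes the first-order form κ(V) = 1 / (1 + V / (ξc L)), which is not reproduced in this extract, and a critical field ξc of 1.5 × 10^6 V/m; both are assumptions made for illustration, and the function names are hypothetical.

```python
XI_C = 1.5e6  # assumed critical field for electrons, V/m


def kappa(v: float, length: float, xi_c: float = XI_C) -> float:
    """Degree of velocity saturation, kappa(V) = 1 / (1 + V / (xi_c * L))."""
    return 1.0 / (1.0 + v / (xi_c * length))


def v_dsat(v_gt: float, length: float, xi_c: float = XI_C) -> float:
    """Saturation drain voltage V_DSAT = kappa(V_GT) * V_GT."""
    return kappa(v_gt, length, xi_c) * v_gt


if __name__ == "__main__":
    v_gt = 2.0  # gate overdrive V_GS - V_T, V
    for length in (10e-6, 1e-6, 0.25e-6):
        print(f"L = {length * 1e6:5.2f} um: kappa = {kappa(v_gt, length):.3f}, "
              f"V_DSAT = {v_dsat(v_gt, length):.2f} V")
```

For the long device κ stays close to 1 and VDSAT approaches VGT, which recovers the long-channel behavior; for the short device VDSAT falls well below VGT, so saturation sets in much earlier.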
From a modeling perspective, it appears as though the
effective channel is shortening with increasing VDS, similar in effect to the channel-length
modulation
...

Thus far we have only considered the effects of the tangential field along the channel due to the VDS, when considering velocity-saturation effects
...
This effect, which is called mobility degradation, reduces the surface
mobility with respect to the bulk mobility
...
(3
...
36)

with µn0 the bulk mobility and η an empirical parameter
...

Readers interested in a more in-depth perspective on the short-channel effects in
MOS transistors are referred to the excellent reference works on this topic, such as
[Ko89]
...
(3
...
(3
...

A substantially simpler model can be obtained by making two assumptions:

...
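The velocity saturates abruptly at ξc, and is approximated by the following expression:
$$\upsilon = \begin{cases} \mu_n \xi & \text{for } \xi \le \xi_c \\ \upsilon_{sat} = \mu_n \xi_c & \text{for } \xi \ge \xi_c \end{cases}$$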

(3
...
The drain-source voltage VDSAT at which the critical electrical field is reached and
velocity saturation comes into play is constant and is approximated by Eq
...
38)
...
(3
...

$$V_{DSAT} = L\,\xi_c = \frac{L\,\upsilon_{sat}}{\mu_n}$$

(3
...
Once VDSAT is reached, the current abruptly saturates
...
(3
...

$$I_{DSAT} = I_D(V_{DS} = V_{DSAT}) = \mu_n C_{ox} \frac{W}{L}\left((V_{GS} - V_T)V_{DSAT} - \frac{V_{DSAT}^2}{2}\right)$$

(3
...
The simplified velocity model causes
substantial deviations in the transition zone between linear and velocity-saturated regions
...
Most importantly, the equations are coherent with the familiar long-channel equations, and provide the digital designer with a much needed tool for intuitive understanding
and interpretation
...
Figure
3
...
One would hence expect both devices to display identical
I-V characteristics. The main difference, however, is that the first device has a long channel
length (Ld = 10 µm), while the second transistor is a short-channel device (Ld = 0
...

Consider first the long-channel device
...
[Figure 3...: I-V curves for several values of VGS, with the VDS = VGS-VT curve marked; the vertical axis is scaled by 10^-4]

The transition between both regions is delineated by the VDS = VGS - VT curve
...
The linear
dependence of the saturation current with respect to VGS is apparent in the short-channel
device of b
...
This results in a substantial drop in current drive for high voltage levels
...
5 V, VDS = 2
...

[Figure: plot with the vertical axis scaled by 10^-4; one region is labeled "quadratic"]

Figure 3
...
25 µm CMOS
technology)
...
5 for both transistors and VDS = 2
...


...
20)
...

All the derived equations hold for the PMOS transistor as well
...

is illustrated in Figure 3
...
25 µm CMOS process
...
It is also interesting to observe that the effects of
velocity saturation are less pronounced than in the NMOS devices
...

[Figure: I-V curves for several negative values of VGS; the vertical axis is scaled by 10^-4]
21 I-V characteristics of (Wd=0
...
25 µm) PMOS transistor in 0
...
Due to the smaller mobility, the maximum
current is only 42% of what is achieved by a similar
NMOS transistor
...
20 reveals that the current does not
drop abruptly to 0 at VGS = VT
...
This effect is called
subthreshold or weak-inversion conduction
...
The transition from the on- to the off-condition is thus not abrupt, but gradual
...
20b on a logarithmic scale as shown in Figure 3
...
This confirms that the current
does not drop to zero immediately for VGS < VT, but actually decays in an exponential
fashion, similar to the operation of a bipolar transistor (see footnote 2)
...
The current in this region can be approximated by the expression
2
Discussion of the operation of bipolar transistors is out of the scope of this textbook
...


[Figure 3.22: ID current versus VGS (on logarithmic scale), showing the exponential characteristic of the subthreshold region. ID (A) spans 10^-12 to 10^-2; the plot marks the subthreshold exponential region below VT, the quadratic region, and the linear region]
...
40)

where IS and n are empirical parameters, with n ≥ 1 and typically ranging around 1
...

In most digital applications, the presence of subthreshold current is undesirable as it
detracts from the ideal switch-like behavior that we like to assume for the MOS transistor
...
The (inverse) rate of decline of the current with respect to VGS below VT
hence is a quality measure of a device
...
From Eq
...
40), we find
$$S = n\left(\frac{kT}{q}\right)\ln(10)$$

(3
...
For an ideal transistor with the sharpest possible rolloff, n = 1 and (kT/q)ln(10) evaluates to 60 mV/decade at room temperature, which means
that the subthreshold current drops by a factor of 10 for a reduction in VGS of 60 mV
...
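The slope factor S = n (kT/q) ln(10) is easy to evaluate directly. The sketch below is illustrative only: the value n = 1.5 is an assumed example of a non-ideal device, and `subthreshold_slope` is a hypothetical name.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge, C


def subthreshold_slope(n: float, temp_k: float = 300.0) -> float:
    """Slope factor S = n * (kT/q) * ln(10), in volts per decade."""
    return n * (K_BOLTZMANN * temp_k / Q_ELECTRON) * math.log(10.0)


if __name__ == "__main__":
    for n in (1.0, 1.5):
        s = subthreshold_slope(n)
        print(f"n = {n:.1f}: S = {s * 1e3:.1f} mV/decade at 300 K")
    # The slope degrades (increases) with temperature.
    print(f"n = 1.5 at 400 K: {subthreshold_slope(1.5, 400.0) * 1e3:.1f} mV/decade")
```

A larger S means that more gate-voltage swing is needed to cut the leakage current by a given factor, which is why both a large n and a high operating temperature hurt the off-state behavior.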
5)
...
The value of n is determined by the intrinsic
device topology and structure
...

Subthreshold current has some important repercussions
...
This is especially
important in the so-called dynamic circuits, which rely on the storage of charge on a
capacitor and whose operation can be severely degraded by subthreshold leakage
...


...
6 Subthreshold Slope
For the example of Figure 3
...
5 mV/decade is observed (between 0
...
4
V)
...
49
...
Its behavior is heavily non-linear and is influenced by a large number of second-order effects
...

While excellent from an accuracy perspective, these models fail to provide a designer
with intuitive insight into the behavior of a circuit and its dominant design parameters
...
A
designer who lacks a clear vision of what drives and governs the circuit operation by
necessity resorts to a lengthy trial-and-error optimization process that most often leads to
an inferior solution
...
It turns out that the first-order expressions,
derived earlier in the chapter, can be combined into a single expression that meets these
goals
...
23), the value
of which is defined in the figure
...
(3
...
(3
...
(3
...

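[Schematic residue: four-terminal MOS transistor symbol with terminals G, D, S, and B and drain current ID]

$$I_D = 0 \quad \text{for } V_{GT} \le 0$$

$$I_D = k' \frac{W}{L} \left( V_{GT} V_{min} - \frac{V_{min}^2}{2} \right) (1 + \lambda V_{DS}) \quad \text{for } V_{GT} \ge 0$$

with $V_{min} = \min(V_{GT}, V_{DS}, V_{DSAT})$,
$V_{GT} = V_{GS} - V_T$,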
and V T = V T0 + γ (

Figure 3
...


Besides being a function of the voltages at the four terminals of the transistor, the
model employs a set of five parameters: VT0, γ, VDSAT, k′, and λ
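The unified model lends itself to a direct implementation. The sketch below is one possible Python rendering for an NMOS device, not the MATLAB version mentioned in the text. It takes the threshold voltage as an input (zero body bias is assumed, so the body-effect term is not evaluated). The parameter values are illustrative assumptions: 0.43 V, 0.63 V, and 115 µA/V² echo entries of the parameter table, while λ = 0.06 1/V and W/L = 1.5 are assumed.

```python
def unified_drain_current(vgs, vds, vt, k_prime, w_over_l, vdsat, lam):
    """Drain current of the unified manual model for an NMOS device (volts, amperes).

    vt is the threshold voltage at the operating body bias; it is passed in
    directly, so no body-effect calculation is performed here.
    """
    vgt = vgs - vt
    if vgt <= 0:
        return 0.0  # cut-off: the manual model ignores subthreshold conduction
    vmin = min(vgt, vds, vdsat)
    return k_prime * w_over_l * (vgt * vmin - vmin ** 2 / 2) * (1 + lam * vds)


# Assumed, illustrative NMOS parameters (see the lead-in for their origin).
NMOS = dict(vt=0.43, k_prime=115e-6, vdsat=0.63, lam=0.06)

if __name__ == "__main__":
    i_on = unified_drain_current(2.5, 2.5, w_over_l=1.5, **NMOS)
    print(f"ID at VGS = VDS = 2.5 V: {i_on * 1e6:.0f} uA")
```

Because the operating region is selected by the min() term alone, the same expression covers the resistive, saturation, and velocity-saturation regions without an explicit case split.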
...
The complexity of the device makes this a precarious task
...
More significantly, the model should match the best in


the regions that matter the most
...

The performance of an MOS digital circuit is primarily determined by the maximum
available current (i
...
, the current obtained for VGS = VDS = supply voltage)
...

Example 3
...
25 µm CMOS Process3
Based on the simulated ID-VDS and ID-VGS plots of a (Wd = 0
...
25 µm) transistor,
implemented in our generic 0
...
19, Figure 3
...
5 V, VGS = 2
...
5 V being the typical supply voltage for this process
...
24 for the NMOS transistor, and compared to the simulated values
...
5

[Plot residue: vertical axis ID (A)]
...
24 Correspondence between simple model
(solid line) and SPICE simulation (dotted) for minimum-size NMOS transistor (Wd=0
...
25 µm)
...


[Plot residue: horizontal axis in volts (V)]

good correspondence can be observed with the exception of the transition region between
resistive and velocity-saturation
...
(3
...
It
demonstrates that our model, while simple, manages to give a fair indication of the overall
behavior of the device
...
2 tabulates the obtained parameter values for the minimum-sized NMOS and a similarly sized PMOS device in our generic 0
...
These values will be used as
generic model-parameters in later chapters
...
2

Parameters for manual model of generic 0
...

VT0 (V)

γ (V0
...
43

0
...
63

115 × 10−6

0
...
4

-0
...
1

Footnote 3: A MATLAB implementation of the model is available on the web-site of the textbook
...

A word of caution — The model presented here is derived from the characteristics
of a single device with a minimum channel-length and width
...
Fortunately, digital circuits typically use only minimum-length devices as
these lead to the smallest implementation area
...
It is however advisable to use a different set of model parameters for
devices with dramatically different size- and shape-factors
...
We therefore introduce an even
more simplified model that has the advantage of being linear and straightforward
...

[Schematic residue: switch between S and D in series with Ron, closed when VGS ≥ VT]
Figure 3
...


The main problem with this model is that Ron is still time-variant, non-linear, and
dependent on the operating point of the transistor
...
A reasonable
approach in that respect is to use the average value of the resistance over the operation
region of interest, or even simpler, the average value of the resistances at the end-points of
the transition
...


$$R_{eq} = \text{average}_{t = t_1 \ldots t_2}\left(R_{on}(t)\right) = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} R_{on}(t)\,dt = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \frac{V_{DS}(t)}{I_D(t)}\,dt$$

(3
...
8 Equivalent resistance when (dis)charging a capacitor
One of the most common scenarios in contemporary digital circuits is the discharging of a
capacitor from VDD to GND through an NMOS transistor with its gate voltage set to VDD, or
vice versa, the charging of the capacitor to VDD through a PMOS with its gate at GND
...
Assuming that the supply voltage is substantially larger than the velocity-saturation voltage VDSAT of
the transistor, it is fair to state that the transistor stays in velocity saturation for the entire
duration of the transition
...

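[Figure residue: (a) schematic of an NMOS transistor with VGS = VDD discharging a capacitor while VDS goes from VDD to VDD/2; (b) trajectory traversed on the ID-VDS curve between VDD and VDD/2, with the resistances R0 and Rmid marked]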
...
26 Discharging a capacitor through an NMOS transistor: Schematic (a) and I-V trajectory (b)
...


With the aid of Eq
...
42) and Eq
...

$$R_{eq} = \frac{1}{-V_{DD}/2} \int_{V_{DD}}^{V_{DD}/2} \frac{V}{I_{DSAT}(1 + \lambda V)}\,dV \approx \frac{3}{4} \frac{V_{DD}}{I_{DSAT}} \left(1 - \frac{7}{9} \lambda V_{DD}\right)$$

(3
...
44)

A number of conclusions are worth drawing from the above expressions:


The resistance is inversely proportional to the (W/L) ratio of the device
...




For VDD >> VT + VDSAT /2, the resistance becomes virtually independent of the supply voltage
...
Only a minor improvement in resistance, attributable to the
channel-length modulation, can be observed when raising the supply voltage
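To see how well the closed-form approximation tracks the averaging integral, the sketch below evaluates the average of V / (IDSAT (1 + λV)) over the transition from VDD to VDD/2 numerically and compares it with (3/4)(VDD/IDSAT)(1 − (7/9) λ VDD). The function names and the numbers used (VDD = 2.5 V, IDSAT = 190 µA, λ = 0.06 1/V) are assumptions for illustration, and IDSAT is treated as a fixed input.

```python
def req_numeric(vdd, idsat, lam, steps=10_000):
    """Average of V / (IDSAT (1 + lam V)) for V between VDD/2 and VDD (midpoint rule)."""
    lo, hi = vdd / 2, vdd
    dv = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        v = lo + (i + 0.5) * dv
        total += v / (idsat * (1 + lam * v)) * dv
    return total / (hi - lo)


def req_approx(vdd, idsat, lam):
    """Closed-form approximation: (3/4)(VDD/IDSAT)(1 - (7/9) lam VDD)."""
    return 0.75 * vdd / idsat * (1 - 7.0 / 9.0 * lam * vdd)


if __name__ == "__main__":
    vdd, idsat, lam = 2.5, 190e-6, 0.06  # assumed example values
    print(f"numeric: {req_numeric(vdd, idsat, lam) / 1e3:.2f} kOhm")
    print(f"approx:  {req_approx(vdd, idsat, lam) / 1e3:.2f} kOhm")
```

For small λ·VDD the two values agree to within a few percent, which is consistent with the remark that channel-length modulation contributes only a minor correction.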
...


...
[Plot residue: horizontal axis VDD (V)]

Figure 3
...
25 µm CMOS process as a
function of VDD
(VGS = VDD, VDS = VDD →VDD/2)
...
3 enumerates the equivalent resistances obtained by simulation of our generic 0
...
These values will come in handy when analyzing the performance of CMOS
gates in later chapters
...
3 Equivalent resistance Req (W/L= 1) of NMOS and PMOS transistors in 0
...
For larger devices, divide Req by W/L
...
5

2

2
...
3
...
A profound understanding of the nature and the behavior of these intrinsic capacitances is
essential for the designer of high-quality digital integrated circuits
...
Aside from the MOS structure capacitances, all capacitors are nonlinear and vary with the applied voltage, which makes their
analysis hard
...



MOS Structure Capacitances
The gate of the MOS transistor is isolated from the conducting channel by the gate oxide
that has a capacitance per unit area equal to Cox = εox / tox
...
The total value of this capacitance is called the gate capacitance Cg and can be
decomposed into two elements, each with a different behavior
...
Another part is
solely due to the topological structure of the transistor
...

Consider the transistor structure of Figure 3
...
Ideally, the source and drain diffusion should end right at the edge of the gate oxide
...
Hence, the
effective channel of the transistor L becomes shorter than the drawn length Ld (or the
length the transistor was originally designed for) by an amount ΔL = 2xd
...
This capacitance is strictly linear and has a fixed value
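[Figure residue: (a) top view showing the polysilicon gate, the n+ source and drain regions, the width W, the drawn length Ld, the lateral diffusion xd, and the gate-bulk overlap; (b) cross section showing the gate oxide of thickness tox and the effective length L]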
Figure 3
...


$$C_{GSO} = C_{GDO} = C_{ox} x_d W = C_O W$$

(3
...

Channel Capacitance
Perhaps the most significant MOS parasitic circuit element, the gate-to-channel capacitance CGC varies in both magnitude and in its division into three components CGCS, CGCD,
and CGCB (being the gate-to-source, gate-to-drain, and gate-to-body capacitances, respectively), depending upon the operation region and terminal voltages
...
29
...

cut-off (a), no channel exists, and the total capacitance CGC appears between gate and
body
...
Consequently, CGCB = 0 as the body electrode is shielded from
the gate by the channel
...
Finally, in the saturation mode (c), the channel is pinched off
...
All the capacitance hence is between gate and source
...
29 The gate-to-channel capacitance and how the operation region influences its distribution over the three other
device terminals
...
The first plot (Figure 3
...
For
VGS = 0, the transistor is off, no channel is present and the total capacitance, equal to
WLCox, appears between gate and body
...
This seemingly causes the thickness of the gate dielectric to increase,
which means a reduction in capacitance
...
With VDS = 0, the device operates in the resistive mode and
the capacitance divides equally between source and drain, or CGCS = CGCD = WLCox/2
...
A
designer looking for a well-behaved linear capacitance should avoid operation in this
region
...
30

[Figure residue: (a) CGC as a function of VGS (with VDS = 0); (b) CGC as a function of the degree of saturation, VDS/(VGS-VT)]

Distribution of the gate-channel capacitance as a function of VGS and VDS (from [Dally98])
...
As illustrated in Figure 3
...
...
This also means that the total gate capacitance is getting smaller with an increased
level of saturation
...
To make a first-order analysis possible, we
will use a simplified model with a constant capacitance value in each region of operation
in the remainder of the text
...
4
...
4

Average distribution of channel capacitance of MOS transistor for different operation regions
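The simplified constant-capacitance model can be written as a small lookup. The sketch below follows the distribution described in the text: everything to the body in cut-off, an equal split between source and drain in the resistive region, and everything to the source in saturation. The two-thirds factor used for saturation is the commonly quoted average and is an assumption here, as are the function name and the example dimensions.

```python
def channel_capacitance(region, cox, w, l):
    """Split the gate-to-channel capacitance over body, source, and drain.

    cox is the oxide capacitance per unit area; w and l are the channel width
    and effective length in matching units (e.g. fF/um^2 and um give fF).
    """
    total = cox * w * l
    if region == "cutoff":
        return {"CGCB": total, "CGCS": 0.0, "CGCD": 0.0}
    if region == "resistive":
        return {"CGCB": 0.0, "CGCS": total / 2, "CGCD": total / 2}
    if region == "saturation":
        # Assumed average value: two thirds of Cox*W*L, all of it to the source.
        return {"CGCB": 0.0, "CGCS": 2 * total / 3, "CGCD": 0.0}
    raise ValueError(f"unknown operation region: {region!r}")


if __name__ == "__main__":
    # Illustrative values: 6 fF/um^2, W = 0.36 um, L = 0.24 um.
    for region in ("cutoff", "resistive", "saturation"):
        print(region, channel_capacitance(region, cox=6.0, w=0.36, l=0.24))
```

Note that the sum of the three components drops from Cox·W·L to two thirds of that value in saturation, in line with the observation that the total gate capacitance shrinks with an increased level of saturation.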
...
9 Using a circuit simulator to extract capacitance
Determining the value of the parasitic capacitances of an MOS transistor for a given operation
mode is a labor-intensive task, and requires the knowledge of a number of technology parameters that are often not explicitly available
...
Assume we would
like to know the value of the total gate capacitance of a transistor in a given technology as a
function of VGS (for VDS = 0)
...
31a will give us exactly
this information
...


$$C_G(V_{GS}) = I \Big/ \left(\frac{dV_{GS}}{dt}\right)$$
A transient simulation gives us VGS as a function of time, which can be translated into
the capacitance with the aid of some simple mathematical manipulations
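The "simple mathematical manipulations" amount to a numerical derivative of the simulated waveform. The sketch below is a minimal post-processing example, assuming the transient data are available as plain lists of time and VGS samples and that the charging current I is constant; the function name and the synthetic test waveform are hypothetical.

```python
def capacitance_from_transient(times, vgs, current):
    """Return (VGS, C) pairs using C = I / (dVGS/dt) on each sample interval."""
    result = []
    for k in range(len(times) - 1):
        dv = vgs[k + 1] - vgs[k]
        dt = times[k + 1] - times[k]
        if dv == 0:
            continue  # flat segment: the capacitance cannot be resolved here
        v_mid = 0.5 * (vgs[k] + vgs[k + 1])
        result.append((v_mid, current * dt / dv))
    return result


if __name__ == "__main__":
    # Synthetic check: an ideal 1 fF capacitor charged by a 1 uA current source.
    i_src, c_true = 1e-6, 1e-15
    times = [k * 1e-10 for k in range(11)]
    vgs = [i_src * t / c_true for t in times]
    for v, c in capacitance_from_transient(times, vgs, i_src):
        print(f"VGS = {v:.2f} V, C = {c * 1e15:.3f} fF")
```

Applied to a real simulation output, the same routine yields the voltage-dependent capacitance curve rather than a constant.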
...
31b, which plots the simulated gate capacitance of a minimum size 0
...
The graph clearly shows the drop of the capacitance when VGS approaches VT and the discontinuity at VT, predicted in Figure 3
...


Junction Capacitances
A final capacitive component is contributed by the reverse-biased source-body and drain-body pn-junctions
...
To understand the components of the junction
capacitance (often called the diffusion capacitance), we must look at the source (drain)

...
31 Simulating the gate capacitance of an MOS
transistor; (a) circuit configuration used for the analysis, (b)
resulting capacitance plot for minimum-size NMOS transistor
in 0
...


[Plot residue: panel (b), horizontal axis VGS (V)]

region and its surroundings
...
32, shows that the
junction consists of two components:
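[Figure residue: source region (doping ND) of width W, length LS, and junction depth xj, showing the bottom and side-wall junctions, the channel, the channel-stop implant (NA+), and the substrate (NA)]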

Figure 3
...


• The bottom-plate junction, which is formed by the source region (with doping ND)
and the substrate with doping NA
...
(3
...
As the bottom-plate junction is typically of the abrupt type, the
grading coefficient m approaches 0
...

• The side-wall junction, formed by the source region with doping ND and the p+ channel-stop
implant with doping level NA+
...
The
side-wall junction is typically graded, and its grading coefficient varies from 0
...
5
...
Notice that no side-wall
capacitance is counted for the fourth side of the source region, as this represents the
conductive channel
...
An expression for the total
junction capacitance can then be derived,


$$\begin{aligned} C_{diff} &= C_{bottom} + C_{sw} = C_j \times \text{AREA} + C_{jsw} \times \text{PERIMETER} \\ &= C_j L_S W + C_{jsw} (2 L_S + W) \end{aligned}$$
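The diffusion-capacitance expression is easy to evaluate directly. The sketch below computes the bottom-plate and side-wall contributions for a rectangular source (or drain) region with three side walls. All numbers are illustrative zero-bias assumptions: Cj = 2 × 10⁻³ F/m², Cjsw = 2.75 × 10⁻¹⁰ F/m, LS = 0.625 µm, and W = 0.36 µm.

```python
def diffusion_capacitance(cj, cjsw, l_s, w):
    """Return (bottom, sidewall, total) junction capacitance in farads.

    cj is in F/m^2, cjsw in F/m, l_s and w in metres. The side facing the
    channel is excluded, so the perimeter is 2*l_s + w.
    """
    bottom = cj * l_s * w
    sidewall = cjsw * (2 * l_s + w)
    return bottom, sidewall, bottom + sidewall


if __name__ == "__main__":
    bottom, sidewall, total = diffusion_capacitance(
        cj=2e-3, cjsw=2.75e-10, l_s=0.625e-6, w=0.36e-6
    )
    print(f"bottom   = {bottom * 1e15:.2f} fF")
    print(f"sidewall = {sidewall * 1e15:.2f} fF")
    print(f"total    = {total * 1e15:.2f} fF")
```

With these assumed values the bottom-plate and side-wall terms are of comparable size, which illustrates why the perimeter cannot be neglected for small diffusion regions.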

(3
...
(3
...

Problem 3
...
31)
...
33
...

[Figure residue: MOSFET capacitance model showing CGS, CGD, CGB, CSB, and CDB between the terminals G, D, S, and B]
Figure 3
...

CGS = CGCS + CGSO; CGD = CGCD + CGDO; CGB = CGCB
CSB = CSdiff; CDB = CDdiff

(3
...

Example 3
...
24 µm, W =
0
...
625 µm, CO = 3 × 10⁻¹⁰ F/m, Cj0 = 2 × 10⁻³ F/m², Cjsw0 = 2
...
Determine the zero-bias value of all relevant capacitances
...
7 fF/µm 2
...
49 fF
...
105 fF
...
7 fF
...
Due to the doping conditions and the small area, this component can virtually always be ignored in
a first-order analysis
...

...
The
former is equal to Cj0 LDW = 0
...
44 fF
...
89 fF
...
This is a worst-case
condition, however
...
Also, clever design can help to reduce the value of LD (LS)
...


Design Data — MOS Transistor Capacitances
Table 3
...
25 µm CMOS process
...
5 Capacitance parameters of NMOS and PMOS transistors in 0
...

Cox (fF/µm²) | CO (fF/µm) | Cj (fF/µm²) | mj | φb (V) | Cjsw (fF/µm) | mjsw | φbsw (V)

NMOS

6

0
...
5

0
...
28

0
...
9

PMOS

6

0
...
9

0
...
9

0
...
32

0
...
34a
...
The resistance of the drain (source) region can be expressed as
$$R_{S,D} = \frac{L_{S,D}}{W} R_{\square} + R_C$$

(3
...
34b)
...
Observe that the resistance of a square
of material is constant, independent of its size (see also Chapter 4)
...
Keeping its value as small as possible is thus
an important design goal for both the device and the circuit engineer
...
This process is called silicidation and effectively reduces the sheet resistance to values in the range from 1 to 4
Ω/□
...
(3
...

With a process that includes silicidation and proper attention to layout, parasitic resistance
is not important
...

3
...
4

The Actual MOS Transistor— Some Secondary Effects

The operation of a contemporary transistor may show some important deviations from the
model we have presented so far
...
At that point, the
assumption that the operation of a transistor is adequately described by a one-dimensional
model, where it is assumed that all current flows on the surface of the silicon and the electrical fields are oriented along that plane, is no longer valid
...
An example of such was already given in Section
3
...
2 when we discussed the mobility degradation
...
One word of warning, though
...
It
is therefore advisable to analyze and design MOS circuits first using the ideal model
...

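[Figure residue: (a) modeling the series resistance, with RS and RD in series with the source and drain and the effective gate-source voltage VGS,eff; (b) parameters of the series resistance, showing the polysilicon gate, the drain contact, W, and LD]
Figure 3
...
Series drain and source resistance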
...

[Figure residue: (a) threshold VT as a function of the length L (for low VDS), with the long-channel threshold marked]
Figure 3
...


Threshold Variations
Eq
...
19) states that the threshold voltage is only a function of the manufacturing technology and the applied body bias VSB
...
As the device dimensions are reduced, this
model becomes inaccurate, and the threshold potential becomes a function of L, W, and
VDS
...

In the traditional derivation of the VTO, for instance, it is assumed that the channel
depletion region is solely due to the applied gate voltage and that all depletion charge
beneath the gate originates from the MOS field effects
...
Since a part of the region below the gate is already
depleted (by the source and drain fields), a smaller threshold voltage suffices to cause
strong inversion
...
35a)
...
Consequently, the threshold
decreases with increasing VDS
...
35b)
...
The sharp increase in
current that results from this effect, which is called punch-through, may cause permanent
damage to the device and should be avoided
...

Since the majority of the transistors in a digital circuit are designed at the minimum
channel length, the variation of the threshold voltage as a function of the length is almost
uniform over the complete design, and is therefore not much of an issue except for the
increased sub-threshold leakage currents
...
This is, for instance, a problem in dynamic memories,
where the leakage current of a cell (being the subthreshold current of the access transistor)
becomes a function of the voltage on the data-line, which depends upon the applied data
patterns
...



Worth mentioning is that the threshold of the MOS transistor is also subject to
narrow-channel effects
...
The gate
voltage must support this extra depletion charge to establish a conducting channel
...
For small geometry transistors,
with small values of L and W, the effects of short- and narrow channels may tend to cancel
each other out
...
This is the result of the hot-carrier effect [Hu92]
...
The resulting increase in the electrical field
strength causes an increasing velocity of the electrons, which can leave the silicon and
tunnel into the gate oxide upon reaching a high-enough energy level
...
For an electron to become hot, an
electrical field of at least 10⁴ V/cm is necessary
...
The hot-electron phenomenon can lead to a
long-term reliability problem, where a circuit might degrade or fail after being in use for a
while
...
36, which shows the degradation in the I-V characteristics
of an NMOS transistor after it has been subjected to extensive operation
...
The reduced supply voltage that is typical for deep sub-micron technologies can in part be attributed to the necessity to keep hot-carrier effects under control
...
36 Hot-carrier effects cause the I-V characteristics of an NMOS transistor to degrade from
extensive usage (from [McGaughy98])
...

CMOS Latchup
The MOS technology contains a number of intrinsic bipolar transistors
...
Triggering these thyristor-like
devices leads to a shorting of the VDD and VSS lines, usually resulting in a destruction of
the chip, or at best a system failure that can only be resolved by power-down
...
37a
...
A circuit
equivalent is shown in Figure 3
...
When one of the two bipolar transistors gets forward
biased (e
...
, due to current flowing through the well, or substrate), it feeds the base of the
other transistor
...

[Figure residue: cross section with p+ and n+ regions, the n-well, the n-source, and the p-substrate, with the resistances Rnwell and Rpsubs and the VDD connection]

(a) Origin of latchup
Figure 3
...


From the above analysis the message to the designer is clear— to avoid latchup, the
resistances Rnwell and Rpsubs should be minimized
...
Devices carrying a lot of current (such as transistors in the I/O
drivers) should be surrounded by guard rings
...
For an extensive discussion on how to avoid latchup, please refer to
[Weste93]
...
In recent
years, process innovations and improved design techniques have all but eliminated the
risks for latchup
...
3
...
In general, more accuracy also means more complexity and,
hence, an increased run time
...
...

SPICE Models
SPICE has three built-in MOSFET models, selected by the LEVEL parameter in the
model card
...
They should only be used for first-order analysis, and we
therefore limit ourselves to a short discussion of their main properties
...
It
does not handle short-channel effects
...
It handles effects such as velocity saturation, mobility degradation, and drain-induced barrier lowering
...

• LEVEL 3 is a semi-empirical model
...
It
works quite well for channel lengths down to 1µm
...
A
complete description of all those would take the remainder of this book, which is, obviously, not the goal
...
g
...

The BSIM3v3 SPICE Model
The confusing situation of having to use a different model for each manufacturer has fortunately been partially resolved by the adoption of the BSIM3v3 model as an industry-wide standard for the modeling of deep-submicron MOSFET transistors
...
Its popularity and accuracy make it the natural choice for all the simulations presented in this book
...
Fortunately, understanding the intricacies of all these parameters is not a requirement for the
digital designer
...
6)
...
Providing a single set of parameters
that is acceptable over all possible device dimensions is deemed to be next to impossible
...

LMIN, LMAX, WMIN, and WMAX (called a bin)
...

Table 3
...

Parameter Category | Description | Parameter Names
Control | Selection of level and models for mobility, capacitance, and noise | LEVEL, MOBMOD, CAPMOD
DC | Parameters for threshold and current calculations | VTH0, K1, U0, VSAT, RSH
AC & Capacitance | Parameters for capacitance computations | CGS(D)O, CJ, MJ, CJSW, MJSW
dW and dL | Derivation of effective channel length and width |
Process | Process parameters such as oxide thickness and doping concentrations | TOX, XJ, GAMMA1, NCH, NSUB
Temperature | Nominal temperature and temperature coefficients for various device parameters | TNOM
Bin | Bounds on device dimensions for which model is valid | LMIN, LMAX, WMIN, WMAX
Flicker Noise | Noise model parameters |

We refer the interested reader to the BSIM3v3 documentation provided on the website of the textbook (REFERENCE) for a complete description of the model parameters
and equations
...
25 µm CMOS process can be
found at the same location
...
7
...
SPICE assumes
default values (which are often zero!) for the missing factors
...
For instance, you must
accurately specify the area and the perimeter of the source and drain regions of the devices
when performing a performance analysis
...
Similarly, it is often necessary to painstakingly define the value of the drain and
source resistance
...



Table 3
...

Parameter Name | Symbol | SPICE Name | Units | Default Value
Drawn Length | L | L | m |
Effective Width | W | W | m |
Source Area | AREA | AS | m² | 0
Drain Area | AREA | AD | m² | 0
Source Perimeter | PERIM | PS | m | 0
Drain Perimeter | PERIM | PD | m | 0
Squares of Source Diffusion | | NRS | | 1
Squares of Drain Diffusion | | NRD | | 1

Example 3
...
Transistor M1 is an NMOS device of model-type (and bin)
nmos
...
Its gate length is the minimum allowed in this technology (0
...
The ‘+’
character at the start of line 2 indicates that this line is a continuation of the previous one
...
1, connected between nodes nvout, nvin, nvdd, and
nvdd (D, G, S, and B, respectively), is three times wider, which reduces the series resistance,
but increases the parasitic diffusion capacitances as the area and perimeter of the drain and
source regions go up
...
lib line refers to the file that contains the transistor models
...
1 W=0
...
25U
+AD=0
...
625U AS=0
...
625U NRS=1 NRD=1
M2 nvout nvin nvdd nvdd pmos
...
125U L=0
...
7P PD=2
...
7P PS=2
...
33 NRD=0
...
lib 'c:\Design\Models\cmos025
...
4 A Word on Process Variations
The preceding discussions have assumed that a device is adequately modeled by a single
set of parameters
...
This observed random distribution between supposedly identical devices is primarily the result of two factors:
1
...
These result in diverging values for
sheet resistances, and transistor parameters such as the threshold voltage
...

2
...
This causes deviations in the (W/L) ratios of
MOS transistors and the widths of interconnect wires
...
For instance,
variations in the length of an MOS transistor are unrelated to variations in the threshold
voltage as both are set by different process steps
...

• The threshold voltage VT can vary for numerous reasons: changes in oxide thickness, substrate, poly and implant impurity levels, and the surface charge
...
Where in the
past thresholds could vary by as much as 50%, state-of-the-art digital processes
manage to control the thresholds to within 25-50 mV
...
Variations can also occur in the mobility but to a lesser degree
...
These are mainly caused by the lithographic process
...

The measurable impact of the process variations may be a substantial deviation of
the circuit behavior from the nominal or expected response, and this could be in either
positive or negative directions
...
Assume, for instance, that you are supposed to design a microprocessor running
at a clock frequency of 500 MHz
...
One way to achieve that goal is to
design the circuit assuming worst-case values for all possible device parameters
...

To help the designer make a decision on how much margin to provide, the device
manufacturer commonly provides fast and slow device models in addition to the nominal
ones
...

Example 3
...
25µm CMOS process
...

Assume initially that VGS = VDS = 2
...
From earlier simulations, we know that this
produces a drain current of 220 µA
...
Simulations produce the following data:
Fast: Id = 265 µA: +20%
Slow: Id = 182 µA: -17%
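The percentages quoted for the fast and slow corners follow directly from the nominal current of 220 µA. The short sketch below reproduces that arithmetic; the function name is an assumption made for illustration.

```python
def relative_deviation(value, nominal):
    """Deviation of value from nominal, in percent."""
    return 100.0 * (value - nominal) / nominal


NOMINAL_UA = 220.0                           # nominal drain current in uA
CORNERS_UA = {"fast": 265.0, "slow": 182.0}  # simulated corner currents in uA

if __name__ == "__main__":
    for name, current in CORNERS_UA.items():
        print(f"{name}: Id = {current:.0f} uA: {relative_deviation(current, NOMINAL_UA):+.0f}%")
```

The same helper can be applied to any other corner, for example a combined process and supply-voltage corner, to judge how much design margin is being consumed.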
Let us now proceed one step further
...
For instance, the voltage delivered by a battery can drop off substantially


towards the end of its lifetime
...

Fast + Vdd = 2
...
25 V: Id = 155 µA: -30%
The current levels and the associated circuit performance can thus vary by almost 100%
between the extreme cases
...
This translates into a
severe area penalty
...

The probability that all parameters assume their worst-case values simultaneously is very
low, and most designs will display a performance centered around the nominal design
...
g
...

Specialized design tools to help meet this goal are available
...
The result is a distribution plot of design parameters
(such as the speed or the sensitivity to noise) that can help to determine if the nominal
design is economically viable
...
38
...

[Plot residue: two panels with Delay (ns) on the vertical axis; one horizontal axis is Leff]
...
38 Distribution plots of speed of adder circuit as a function of varying device parameters, as obtained
by a Monte Carlo analysis
...


One important conclusion from the above discussion is that SPICE simulations
should be treated with care
...
Actual implementations are bound
to differ from the simulation results, and for reasons other than imperfections in the mod-

chapter3
...
Be furthermore aware that temperature variations on the die can present
another source for parameter deviations
...


3
...
As already argued in the introduction, applications that were considered implausible yesterday are already forgotten today
...
To illustrate this point, we have plotted in
Figure 3
...
We observe a reduction rate of approximately 13% per
year, halving every 5 years
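The link between a 13% yearly reduction and a five-year halving period can be verified with a one-line calculation: a quantity that shrinks by a fraction r each year halves after ln(0.5) / ln(1 − r) years. The sketch below evaluates this; the function name is an assumption.

```python
import math


def halving_time(annual_reduction):
    """Years needed for a quantity to halve when it shrinks by annual_reduction per year."""
    return math.log(0.5) / math.log(1.0 - annual_reduction)


if __name__ == "__main__":
    years = halving_time(0.13)
    print(f"halving time at 13% per year: {years:.2f} years")
    print(f"remaining fraction after 5 years: {0.87 ** 5:.3f}")
```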
...

[Plot residue: Minimum Feature Size (micron) on a logarithmic axis from 10² down to 10⁻² versus Year, from 1960 to 2010]
Figure 3
...
Dots represent
observed or projected (2000 and beyond) values
...

A pertinent question is how this continued reduction in feature size influences the
operating characteristics and properties of the MOS transistor, and indirectly the critical
digital design metrics such as switching frequency and power dissipation
...
In
addition to the minimum device dimension, we have to consider the supply voltage as a
second independent variable in such a study
...

Three different models are studied in Table 3
...
To make the results tractable, it is
assumed that all device dimensions scale by the same factor S (with S > 1 for a reduction
in size)
...
Similarly, we assume that all voltages, including the supply voltage and
the threshold voltages, scale by the same ratio U
...
Observe that this analysis only
considers short-channel devices with a linear dependence between control voltage and saturation current (as expressed by Eq
...
39))
...

Table 3
...


Parameter | Relation | Full Scaling | General Scaling | Fixed-Voltage Scaling
W, L, tox | | 1/S | 1/S | 1/S
VDD, VT | | 1/S | 1/U | 1
NSUB | V/Wdepl² | S | S²/U | S²
Area/Device | WL | 1/S² | 1/S² | 1/S²
Cox | 1/tox | S | S | S
Cgate | CoxWL | 1/S | 1/S | 1/S
kn, kp | CoxW/L | S | S | S
Isat | CoxWV | 1/S | 1/U | 1
Current Density | Isat/Area | S | S²/U | S²
Ron | V/Isat | 1 | 1 | 1
Intrinsic Delay | RonCgate | 1/S | 1/S | 1/S
P | IsatV | 1/S² | 1/U² | 1
Power Density | P/Area | 1 | S²/U² | S²

Full Scaling (Constant Electrical Field Scaling)
In this ideal model, voltages and dimensions are scaled by the same factor S. The goal is to
...
Keeping the electrical fields constant ensures the physical integrity of the device
and avoids breakdown or other secondary effects
...

The effects of full scaling on the device and circuit parameters are summarized in the third
column of Table 3
...
We use the intrinsic time constant, which is the product of the gate
capacitance and the on-resistance, as a measure for the performance
...
The performance improvement is solely due to the reduced capacitance
...
It is assumed that the carrier mobilities are not affected by the scaling
...
The substrate doping Nsub is scaled so that the maximum depletion-layer width is reduced by a
factor S
...
It is furthermore assumed that the delay of the device is mainly determined by the intrinsic
capacitance (the gate capacitance) and that other device capacitances, such as the diffusion
capacitances, scale appropriately
...


...
First of all, to keep new devices compatible
with existing components, voltages cannot be scaled arbitrarily
...
As a result, voltages
have not been scaled down along with feature sizes, and designers adhere to well-defined
standards for supply voltages and signal levels
...
40, 5 V was
the de facto standard for all digital components up to the early 1990s, and a fixed-voltage
scaling model was followed
...
5µm CMOS technology did new standards such
as 3
...
5 V make an inroad
...
The reason for this change in operation model can partially be
...
40 Evolution of min and max supply voltage in digital integrated circuits as a function
of feature size
...
15 micron and
below are projected
...
8
...
The gain of an increased current is simply offset by the higher voltage level, and only hurts the power dissipation
...
(3
...

Keeping the voltage constant under these circumstances gives a distinct performance
advantage, as it causes a net reduction in on-resistance
...

Problem 3
...



Reconstruct Table 3
...

(3
...


WARNING: The picture painted in the previous section represents a first-order model
...
This is apparent in Figure 3
...
3, which show a reduction
of the equivalent on-resistance with increasing supply voltage — even for the high voltage
range
...

The reader should keep this warning in the back of his mind throughout this scaling
study
...
This implies ignoring second-order effects
such as mobility-degradation, series resistance, etc
...
40 that the supply voltages, while moving downwards, are not
scaling as fast as the technology
...
5 to
µm
0
...
5 V
...

• The scaling potential of the transistor threshold voltage is limited
...
This is aggravated by the large process variation of the value of the threshold, even on the same
wafer
...
This general scaling model is shown in the fourth column of
Table 3
...
Here, device dimensions are scaled by a factor S, while voltages are reduced by
a factor U
...
Note that the general-scaling model offers a performance scenario
identical to the full- and the fixed scaling, while its power dissipation lies between the two
models (for S > U > 1)
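The three scaling scenarios differ only in the voltage factor U, so they can be compared with a single routine. The sketch below applies the first-order relations for short-channel devices (Isat proportional to Cox·W·V, Ron to V/Isat, delay to Ron·Cgate, power to Isat·V). The function name and the example values S = 2 and U = 1.5 are assumptions for illustration.

```python
def scaling_factors(s, u):
    """First-order scaling factors when dimensions shrink by 1/s and voltages by 1/u."""
    cox = s                       # 1/tox
    cgate = cox / s ** 2          # Cox * W * L
    isat = cox * (1 / s) / u      # Cox * W * V
    ron = (1 / u) / isat          # V / Isat
    power = isat / u              # Isat * V
    area = 1 / s ** 2             # W * L
    return {
        "Isat": isat,
        "Ron": ron,
        "delay": ron * cgate,
        "power": power,
        "power_density": power / area,
    }


if __name__ == "__main__":
    s = 2.0
    for name, u in (("full", s), ("general", 1.5), ("fixed-voltage", 1.0)):
        f = scaling_factors(s, u)
        print(f"{name:14s} delay x{f['delay']:.2f}  power x{f['power']:.2f}  "
              f"power density x{f['power_density']:.2f}")
```

Under these first-order assumptions the delay improves by 1/S in all three cases, while the power of the general-scaling case falls between the full-scaling and fixed-voltage results whenever S > U > 1.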
...
9 the characteristics of some of the most recent CMOS processes and projections on some future ones
...
As predicted by the
scaling model, the maximum drive current remains approximately constant
...
Table 3
...


Year of Introduction

1997

1999

2001

2003

2006

2009

Channel length (µm)

0
...
18

0
...
13

0
...
07

Gate oxide (nm)

4-5

3-4

2-3

2-3

1
...
5

VDD (V)

1
...
5

1
...
8

1
...
5

1
...
5

0
...
2

0
...
9

VT (V)

0
...
4

0
...
4

0
...
4

NMOS/PMOS IDsat (nA/µm)

600/280

600/280

600/280

600/280

600/280

600/280

From the above, it is reasonable to conclude that both integration density and performance will continue to increase
...
These transistors,
while working along similar concepts as the current MOS devices, look very different
from the structures we are familiar with, and require some substantial
device engineering
...
41 shows a potential transistor structure, the folded channel dual-gated transistor, which has proven to be operational up to very small channel lengths
...
41

Folded dual-gated transistor with 25 nm channel length [Hu99]
...
Whether this will actually happen is an
open question
...
A first doubt is whether such a part can be
manufactured in an economical way
...
Design considerations also play a role
...
The
growing role of interconnect parasitics might put an upper bound on performance
...
All in
all, it is obvious that the design of semiconductor circuits still faces an exciting future
...

Section 3
...
6 Summary
In this chapter, we have presented a comprehensive overview of the operation of the
MOSFET transistor, the semiconductor device at the core of virtually all contemporary
digital integrated circuits
...
These models
will be used extensively in later chapters, where we look at the fundamental building
blocks of digital circuits
...

• The static behavior of the junction diode is well described by the ideal diode equation, which states that the current is an exponential function of the applied voltage bias
...
This is particularly important as
the omnipresent source-body and drain-body junctions of the MOS transistors all
operate in this mode
...

• The MOS(FET) transistor is a voltage-controlled device, where the controlling gate
terminal is insulated from the conducting channel by a SiO2 capacitor
...
One of the most
enticing properties of the MOS transistor, which makes it particularly amenable to
digital design, is that it approximates a voltage-controlled switch: when the control
voltage is low, the switch is nonconducting (open); for a high control voltage, a conducting channel is formed, and the switch can be considered closed
...

• The continuing reduction of the device dimensions to the submicron range has introduced some substantial deviations from the traditional long-channel MOS transistor
model
...
Models for this effect as well as other second-order parasitics
have been introduced
...

• The dynamic operation of the MOS transistor is dominated by the
device capacitors
...
The minimization of these
capacitances is the prime requirement in high-performance MOS design
...
It was
observed that these models represent an average behavior and can vary over a single
wafer or die
...

• The MOS transistor is expected to dominate the digital integrated circuit scene for
the next decade
...
07
micron by the year 2010, and logic circuits integrating more than 1 billion transistors on a die
...
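The first summary point, the exponential dependence of diode current on bias, can be made concrete with a short sketch. It assumes the standard form of the ideal diode equation, I_D = I_S (exp(V_D / φ_T) − 1). The saturation current and thermal voltage defaults are illustrative assumptions, not parameters taken from this chapter.

```python
import math


def diode_current(v_d, i_s=1e-14, phi_t=0.026):
    """Ideal diode equation: I_D = I_S * (exp(V_D / phi_T) - 1).

    v_d   : applied bias in volts (positive = forward bias)
    i_s   : saturation current in amperes (assumed value)
    phi_t : thermal voltage in volts, about 26 mV near room
            temperature (assumed value)
    """
    return i_s * (math.exp(v_d / phi_t) - 1.0)


if __name__ == "__main__":
    for v in (-1.0, 0.0, 0.6, 0.66, 0.7):
        print(f"V_D = {v:+.2f} V  ->  I_D = {diode_current(v):.3e} A")
```

With these assumed values, each additional 60 mV of forward bias raises the current by roughly a factor of ten, while under reverse bias the current settles at about −I_S.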
7 To Probe Further
Semiconductor devices have been discussed in numerous books, reprint volumes, tutorials, and journal articles
...
The
books and journals referenced below contain excellent discussions of the semiconductor
devices of interest or refer to specific topics brought up in the course of this chapter
...
Antognetti and G
...
), Semiconductor Device Modeling with SPICE,
McGraw-Hill, 1988
...
Bhanzhaf, Computer Aided Analysis Using PSPICE, 2nd ed
...

[Chen90] J
...

[Gray69] P
...
Searle, Electronic Principles, John Wiley and Sons, 1969
...
Gray and R
...
, John
Wiley and Sons, 1993
...
Haznedar, Digital Microelectronics, Benjamin/Cummings, 1991
...
Hodges and H
...
,
McGraw-Hill, 1988
...
Howe and S
...

[Hu92] C
...
27, no
...

241–246, March 1992
...
Hu, “Future CMOS Scaling and Reliability,” IEEE Proceedings, vol
...
5, May
1993
...
Jensen et al
...
1–61, August 1991
...
Ko, “Approaches to Scaling,” in VLSI Electronics: Microstructure Science, vol
...
1–37, Academic Press, 1989
...
Muller and T
...
, John
Wiley and Sons, 1986
...
Nagel, “SPICE2: a Computer Program to Simulate Semiconductor Circuits,” Memo
ERL-M520, Dept
...
and Computer Science, University of California at Berkeley, 1975
...
Sedra and K
...
, Holt, Rinehart and Winston,
1987
...
Sheu, D
...
Ko, and M
...
SC-22, no
...

558–565, August 1987
...

Section 3
...
Sze, Physics of Semiconductor Devices, 2nd ed
...

[Thorpe92] T
...

[Toh88] K
...
Koh, and R
...
23
...
4, pp 950–957, August 1988
...
Tsividis, Operation and Modeling of the MOS Transistor, McGraw-Hill, 1987
...
Yamaguchi et al
...
Electron
...
35, no 8, pp
...

[Weste93] N
...
Eshragian, Principles of CMOS VLSI Design: A Systems Perspective,
Addison-Wesley, 1993
...
8 Exercises and Design Problems
For all problems, use the device parameters provided in Chapter 3 (XXX) and the inside back book
cover, unless otherwise mentioned
...

1
...
23]
a
...
42
...
7 V, solve for
Don
ID
...
Find ID and VD using the ideal diode equation
...

c
...

d
...

[Figure: circuit with a 5 V source, resistors R1 = 2 kΩ and R2 = 2 kΩ, and a diode with current ID and voltage VD]

Figure 3
...


3
...


[M, None, 2
...
3] For the circuit in Figure 3
...
3 V
...
65 V,
and m = 0
...
NA = 2
...

a
...

b
...
Find the depletion region width, Wj, of the diode
...
Use the parallel-plate model to find the junction capacitance, Cj
...
Set Vs = 1
...
Again using the parallel-plate model, explain qualitatively why Cj
increases
...
3
...
44 shows NMOS and PMOS devices with drain, source, and gate
ports annotated
...
Verify with SPICE
...
7 V, λ = 0
...
8 V, λ = 0
...
Assume (W/L) = 1
...
NMOS: VGS = 3
...
3 V
...
5 V, VDS = –1
...


2

CONTENTS
1
...
6 Exercises

Chapter 2: The Manufacturing Process (30 pages)
2
...
2
2
...
4
2
...
6
2
...
8

Introduction
The CMOS Manufacturing Process

Design Rules — The Contract between Designer and Process Engineer
Packaging Integrated Circuits
Perspective — Trends in Process Technology
Summary
To Probe Further
Exercises and Design Problems

Design Methodology Insert A: Design Layout and Design Rule Verification (6 pages)
Chapter 3: The Devices (52 pages)
3
...
2 The Diode
3
...
1
A First Glance at the Diode — The Depletion Region
3
...
2
Static Behavior
3
...
3
Dynamic, or Transient, Behavior
3
...
4
The Actual Diode—Secondary Effects
3
...
5
The SPICE Diode Model
3
...
3
...
3
...
3
...
3
...
3
...
4 A Word on Process Variations
3
...
6 Summary
3
...
8 Exercises and Design Problems

Design Methodology Insert B: Device Models and Circuit Simulation (6 pages)
Chapter 4: The Wire (40 pages)
4
...
2 A First Glance
4
...
3
...
3
...
3
...
4 Electrical Wire Models


4
...
6
4
...
8
4
...
4
...
4
...
4
...
4
...
4
...
5
...
5
...
1 Introduction
5
...
3 Evaluating the Robustness of the CMOS Inverter: The Static Behavior
5
...
1
Switching Threshold
5
...
2
Noise Margins
5
...
3
Robustness Revisited
5
...
4
...
4
...
4
...
5 Power, Energy, and Energy-Delay
5
...
1
Dynamic Power Consumption
5
...
2
Static Consumption
5
...
3
Putting It All Together
5
...
4
Analyzing Power Consumption Using SPICE
5
...
7 Summary
5
...
9 Exercises and Design Problems

Chapter 6: Designing Combinational Logic Gates in CMOS (64 pages)
6
...
2 Static CMOS Design
6
...
1
Complementary CMOS
6
...
2
Ratioed Logic
6
...
3
Pass-Transistor Logic
6
...
3
...
3
...
4
6
...
6
6
...
3
...
3
...
1
7
...
3
7
...
5

7
...
7
7
...
9

7
...
11
7
...
13

Introduction
Timing Metrics for Sequential Circuits
Classification of Memory Elements
Static Latches and Registers
7
...
1
The Bistability Principle
7
...
2
SR Flip-Flops
7
...
3
Multiplexer-Based Latches
7
...
4
Master-Slave Based Edge Triggered Register
7
...
5
Non-ideal clock signals
7
...
6
Low-Voltage Static Latches
Dynamic Latches and Registers
7
...
1
Dynamic Transmission-Gate Based Edge-triggered Registers
7
...
2
C2MOS Dynamic Register: A Clock Skew Insensitive Approach
7
...
3
True Single-Phase Clocked Register (TSPCR)
Pulse Registers
Sense-Amplifier Based Registers (Consolidate with 7
...
8
...
Register-Based Pipelines
7
...
2
NORA-CMOS—A Logic Style for Pipelined Structures
Non-Bistable Sequential Circuits
7
...
1
The Schmitt Trigger
7
...
2
Monostable Sequential Circuits
7
...
3
Oscillators
Perspective: Choosing a Clocking Strategy
Summary
To Probe Further
Exercises and Design Problems

Design Methodology Insert D: Timing Analysis and Verification (8-10 pages)

Chapter 8: Dealing with Interconnect (45 pages)
8
...
2 Capacitive Parasitics
8
...
1
Capacitance and Reliability—Cross Talk
8
...
2
Capacitance and Performance in CMOS

8
...
3
...
3
...
3
...
4 Inductive Parasitics
8
...
1
Inductance and Reliability— Voltage Drop
8
...
2
Inductance and Performance—Transmission Line Effects
8
...
6 Chapter Summary
8
...
8 Exercises and Design Problems

Design Methodology Insert E: Interconnect modeling and analysis (6 pages)

PART III: A SYSTEM PERSPECTIVE
Chapter 9:
9
...
2
9
...
4
9
...
6
9
...
8

Designing Complex Digital Integrated Circuits (40 pages)
Introduction
The Standard-cell Design Approach
Array-based Design
Configurable and Reconfigurable Design
Perspective: Facing the Increasing Design Complexity
Summary
To Probe Further
Exercises and Design Problems

Chapter 10: Timing Issues in Digital Circuits (55 pages)
10
...
2 Synchronous systems
10
...
Impact of clock variation on performance
10
...
Clock Distribution Basics
10
...
Performance and Power Optimization in Synchronous Design
10
...
Asynchronous Design
10
...
The Asynchronous-synchronous Interface
10
...
Clock Signal Generation
10
...
10 Summary
10
...
12 Exercises and Design Problems
Chapter 11: Designing Arithmetic Building Blocks (50 pages)
9
...
2 Datapaths in Digital Processor Architectures
9
...
3
...
4

9
...
6
9
...
8
9
...
10
9
...
3
...
3
...
4
...
4
...
4
...
5
...
5
...
1 Introduction
10
...
2
...
2
...
3 The Memory Core
10
...
1 Read-Only Memories
10
...
2 Nonvolatile Read-Write Memories
10
...
3 Read-Write Memories (RAM)
10
...
4
...
4
...
4
...
4
...
5 Memory Reliability and Yield
10
...
1 Signal-To-Noise Ratio
10
...
2 Memory yield
10
...
6
...
6
...
7 Perspective: Semiconductor Memory Trends and Evolutions
10
...
9 To Probe Further
10
...
It addresses the following topics:
- pad design, ESD, guard rings, latchup

- off-chip signaling: termination, current versus voltage mode,
high-speed serial links,

Design Methodology Insert G: Validation and Test of Manufactured Circuits (8 pages)
G
...
2 Design for Testability
G
...
THE FOUNDATIONS
CHAPTER 1: INTRODUCTION
1
...
2

Issues in Digital Integrated Circuit Design

1
...
3
...
3
...
3
...
3
...
4

Summary

1
...
1

Introduction

2
...
2
...
2
...
2
...
2
...
3

Design Rules — The Contract between Designer and Process Engineer

2
...
4
...
4
...
4
...
5

Perspective — Trends in Process Technology
2
...
1
2
...
2

Short-Term Developments
In the Longer Term

2
...
7

To Probe Further

DESIGN METHODOLOGY INSERT A: IC LAYOUT
CHAPTER 3: THE DEVICES
3
...
2

The Diode
3
...
1
3
...
2
3
...
3
3
...
4
3
...
5

3
...
3
...
3
...
3
...
3
...
3
...
4

A Word on Process Variations

3
...
6

Summary

3
...
1

Introduction

4
...
3

Interconnect Parameters — Capacitance, Resistance, and Inductance

2

3

4
...
1
4
...
2
4
...
3

4
...
4
...
4
...
4
...
4
...
4
...
5

Table Of Contents

The Ideal Wire
The Lumped Model
The Lumped RC model
The Distributed rc Line
The Transmission Line

SPICE Wire Models
4
...
1
4
...
2

Distributed rc Lines in SPICE
Transmission Line Models in SPICE

4
...
7

Summary

4
...
A CIRCUIT PERSPECTIVE
Chapter 5: THE CMOS INVERTER
5
...
2

The Static CMOS Inverter — An Intuitive Perspective

5
...
3
...
3
...
3
...
4

Performance of CMOS Inverter: The Dynamic Behavior
5
...
1
5
...
2
5
...
3

5
...
5
...
5
...
5
...
5
...
6

Perspective: Technology Scaling and its Impact on the Inverter Metrics

5
...
8

4

To Probe Further

CHAPTER 6: DESIGNING COMBINATIONAL LOGIC GATES IN CMOS
6
...
2

Static CMOS Design
6
...
1
6
...
2
6
...
3

6
...
3
...
3
...
3
...
3
...
4

Complementary CMOS
Ratioed Logic
Pass-Transistor Logic

Dynamic Logic: Basic Principles
Speed and Power Dissipation of Dynamic Logic
Issues in Dynamic Design
Cascading Dynamic Gates

Perspectives
6
...
1
6
...
2

How to Choose a Logic Style?
Designing Logic for Reduced Supply Voltages

6
...
6

To Probe Further

DESIGN METHODOLOGY INSERT C: HOW TO SIMULATE COMPLEX
LOGIC GATES
C
...
2

Representing Data as a Discrete Entity

C
...
4

To Probe Further

DESIGN METHODOLOGY INSERT D: LAYOUT TECHNIQUES FOR
COMPLEX GATES


CHAPTER 7: DESIGNING SEQUENTIAL LOGIC CIRCUITS
7
...
1
...
1
...
2

Static Latches and Registers
7
...
1
7
...
2
7
...
3
7
...
4
7
...
5

7
...
5
...
5
...
6

Dynamic Transmission-Gate Edge-triggered Registers
C2MOS—A Clock-Skew Insensitive Approach
True Single-Phase Clocked Register (TSPCR)

Alternative Register Styles*
7
...
1
7
...
2

7
...
3
...
3
...
3
...
4

Timing Metrics for Sequential Circuits
Classification of Memory Elements

Latch- vs
...
6
...
6
...
6
...
7

Perspective: Choosing a Clocking Strategy

7
...
9

To Probe Further


PART III
...
1

Introduction

8
...
3

Custom Circuit Design

8
...
4
...
4
...
4
...
4
...
5

Standard Cell
Compiled Cells
Macrocells, Megacells and Intellectual Property
Semi-Custom Design Flow

Array-Based Implementation Approaches
8
...
1
8
...
2

Pre-diffused (or Mask-Programmable) Arrays
Pre-wired Arrays

8
...
7

Summary

8
...
1

Introduction

9
...
2
...
2
...
3

Resistive Parasitics
9
...
1
9
...
2
9
...
3

9
...
4
...
4
...
5


Inductance and Reliability— Voltage Drop
Inductance and Performance—Transmission Line Effects

Advanced Interconnect Techniques
9
...
1
9
...
2

Reduced-Swing Circuits
Current-Mode Transmission Techniques

9
...
7

Chapter Summary

9
...
1

Introduction

10
...
2
...
2
...
2
...
2
...
3

Synchronous Design — An In-depth Perspective
10
...
1
10
...
2
10
...
3
10
...
4

10
...
6
...
6
...
7

Self-Timed Logic - An Asynchronous Technique
Completion-Signal Generation
Self-Timed Signaling
Practical Examples of Self-Timed Logic

Synchronizers and Arbiters*
10
...
1
10
...
2

10
...
4
...
4
...
4
...
4
...
5

Mesochronous interconnect
Plesiochronous Interconnect
Asynchronous Interconnect

Basic Concept
Building Blocks of a PLL

Future Directions and Perspectives
10
...
1

Distributed Clocking Using DLLs


10
...
2
10
...
3

Optical Clock Distribution
Synchronous versus Asynchronous Design

10
...
9

To Probe Further

DESIGN METHODOLOGY INSERT G: DESIGN VERIFICATION
CHAPTER 11: DESIGNING ARITHMETIC BUILDING BLOCKS
11
...
2

Datapaths in Digital Processor Architectures

11
...
3
...
3
...
3
...
4

The Multiplier
11
...
1
11
...
2
11
...
3
11
...
4
11
...
5

11
...
5
...
5
...
6

Other Arithmetic Operators

11
...
7
...
7
...
7
...
8

Perspective: Design as a Trade-off

11
...
10 To Probe Further


CHAPTER 12: DESIGNING MEMORY AND ARRAY STRUCTURES
12
...
1
...
1
...
2

The Memory Core
12
...
1
12
...
2
12
...
3
12
...
4

12
...
5
...
5
...
5
...
5
...
5
...
6

The Address Decoders
Sense Amplifiers
Voltage References
Drivers/Buffers
Timing and Control

Memory Reliability and Yield
12
...
1
12
...
2

12
...
3
...
3
...
3
...
3
...
3
...
4

Memory Classification
Memory Architectures and Building Blocks

Sources of Power Dissipation in Memories
Partitioning of the memory
Addressing the Active Power Dissipation
Data-retention dissipation
Summary

Case Studies in Memory Design
12
...
1
12
...
2
12
...
3

The Programmable Logic Array (PLA)
A 4 Mbit SRAM
A 1 Gbit NAND Flash Memory

12
...
8

Summary

12
...
1

Introduction

H
...
3

Design for Testability
H
...
1
H
...
2
H
...
3
H
...
4
H
...
5

H
...
4
...
4
...
4
...
5

Fault Simulation

To Probe Further

INDEX

10

chapter1
...
1

A Historical Perspective

1
...
3

Quality Metrics of a Digital Design

1
...
5

To Probe Further

9

chapter1
...
1

INTRODUCTION

Chapter 1

A Historical Perspective
The concept of digital data manipulation has made a dramatic impact on our society
...
Evolving steadily from mainframe and minicomputers, personal and laptop computers have proliferated into daily life
...
Instrumentation was one of the first noncomputing domains where the
potential benefits of digital data manipulation over analog processing were recognized
...
Only recently have we witnessed the conversion of telecommunications and consumer electronics towards the digital format
...
The compact disk has revolutionized the audio world, and digital video
is following in its footsteps
...
In the early nineteenth century, Babbage envisioned large-scale mechanical computing devices, called Difference Engines [Swade93]
...
The Analytical
Engine, developed in 1834, was perceived as a general-purpose computing machine, with
features strikingly close to modern computers
...
It even used pipelining to speed up the execution of the addition operation! Unfortunately, the complexity and the cost of the designs made the concept impractical
...
1)
required 25,000 mechanical parts at a total cost of £17,470 (in 1834!)
...
1 Working part of Babbage’s
Difference Engine I (1832), the first known
automatic calculator (from [Swade93],
courtesy of the Science Museum of
London)
...

Section 1
...
Early digital electronics
systems were based on magnetically controlled switches (or relays)
...
Examples of such are train
safety systems, where they are still being used at present
...
While originally
used almost exclusively for analog processing, it was realized early on that the vacuum
tube was useful for digital computations as well
...

The era of the vacuum tube based computer culminated in the design of machines such as
the ENIAC (intended for computing artillery firing tables) and the UNIVAC I (the first
successful commercial computer)
...
5 feet high and several feet wide and incorporated 18,000 vacuum
tubes
...

Reliability problems and excessive power consumption made the implementation of larger
engines economically and practically infeasible
...
It took till 1956 before this led to the first bipolar digital logic gate,
introduced by Harris [Harris56], and even more time before this translated into a set of
integrated-circuit commercial logic gates, called the Fairchild Micrologic family
[Norman60]
...
Other logic families were devised with higher performance in mind
...
TTL had the advantage, however, of offering a higher integration density and
was the basis of the first integrated circuit revolution
...
The family was so successful that it composed the
largest fraction of the digital semiconductor market until the 1980s
...
Although attempts were made to develop high
integration density, low-power bipolar families (such as I2L—Integrated Injection Logic
[Hart72]), the torch was gradually passed to the MOS digital integrated circuit approach
...
Lilienfeld (Canada) as early as 1925, and, independently, by O
...
Insufficient knowledge of the materials and gate stability problems, however, delayed the practical usability of the device for a long time
...

Remarkably, the first MOS logic gates introduced were of the CMOS variety
[Wanlass63], and this trend continued till the late 1960s
...
Instead,
1

An intriguing overview of the evolution of digital integrated circuits can be found in [Murphy93]
...
It is accompanied by some of the historically ground-breaking publications in the domain of digital IC’s
...

the first practical MOS integrated circuits were implemented in PMOS-only logic and
were used in applications such as calculators
...
These processors were
implemented in NMOS-only logic, which has the advantage of higher speed over the
PMOS logic
...
For instance, the first 4Kbit MOS memory was introduced in 1970 [Hoff70]
...
The road to the current levels of integration has not been without hindrances, however
...
This realization,
combined with progress in manufacturing technology, finally tilted the balance towards
the CMOS technology, and this is where we still are today
...

Although the large majority of the current integrated circuits are implemented in the
MOS technology, other technologies come into play when very high performance is at
stake
...
BiCMOS is used in high-speed memories and gate arrays
...
These technologies only play a very small role in the overall digital integrated circuit design scene
...
Hence the focus of this textbook on
CMOS only
...
2

Issues in Digital Integrated Circuit Design
Integration density and performance of integrated circuits have gone through an astounding revolution in the last couple of decades
...
This prediction,
later called Moore’s law, has proven to be amazingly visionary [Moore65]
...
Figure 1
...
As can be observed, integration complexity doubles approximately every 1 to 2 years
...

An intriguing case study is offered by the microprocessor
...
The transistor counts for a number of landmark designs are collected
in Figure 1
...
The million-transistor/chip barrier was crossed in the late eighties
...
[Figure: (a) Trends in logic IC complexity; (b) Trends in memory complexity, from 64 Kbits to 64 Gbits over the years 1970 to 2010, with capacity markers for a page, an encyclopedia, 2 hrs of CD audio, and 30 sec of HDTV]

This is illustrated in Figure 1
...
An important observation is that, as of now, these trends
have not shown any signs of a slow-down
...
Early designs were truly hand-crafted
...
This
is adequately illustrated in Figure 1
...
This approach is, obviously, not appropriate when more than a million devices
have to be created and assembled
...

[Figure: number of transistors (1,000 to 100,000,000) versus year of introduction (1970 to 2000) for the 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, Pentium II, Pentium III, and Pentium 4]

Figure 1
...


[Figure: microprocessor trend over the years 1970 to 2010; labeled points include the 4004, 8008, 8080, 286, and 386]

Figure 1
...


Designers have, therefore, increasingly adhered to rigid design methodologies and strategies that are more amenable to design automation
...
5b
...
Cells are reused as much as possible to reduce the design effort
and to enhance the chances for a first-time-right implementation
...

The obvious next question is why such an approach is feasible in the digital world
and not (or to a lesser degree) in analog designs
...
At each design level,
the internal details of a complex module can be abstracted away and replaced by a black
box view or model
...
For instance, once a designer has implemented a
multiplier module, its performance can be defined very accurately and can be captured in a
model
...
For all purposes, it can hence be considered a black
box with known characteristics
...
The impact of
this divide and conquer approach is dramatic
...

This is analogous to a software designer using a library of software routines such as
input/output drivers
...
The only thing he cares about is the intended result of calling one of those
modules
...


[Figure: (a) The 4004 microprocessor; (b) The Pentium® 4 microprocessor, with a standard cell module and a memory module labeled]
Figure 1
...


chapter1
...
g
...
g
...
6
...
6


Design abstraction levels in digital circuits
...
No circuit designer will ever seriously consider the solid-state
physics equations governing the behavior of the device when designing a digital gate
...
For instance, an AND gate is adequately described by its Boolean expression (Z = A
...

This design philosophy has been the enabler for the emergence of elaborate computer-aided design (CAD) frameworks for digital integrated circuits; without it the current
design complexity would not have been achievable
...
An
overview of these tools and design methodologies is given in Chapter 8 of this textbook
...
These libraries contain not only the layouts, but also provide complete documentation and characterization of the behavior of the cells
...

Section 1
...
5b)
...
In this approach, logic gates are placed in rows of cells of
equal height and interconnected using routing channels
...

The preceding analysis demonstrates that design automation and modular design
practices have effectively addressed some of the complexity issues incurred in contemporary digital design
...
If design automation
solves all our design problems, why should we be concerned with digital circuit design at
all? Will the next-generation digital designer ever have to worry about transistors or parasitics, or is the smallest design entity he will ever consider the gate and the module?
The truth is that the reality is more complex, and various reasons exist as to why an
insight into digital circuits and their intricacies will still be an important asset for a long
time to come
...
Semiconductor technologies continue to advance from year to year
...

• Creating an adequate model of a cell or module requires an in-depth understanding
of its internal operation
...

• The library-based approach works fine when the design constraints (speed, cost or
power) are not stringent
...

Unfortunately for a large number of other products such as microprocessors, success
hinges on high performance, and designers therefore tend to push technology to its
limits
...
To resort to our previous analogy to software methodologies, a programmer tends to “customize” software routines when execution speed is crucial; compilers—or design tools—are not yet to the level of what human sweat or ingenuity
can deliver
...
The performance of, for instance, an adder can be substantially influenced by the way it is connected to its environment
...
The impact of the interconnect parasitics is bound
to increase in the years to come with the scaling of the technology
...

Some design entities tend to be global or external (to resort anew to the software
analogy)
...
Increasing the size of a digital design has a

chapter1
...
For instance, connecting more cells to a supply line can cause a voltage drop over the wire, which, in its turn, can slow down all
the connected cells
...
Coping with them
requires a profound understanding of the intricacies of digital circuit design
...
A typical example of this is the periodical reemergence of
power dissipation as a constraining factor, as was already illustrated in the historical
overview
...
To cope with these unforeseen factors, one must at least be able to model
and analyze their impact, requiring once again a profound insight into circuit topology and behavior
...
A fabricated circuit does not always
exhibit the exact waveforms one might expect from advance simulations
...
Troubleshooting a design
requires circuit expertise
...
Even
though she might not have to deal with the details of the circuit on a daily basis, the understanding will help her to cope with unexpected circumstances and to determine the dominant effects when analyzing a design
...
1 Clocks Defy Hierarchy
To illustrate some of the issues raised above, let us examine the impact of deficiencies in one
of the most important global signals in a design, the clock
...
This task can be
compared to the function of a traffic light that determines which cars are allowed to move
...
Under ideal circumstances, the clock signal is a periodic step waveform with transitions synchronized
throughout the designed circuit (Figure 1
...
In light of our analogy, changes in the traffic
lights should be synchronized to maximize throughput while avoiding accidents
...
7b)
...
This is confirmed by the simulations shown in Figure
1
...

Due to delays associated with routing the clock wires, it may happen that the clocks
become misaligned with respect to each other
...
Consider the case that the clock signal for the second
register is delayed—or skewed—by a value δ
...
If the time it takes to propagate the
output of the first register to the input of the second is smaller than the clock delay, the latter
will sample the wrong value
...
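The failure condition described above reduces to a single comparison, sketched here. The function name and the nanosecond values in the usage lines are hypothetical and serve only to illustrate the condition.

```python
def samples_wrong_value(t_propagation, skew):
    """Return True if the second register can capture the wrong value.

    t_propagation : time for the first register's output to reach the
                    input of the second register
    skew          : delay (delta) of the second register's clock
                    relative to the first

    Both arguments must use the same time unit. The failure occurs
    when the data arrives before the delayed clock edge.
    """
    return t_propagation < skew


if __name__ == "__main__":
    # Hypothetical numbers in nanoseconds.
    print(samples_wrong_value(t_propagation=0.2, skew=0.5))
    print(samples_wrong_value(t_propagation=1.0, skew=0.5))
```

A check of this form depends only on the path delay and the skew between the two registers, not on the clock period, which is why slowing the clock does not remove this kind of failure.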

Section 1
...
7

time
(c) Simulated waveforms

Impact of clock misalignment
...
In terms of our traffic analogy, cars of a first traffic light hit the cars of the
next light that have not left yet
...
Clock
skew is actually one of the most critical design problems facing the designers of large, high-performance systems
...
2 Power Distribution Networks Defy Hierarchy

While the clock signal is one example of a global signal that crosses the chip hierarchy
boundaries, the power distribution network represents another
...
To ensure proper operation, this
voltage should be stable within a few hundred millivolts
...
The
resistive nature of the on-chip wires and the inductance of the IC package pins make this a
difficult proposition
...
This leads to a current variation of 100 GA/sec, which is a truly
astounding number
...
A current of 1 A
running through a wire with a resistance of 1 Ω causes a voltage drop of 1V
...
2 and 2
...
[Figure: power routing between Block A and Block B: (a) Routing through the block; (b) Routing around the block]

Figure 1
...


able
...
While
this sizing of the power network is relatively simple in a flat design approach, it is a lot
more complex in a hierarchical design
...
8a [Saleh01]
...
If power is routed
through Block A to Block B, a larger IR drop will occur in Block B since power is also
being consumed by Block A before it reaches Block B
...
8b
...
Although routing power this way is easier to control and maintain, it also requires more area to implement
...
This requirement forces designers to set
aside area for power busing that takes away from the available routing area
...
For instance, it is not always easy to determine which
way the current will flow when multiple parallel paths are available between the power
source and the consuming gate
...
All these considerations make the design of the power-distribution network a challenging job
...

The purpose of this textbook is to provide a bridge between the abstract vision of
digital design and the underlying digital circuit and its peculiarities
...
The persistent quest for a designer when designing each of the mentioned modules is to identify the dominant design parameters, to locate the section of the
design he should focus his optimizations on, and to determine the specific properties that
make the module under investigation (e
...
, a memory) different from any others
...

Section 1
...


1
...
These properties help to
quantify the quality of a design from different perspectives: cost, functionality, robustness,
performance, and energy consumption
...
For instance, pure speed is a crucial property in a compute
server
...
The introduced properties are relevant at all levels of the
design hierarchy, be it system, chip, module, or gate
...

1
...
1

Cost of an Integrated Circuit

The total cost of any product can be separated into two components: the recurring
expenses or the variable cost, and the non-recurring expenses or the fixed cost
...
An important component of the fixed cost of an integrated circuit is the effort in time and manpower it takes to produce the design
...
Advanced design methodologies that automate major parts of the design process
can help to boost the latter
...

Additionally, one has to account for the indirect costs, the company overhead that
cannot be billed directly to one product
...

Variable Cost
This accounts for the cost that is directly attributable to a manufactured product, and is
hence proportional to the product volume
...
The total cost of an integrated circuit is now
$$\text{cost per IC} = \text{variable cost per IC} + \left(\frac{\text{fixed cost}}{\text{volume}}\right)$$

(1
...
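The cost relation above is simple enough to evaluate directly. The sketch below does so for two production volumes; the dollar amounts and volumes are hypothetical and chosen only to show how the fixed cost is amortized.

```python
def cost_per_ic(variable_cost_per_ic, fixed_cost, volume):
    """cost per IC = variable cost per IC + fixed cost / volume."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return variable_cost_per_ic + fixed_cost / volume


if __name__ == "__main__":
    # Hypothetical numbers: $5 variable cost, $1M fixed (design) cost.
    for volume in (1_000, 1_000_000):
        cost = cost_per_ic(5.0, 1_000_000.0, volume)
        print(f"volume = {volume:>9,d}  ->  cost per IC = ${cost:,.2f}")
```

At small volumes the fixed cost dominates the per-part cost; at large volumes the cost per IC approaches the variable cost.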

Individual die

Figure 1
...
Each
square represents a die - in this case
the AMD Duron™ microprocessor
(Reprinted with permission from AMD)
...
This also
explains why it makes sense to have a large design team working for a number of years on a
hugely successful product such as a microprocessor
...
2)

As will be elaborated on in Chapter 2, the IC manufacturing process groups a number of
identical circuits onto a single wafer (Figure 1
...
Upon completion of the fabrication, the
wafer is chopped into dies, which are then individually packaged after being tested
...
The cost of packaging and test is the
topic of later chapters
...
The latter factor is called the die yield
...
3)

The number of dies per wafer is, in essence, the area of the wafer divided by the die
area
...
Dies around the perimeter of the wafer are therefore lost
...
Eq
...
3)
also presents the first indication that the cost of a circuit is dependent upon the chip
area: increasing the chip area simply means that fewer dies fit on a wafer
...
Both the substrate material and the manufacturing process introduce faults that
can cause a chip to fail
...
...
4)

α is a parameter that depends upon the complexity of the manufacturing process, and is
roughly proportional to the number of masks
...
The defects per unit area is a measure of the material and process
induced faults
...
5 and 1 defects/cm2 is typical these days, but depends
strongly upon the maturity of the process
...
3 Die Yield
Assume a wafer size of 12 inch, a die size of 2
...
Determine the
die yield of this CMOS process run
...

\[ \text{dies per wafer} = \frac{\pi \times (\text{wafer diameter}/2)^2}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}} \]
This means 252 (= 296 - 44) potentially operational dies for this particular example
...
(1
...
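The dies-per-wafer estimate and the resulting yield can be sketched numerically. The yield expression used below, (1 + defects per unit area × die area / α)^(−α), is a commonly used model and is an assumption here, since the yield equation itself does not survive in this text. The die area, defect density, and α are likewise illustrative assumptions, so the results need not match the example figures exactly.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area over die area, minus the dies lost along the perimeter."""
    gross = math.pi * (wafer_diameter_cm / 2.0) ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return gross - edge_loss


def die_yield(defects_per_cm2, die_area_cm2, alpha):
    """Assumed yield model: (1 + D * A / alpha) ** (-alpha)."""
    return (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


# Illustrative (assumed) parameters.
WAFER_DIAMETER_CM = 12 * 2.54   # 12 inch wafer
DIE_AREA_CM2 = 2.5              # assumed die area
DEFECTS_PER_CM2 = 1.0           # assumed defect density
ALPHA = 3.0                     # assumed process-complexity parameter

n_dies = dies_per_wafer(WAFER_DIAMETER_CM, DIE_AREA_CM2)
y = die_yield(DEFECTS_PER_CM2, DIE_AREA_CM2, ALPHA)
print("potential dies:", n_dies)
print("die yield:", y)
print("functional dies (estimate):", n_dies * y)
```

Doubling the die area in this sketch both reduces the number of dies that fit on the wafer and lowers the yield of each die, which is why die cost rises much faster than linearly with area.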


The bottom line is that the number of functional dies per wafer, and hence the
cost per die, is a strong function of the die area
...
Bearing in mind the
equations derived above and the typical parameter values, we can conclude that die costs
are proportional to the fourth power of the area:
\[ \text{cost of die} = f\left((\text{die area})^4\right) \]

(1
...
Small area is hence a desirable property for a digital gate
...
Smaller gates furthermore tend to be faster and consume less energy, as the total gate capacitance—which is
one of the dominant performance parameters—often scales with the area
...
Other parameters may have an impact, though
...
The gate complexity, as expressed by the number of transistors and the regularity of the interconnect structure, also has an impact on the design cost
...
Simplicity and regularity are precious properties in cost-sensitive designs
...
3
...
The measured behavior of a manufactured circuit normally deviates from the

...
One reason for this aberration is the variation in the manufacturing
process
...
The electrical behavior of a circuit can be
profoundly affected by those variations
...
The word noise in the context
of digital circuits means “unwanted variations of voltages and currents at the logic
nodes
...
Some examples of digital noise
sources are depicted in Figure 1
...
For instance, two wires placed side by side in an integrated circuit form a coupling capacitor and a mutual inductance
...
Noise
on the power and ground rails of a gate also influences the signal levels in the gate
...
Capacitive and inductive cross talk, and the internally-generated
power supply noise are examples of such
...
For these
sources, the noise level is directly expressed in Volt or Ampere
...
Noise is a major concern in the engineering of digital circuits
...

[Figure: examples of digital noise sources. (a) Inductive coupling; (b) Capacitive coupling; (c) Power and ground noise]

The steady-state parameters (also called the static behavior) of a gate measure how
robust the circuit is with respect to both variations in the manufacturing process and noise
disturbances
...

Digital circuits (DC) perform operations on logical (or Boolean) variables
...
e
...
6)


A logical variable is, however, a mathematical abstraction
...
This is most often a node
voltage that is not discrete but can adopt a continuous range of values
...
Applying VOH to the input of an inverter yields VOL at the output and vice
versa
...

\[ V_{OH} = \overline{(V_{OL})} \qquad V_{OL} = \overline{(V_{OH})} \]

(1
...
The electrical function of a gate is best expressed by its voltage-transfer
characteristic (VTC) (sometimes called the DC transfer characteristic), which plots the
output voltage as a function of the input voltage Vout = f(Vin)
...
11
...
Another point of interest of the
VTC is the gate or switching threshold voltage VM (not to be confused with the threshold
voltage of a transistor), that is defined as VM = f(VM)
...
The gate threshold voltage presents the midpoint of the switching characteristics, which is obtained when the output of a gate is short-circuited to the input
...

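[Figure: voltage-transfer characteristic Vout = f(Vin) of an inverter, with VOH, VOL, the line Vout = Vin, and the switching threshold VM marked]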

Figure 1
...


Even if an ideal nominal value is applied at the input of a gate, the output signal
often deviates from the expected nominal value
...
e
...
Figure 1
...

alone
...
These represent by definition the points where the gain
(= dVout / dVin) of the VTC equals −1 as shown in Figure 1
...
The region between VIH
and VIL is called the undefined region (sometimes also referred to as transition width, or
TW)
...

Noise Margins
For a gate to be robust and insensitive to noise disturbances, it is essential that the “0” and
“1” intervals be as large as possible
...
8)

\[ NM_H = V_{OH} - V_{IH} \]

The noise margins represent the levels of noise that can be sustained when gates are cascaded as illustrated in Figure 1
...
It is obvious that the margins should be larger than 0
for a digital circuit to be functional and by preference should be as large as possible
...
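As a rough numerical illustration, the unity-gain points and the noise margins can be extracted from a voltage-transfer characteristic. The tanh-shaped VTC, the supply voltage, and the steepness below are hypothetical choices, not a model of any real gate. The low noise margin is taken as NM_L = V_IL − V_OL, the usual counterpart of the NM_H expression above.

```python
import math

VDD = 2.5   # assumed supply voltage (V)
VM = 1.25   # assumed switching threshold (V)
K = 4.0     # assumed steepness of the hypothetical VTC (1/V)


def vtc(vin):
    """Hypothetical inverter VTC; not derived from a real device."""
    return 0.5 * VDD * (1.0 - math.tanh(K * (vin - VM)))


def gain(vin, h=1e-6):
    """Numerical derivative dVout/dVin (central difference)."""
    return (vtc(vin + h) - vtc(vin - h)) / (2.0 * h)


def bisect(func, lo, hi, iters=60):
    """Find a sign change of func between lo and hi by bisection."""
    f_lo = func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def noise_margins():
    def unity(v):
        return gain(v) + 1.0          # zero where the gain equals -1

    v_il = bisect(unity, 0.0, VM)     # unity-gain point below VM
    v_ih = bisect(unity, VM, VDD)     # unity-gain point above VM
    v_ol, v_oh = 0.0, VDD
    for _ in range(20):               # nominal levels: VOH = f(VOL), VOL = f(VOH)
        v_oh = vtc(v_ol)
        v_ol = vtc(v_oh)
    return {"VIL": v_il, "VIH": v_ih, "VOL": v_ol, "VOH": v_oh,
            "NML": v_il - v_ol, "NMH": v_oh - v_ih}


print(noise_margins())
```

Because the assumed VTC is symmetric around VM, the two noise margins come out equal; a real gate generally has unequal margins.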
Assume that a signal is
disturbed by noise and differs from the nominal voltage levels
...
This deviation is added to the noise injected at
the output node and passed to the next gate
...
[Figure: (a) Relationship between voltage and logic levels ("1" above VIH, "0" below VIL, undefined region in between); (b) Definition of VIH and VIL as the points on the VTC where the slope equals -1]

[Figure: cascaded gates, with the output of stage M driving the input of stage M + 1; NMH lies between VOH and VIH, NML between VIL and VOL, and the undefined region between VIH and VIL]

This, fortunately,
does not happen if the gate possesses the regenerative property, which ensures that a disturbed signal gradually converges back to one of the nominal voltage levels after passing
through a number of logical stages
...
14a)
...
Similarly,
when an input voltage vin (vin ∈ “1”) is applied to the inverter chain, the output voltage
will approach the nominal value VOH
...
14

The regenerative property
...
4 Regenerative property
The concept of regeneration is illustrated in Figure 1
...
The input signal to the chain is a step-waveform with

...
Instead of swinging from rail to rail,
v0 only extends between 2
...
9 V
...
6 V to 4
...
Even further, v2 already swings between the nominal VOL and VOH
...
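The regeneration along an inverter chain can be mimicked with a small sketch that repeatedly applies a voltage-transfer characteristic. The tanh-shaped VTC and its parameters are hypothetical. The "steep" gate has a gain magnitude above 1 around the switching threshold, while the "shallow" gate stays below 1 everywhere.

```python
import math

VDD = 2.5   # assumed supply voltage (V)
VM = 1.25   # assumed switching threshold (V)


def make_vtc(steepness):
    """Return a hypothetical inverter VTC with the given steepness (1/V)."""
    def vtc(vin):
        return 0.5 * VDD * (1.0 - math.tanh(steepness * (vin - VM)))
    return vtc


def propagate(vtc, v_in, stages):
    """Voltages at the input and after each stage of an inverter chain."""
    levels = [v_in]
    for _ in range(stages):
        levels.append(vtc(levels[-1]))
    return levels


degraded_input = 1.0   # a disturbed signal, only 0.25 V away from VM

steep = propagate(make_vtc(4.0), degraded_input, 6)      # mid-point gain of -5
shallow = propagate(make_vtc(0.4), degraded_input, 12)   # mid-point gain of -0.5

print("regenerative chain:    ", [round(v, 4) for v in steep])
print("non-regenerative chain:", [round(v, 4) for v in shallow])
```

In the steep case the signal is pushed back toward the rails within a few stages; in the shallow case it drifts to the intermediate level VM instead.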


The conditions under which a gate is regenerative can be intuitively derived by analyzing a simple case study
...
15(a) plots the VTC of an inverter Vout = f(Vin) as well
as its inverse function finv(), which reverts the function of the x- and y-axis and is defined
as follows:
out = f ( in ) ⇒ in = finv ( out )

(1
...
15 Conditions for regeneration
...
The output voltage of this inverter equals v1 = f(v0) and is applied to
the next inverter
...
The signal voltage gradually converges to the nominal signal after a number of inverter stages, as indicated by the
arrows
...
15(b) the signal does not converge to any of the nominal voltage levels
but to an intermediate voltage level
...
The difference between the two cases is due to the gain characteristics of the gates
...
Such a gate has two stable operating points
...

Noise Immunity
While the noise margin is a meaningful means for measuring the robustness of a circuit
against noise, it is not sufficient
...
...
Noise immunity, on the other hand, expresses the ability of the system to process and transmit information correctly in the presence of noise [Dally98]
...
These circuits have the property that only a small
fraction of a potentially-damaging noise source is coupled to the important circuit nodes
...
Circuits that do not possess this property are susceptible to noise
...
As discussed earlier, the noise sources
can be divided into sources that are
• proportional to the signal swing Vsw
...

• fixed
...

We assume, for the sake of simplicity, that the noise margin equals half the signal swing
(for both H and L)
...

\[ V_{NM} = \frac{V_{sw}}{2} \geq \sum_i f_i V_{Nfi} + \sum_j g_j V_{sw} \qquad (1.11) \]

This makes it clear that the signal swing (and the noise margin) has to be large enough to
overpower the impact of the fixed sources (f VNf)
...
In the presence of large gain factors, increasing the
signal swing does not do any good to suppress noise, as the noise increases proportionally
...
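Rearranging the inequality gives V_sw ≥ (Σ f_i V_Nfi) / (1/2 − Σ g_j), which only has a solution when Σ g_j < 1/2. The sketch below evaluates that bound; the source values and transfer factors are made-up numbers for illustration.

```python
def min_signal_swing(fixed_sources, proportional_gains):
    """Smallest swing satisfying Vsw/2 >= sum(f_i * V_Nfi) + sum(g_j) * Vsw.

    fixed_sources: list of (f_i, V_Nfi) pairs for the fixed noise sources.
    proportional_gains: list of g_j factors for swing-proportional sources.
    Returns None when no swing can satisfy the inequality.
    """
    fixed_total = sum(f * v for f, v in fixed_sources)
    headroom = 0.5 - sum(proportional_gains)
    if headroom <= 0.0:
        return None   # proportional noise alone consumes the whole margin
    return fixed_total / headroom


# Hypothetical example values.
print(min_signal_swing([(0.5, 0.2)], [0.1, 0.15]))
print(min_signal_swing([(1.0, 0.1)], [0.3, 0.3]))
```

The second call returns None: once the proportional gain factors add up to one half or more, a larger swing no longer helps.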

Directivity
The directivity property requires a gate to be unidirectional, that is, changes in an output
level should not appear at any unchanging input of the same circuit
...

In real gate implementations, full directivity can never be achieved
...
Capacitive coupling between
inputs and outputs is a typical example of such a feedback
...


...
16)
...
From the world of analog amplifiers, we know that this effect is minimized by making
the input resistance of the load gates as large as possible (minimizing the input currents)
and by keeping the output resistance of the driving gate small (reducing the effects of load
currents on the output voltage)
...
For these reasons, many generic and library
components define a maximum fan-out to guarantee that the static and dynamic performance of the element meet specification
...
16b)
...


[Figure: (a) Fan-out N; (b) Fan-in M]

Figure 1
...


The Ideal Digital Gate
Based on the above observations, we can define the ideal digital gate from a static perspective
...

Its VTC is shown in Figure 1
...
The input and output impedances of the
ideal gate are infinity and zero, respectively (i.e., the gate has unlimited fan-out)
...

Example 1
...
18 shows an example of a voltage-transfer characteristic of an actual, but outdated
gate structure (as produced by SPICE in the DC analysis mode)
...



Vout

g = -∞

Figure 1
...
5 V;
VIH = 2
...


VOL = 0
...
66 V

VM = 1
...
15 V; NML = 0
...
05 V is substantially below the maximum obtainable value of 5 V (which is the value of the supply voltage for this design)
...
0

Vout (V)

4
...
0
2
...
0

0
...
0

2
...
0
Vin (V)

1
...
3

4
...
0

Figure 1
...


Performance

From a system designer's perspective, the performance of a digital circuit expresses the
computational load that the circuit can manage
...
This performance

...
While the
former is crucially important, it is not the focus of this textbook
...
When focusing on the pure
design, performance is most often expressed by the duration of the clock period (clock
cycle time), or its rate (clock frequency)
...
Each of these topics will be discussed in detail in the course of this textbook
...

The propagation delay tp of a gate defines how quickly it responds to a change at its
input(s)
...
It is
measured between the 50% transition points of the input and output waveforms, as shown
in Figure 1
...
2 Because a gate displays different response times for
rising or falling input waveforms, two definitions of the propagation delay are necessary
...
The propagation delay tp is
defined as the average of the two
...
12)

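[Figure: input and output waveforms Vin(t) and Vout(t), with tpHL and tpLH measured between the 50% points, and the fall and rise times tf and tr marked between the 90% and 10% points]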

Figure 1
...


2 The 50% definition is inspired by the assumption that the switching threshold VM is typically located in the
middle of the logic swing
...
...
It is mostly used to compare different semiconductor technologies, or logic design styles
...
Most importantly, the delay is a function of the
slopes of the input and output signals of the gate
...
19), and express how fast a signal transits between
the different levels
...
The rise/fall time of a signal is largely determined by the
strength of the driving gate, and the load presented by the node itself, which sums the contributions of the connecting gates (fan-out) and the wiring parasitics
...
A uniform way of measuring the tp of a gate, so that technologies can be judged on an equal footing, is desirable
...
20)
...
The period T of the oscillation is
determined by the propagation time of a signal transition through the complete chain, or
T = 2 × tp × N with N the number of inverters in the chain
...
Note
that this equation is only valid for 2Ntp >> tf + tr
...
Typically, a ring oscillator needs
at least five stages to be operational
...
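The relation T = 2 × tp × N can be turned around to estimate tp from a measured oscillation period. The sketch below does both. The odd-stage check reflects the standard requirement that the loop be inverting, and the delay value is a made-up example.

```python
def _check_stages(n_stages):
    # A ring of inverters only oscillates with an odd number of stages.
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("n_stages must be an odd number >= 3")


def ring_oscillator_period(tp, n_stages):
    """Oscillation period T = 2 * tp * N (valid when 2*N*tp >> tf + tr)."""
    _check_stages(n_stages)
    return 2.0 * tp * n_stages


def tp_from_period(period, n_stages):
    """Propagation delay inferred from a measured period."""
    _check_stages(n_stages)
    return period / (2.0 * n_stages)


TP = 20e-12   # hypothetical gate delay of 20 ps
N = 5
period = ring_oscillator_period(TP, N)
print("period (s):", period)
print("frequency (Hz):", 1.0 / period)
print("recovered tp (s):", tp_from_period(period, N))
```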
[Figure 1.20: Ring oscillator circuit for propagation-delay measurement, with node voltages v1 through v5]
...

CAUTION: We must be extremely careful with results obtained from ring oscillator
measurements
...
The oscillator results are primarily useful for quantifying the differences between various manufacturing technologies and gate topologies
...
In more realistic digital circuits, fan-ins and fan-outs are higher, and
interconnect delays are non-negligible
...
As a result, the achievable clock frequency on
average is 50 to 100 times slower than the frequency predicted from ring oscillator measurements
...


Example 1
...
21
...

[Figure: first-order RC network, with resistor R between vin and vout and capacitor C at the output node vout]
Figure 1
...


When applying a step input (with vin going from 0 to V), the transient response of this
circuit is known to be an exponential function, and is given by the following expression
(where τ = RC, the time constant of the network):

\[ v_{out}(t) = \left(1 - e^{-t/\tau}\right) V \]

(1
...
69τ
...
2τ to get to the 90% point
...
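Solving the exponential response for time gives t = −τ ln(1 − x) for the moment the output reaches a fraction x of its final value. A minimal sketch, with τ normalized to 1 and hypothetical helper names:

```python
import math


def time_to_fraction(fraction, tau):
    """Time for v_out to reach `fraction` of the step height V."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -tau * math.log(1.0 - fraction)


def rise_time_10_90(tau):
    """Time spent between the 10% and 90% points of the step response."""
    return time_to_fraction(0.9, tau) - time_to_fraction(0.1, tau)


TAU = 1.0   # normalized time constant RC
print("50% point, in units of tau:", time_to_fraction(0.5, TAU))
print("10%-90% rise time, in units of tau:", rise_time_10_90(TAU))
```

The two results are ln 2 ≈ 0.69 and ln 9 ≈ 2.2 times the time constant.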


1
...
4

Power and Energy Consumption

The power consumption of a design determines how much energy is consumed per operation, and how much heat the circuit dissipates
...
Therefore, power dissipation is an important
property of a design that affects feasibility, cost, and reliability
...
With the increasing popularity of mobile and distributed computation, energy limitations put a firm restriction on the number of computations that can be performed given a minimum time between battery recharges
...
...
For instance, the peak power Ppeak is important when studying supply-line
sizing
...
Both measures are defined in equation Eq
...
14):
\[ P_{peak} = i_{peak} V_{supply} = \max[p(t)] \]

\[ P_{av} = \frac{1}{T} \int_0^T p(t)\,dt = \frac{V_{supply}}{T} \int_0^T i_{supply}(t)\,dt \]

(1
...
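Given a sampled supply-current waveform, for instance from a circuit simulation, both measures can be approximated numerically. The sketch below assumes uniformly spaced samples and uses trapezoidal integration; the sample values are invented for illustration.

```python
def peak_and_average_power(i_supply, v_supply, dt):
    """Return (P_peak, P_av) for uniformly sampled supply current.

    i_supply: current samples (A) covering one interval T = (n - 1) * dt.
    v_supply: constant supply voltage (V).
    dt: spacing between samples (s).
    """
    if len(i_supply) < 2:
        raise ValueError("need at least two samples")
    power = [v_supply * i for i in i_supply]          # p(t) = V_supply * i(t)
    p_peak = max(power)
    # Trapezoidal approximation of the integral of p(t) over the interval.
    energy = sum(0.5 * (a + b) * dt for a, b in zip(power, power[1:]))
    interval = dt * (len(power) - 1)
    return p_peak, energy / interval


# Hypothetical triangular current pulse sampled every nanosecond.
samples = [0.0, 1e-3, 2e-3, 1e-3, 0.0]
print(peak_and_average_power(samples, v_supply=2.5, dt=1e-9))
```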

The dissipation can further be decomposed into static and dynamic components
...
It is attributed to the
charging of capacitors and temporary current paths between the supply rails, and is, therefore, proportional to the switching frequency: the higher the number of switching events,
the higher the dynamic power consumption
...
It is always present, even when the circuit is in
stand-by
...

The propagation delay and the power consumption of a gate are related—the propagation delay is mostly determined by the speed at which a given amount of energy can be
stored on the gate capacitors
...
For a given technology and gate topology, the product of
power consumption and propagation delay is generally a constant
...
The PDP is simply the energy consumed by the gate per switching
event
...

An ideal gate is one that is fast, and consumes little energy
...
From the above, it should be clear that the E-D is equivalent
to power-delay2
...
7 Energy Dissipation of First-Order RC Network
Let us consider again the first-order RC network shown in Figure 1
...
When applying a step
input (with Vin going from 0 to V), an amount of energy is provided by the signal source to the
network
...
15)

It is interesting to observe that the energy needed to charge a capacitor from 0 to V volt
with a step input is a function of the size of the voltage step and the capacitance, but is independent of the value of the resistor
...
We can also compute how much of the delivered energy
gets stored on the capacitor at the end of the transition
...
16)

0

This is exactly half of the energy delivered by the source
...
We leave it to the reader to demonstrate that during the discharge phase (for a step from V to 0), the energy originally stored on the capacitor
gets dissipated in the resistor as well, and turned into heat
...
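The energy bookkeeping for the charging phase can be checked numerically by integrating the charging current i(t) = (V/R) e^(−t/RC). The component values below are arbitrary, and the integration window of 30 time constants is an assumption that makes the truncation error negligible.

```python
import math


def rc_charging_energies(r, c, v, steps=20_000, span_in_tau=30.0):
    """Midpoint-rule integration of the energies during a 0-to-V step."""
    tau = r * c
    dt = span_in_tau * tau / steps
    e_source = 0.0     # energy delivered by the source
    e_resistor = 0.0   # energy dissipated in the resistor
    for k in range(steps):
        t = (k + 0.5) * dt
        i = (v / r) * math.exp(-t / tau)
        e_source += v * i * dt
        e_resistor += i * i * r * dt
    v_cap = v * (1.0 - math.exp(-span_in_tau))
    e_capacitor = 0.5 * c * v_cap ** 2   # energy stored at the end
    return e_source, e_resistor, e_capacitor


C, V = 1e-12, 2.5   # arbitrary capacitance (F) and step height (V)
for R in (1e3, 1e5):
    print(R, rc_charging_energies(R, C, V))
```

For both resistor values the source delivers about C·V², of which half ends up on the capacitor and half is dissipated in the resistor.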
4

Summary
In this introductory chapter, we learned about the history and the trends in digital circuit
design
...
At the
end of the Chapter, you can find an extensive list of reference works that may help you to
learn more about some of the topics introduced in the course of the text
...
5

To Probe Further
The design of digital integrated circuits has been the topic of a multitude of textbooks and
monographs
...
The state-of-the-art developments in the area
of digital design are generally reported in technical journals or conference proceedings,
the most important of which are listed
...
...
Annaratone, Digital CMOS Circuit Design, Kluwer, 1986
...
Dillinger, VLSI Engineering, Prentice Hall, 1988
...
Elmasry, ed
...

E
...
, Digital MOS Integrated Circuits II, IEEE Press, 1992
...
Glasser and D
...

A
...
, McGraw-Hill, 1999
...
Mead and L
...

K
...

D
...
Eshraghian, Basic VLSI Design, Prentice Hall, 1988
...
Shoji, CMOS Digital Circuit Technology, Prentice Hall, 1988
...
Uyemura, Circuit Design for CMOS VLSI, Kluwer, 1992
...
Veendrick, MOS IC’s: From Basics to ASICS, VCH, 1992
...

High-Performance Design
K
...

A
...
Fox, and W
...
, Design of High-Performance Microprocessor Circuits, IEEE Press, 2000
...
Shoji, High-Speed Digital Circuits, Addison-Wesley, 1996
...
Chandrakasan and R
...
, Low-Power Digital CMOS Design, IEEE Press, 1998
...
Rabaey and M
...
, Low-Power Design Methodologies, Kluwer Academic, 1996
...
Yeap, Practical Low-Power CMOS Design, Kluwer Academic, 1998
...
Itoh, VLSI Memory Chip Design, Springer, 2001
...
Prince, Semiconductor Memories, Wiley, 1991
...
Prince, High Performance Memories, Wiley, 1996
...
Hodges, Semiconductor Memories, IEEE Press, 1972
...
Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley, 1990
...
Dally and J
...

E
...
, Clock Distribution Networks in VLSI Circuits and Systems, IEEE Press, 1995
...
Lau et al, ed
...

Design Tools and Methodologies
V
...
Seth, Test Generation for VLSI Chips, IEEE Press, 1988
...

D
...

G
...

S
...

J
...

A
...

W
...

Bipolar and BiCMOS
A
...

M
...
, BiCMOS Integrated Circuit Design, IEEE Press, 1994
...
Embabi, A
...
Elmasry, Digital BiCMOS Integrated Circuit Design, Kluwer,
1993
...
, eds
...

General
J
...

H
...

D
...
Jackson, Analysis and Design of Digital Integrated Circuits, 2nd ed
...

M
...

R
...
Watts, Submicron Integrated Circuits, Wiley, 1989
...
Bardeen and W
...
Rev
...
74, p
...

[Beeson62] R
...
Ruegg, “New Forms of All Transistor Logic,” ISSCC Digest of Technical Papers, pp
...
1962
...
Dally, Digital Systems Engineering, Cambridge University Press, 1998
...
Faggin, M
...
Hoff, Jr, H
...
Mazor, M
...
1-6
...
Harris, “Direct-Coupled Transistor Logic Circuitry in Digital Computers,” ISSCC
Digest of Technical Papers, p
...
1956
...
Hart and M
...
92–93, Feb
...

[Hoff70] E
...
68–73, August 3, 1970
...
intel
...
htm
[Masaki74] A
...
Harada and T
...
62–63, Feb
...

[Moore65] G
...
38,
Nr 8, April 1965
...
...
Murphy, “Perspectives on Logic and Microprocessors,” Commemorative Supplement to the Digest of Technical Papers, ISSCC Conf
...
49–51, San Francisco, 1993
...
Norman, J
...
Haas, “Solid-State Micrologic Elements,” ISSCC Digest of
Technical Papers, pp
...
1960
...
Hennessy and D
...
Saleh, M
...
simplex
...
php?page_name=wp_powerplan
[Schockley49] W
...
28, p
...

[Shima74] M
...
Faggin and S
...
56–57, Feb
...

[Swade93] D
...
86–91, February 1993
...
Wanlass, and C
...
32–32, Feb
...


1
...


2
...


4
...


[E, None, 1
...
Determine also
how much DRAM should be available on a single chip at that point in time, if Moore’s law
would still hold
...
2]
Visit the Intel on-line microprocessor museum
(http://www
...
com/intel/intelis/museum/exhibit/hist_micro/index
...
While browsing
through the microprocessor hall-of-fame, determine the rate of increase in transistor counts
and clock frequencies in the 70’s, 80’s, and 90’s
...
Spend some time browsing the site
...

[D, None, 1
...
Determine for
each of those, the number of integrated devices, the overall area and the maximum clock
speed
...
2
...
2] Find in the library the latest November issue of the Journal of Solid State Circuits
...
),
the minimum feature size, the number of devices on a single die, and the maximum clock
speed
...

[E, None, 1
...
6
...
...
1

Introduction

2
...
5
...
2
...
5
...
2
...
2
...
2
...
3

2
...
4

2
...
4
...
4
...
4
...

2
...
Yet, some insight into the steps
that lead to an operational silicon chip comes in quite handy in understanding the physical
constraints that are imposed on a designer of an integrated circuit, as well as the impact of
the fabrication process on issues such as cost
...
It is not our aim to present a detailed description of
the fabrication technology, which easily deserves a complete course [Plummer00]
...
We learn that a set of optical masks forms the central interface between the
intrinsics of the manufacturing process and the design that the user wants to see transferred to the silicon fabric
...
As such, these patterns have to adhere to some constraints
in terms of minimum width and separation if the resulting circuit is to be fully functional
...
If the designer adheres to these rules, he gets
a guarantee that his circuit will be manufacturable
...
Finally, an overview is
given of the IC packaging options
...


2
...
1
...
To accommodate both types of devices, special regions called
wells must be created in which the semiconductor material is opposite to the type of the
channel
...
The cross section

Figure 2
...



shown in Figure 2
...

Modern processes are increasingly using a dual-well approach that uses both n- and p-wells, grown on top of an epitaxial layer, as shown in Figure 2
...
We will restrict the
remainder of this discussion to the latter process (without loss of generality)
...
2Cross section of modern dual-well CMOS process
...
A number of these steps and/or operations are executed very
repetitively in the course of the manufacturing process
...

2
...
1

The Silicon Wafer

The base material for the manufacturing process comes
in the form of a single-crystalline, lightly doped wafer
...
3)
...
Often, the surface of the
wafer is doped more heavily, and a single crystal epitaxial layer of the opposite type is grown over the surface before the wafers are handed to the processing
company
...
High defect densities lead to a larger
fraction of non-functional circuits, and consequently an
increase in cost of the final product
...
3 Single-crystal ingot and
sliced wafers (from [Fullman99])
...

2
...
2


Photolithography

In each processing step, a certain area on the chip is masked out using the appropriate optical mask so that a desired processing step can be selectively applied to the remaining
regions
...
The technique to accomplish
this selective masking, called photolithography, is applied throughout the manufacturing
process
...
4 gives a graphical overview of the different operations involved in a
typical photolithographic process
...
4 Typical operations in a single
photolithographic cycle (from [Fullman99])
...
Oxidation layering — this optional step deposits a thin layer of SiO2 over the complete wafer by exposing it to a mixture of high-purity oxygen and hydrogen at
approximately 1000°C
...

2
...
This material is
originally soluble in an organic solvent, but has the property that the polymers cross-link when exposed to light, making the affected regions insoluble
...
A positive photoresist has the opposite properties; originally insoluble, but soluble after exposure
...
Since the cost of a mask is increasing quite rapidly
with the scaling of technology, a reduction of the number of masks is surely of high
priority
...
Stepper exposure — a glass mask (or reticle), containing the patterns that we want to
transfer to the silicon, is brought in close proximity to the wafer
...
The glass mask can be thought of as the negative of one
layer of the microcircuit
...
Where the mask is transparent, the photoresist becomes insoluble
...
Photoresist development and bake — the wafers are developed in either an acid or
base solution to remove the non-exposed areas of photoresist
...

5
...
This is accomplished through the use of many different
types of acid, base and caustic solutions as a function of the material that is to be
removed
...
Because of the dangerous nature of
some of these solvents, safety and environmental impact are primary concerns
...
Spin, rinse, and dry — a special tool (called SRD) cleans the wafer with deionized
water and dries it with nitrogen
...

To prevent this from happening, the processing steps are performed in ultra-clean
rooms where the number of dust particles per cubic foot of air ranges between 1 and
10
...
This
explains why the cost of a state-of-the-art fabrication facility easily ranges in the
multiple billions of dollars
...

7
...
These
are the subjects of the subsequent section
...
Photoresist removal (or ashing) — a high-temperature plasma is used to selectively
remove the remaining photoresist without damaging device layers
...
5
...
Yet, the reader has to bear in mind that the same sequence patterns the layer over the complete surface of the wafer
...
[Figure panels: (a) Silicon base material; (b) After oxidation and deposition of negative photoresist; (c) Stepper exposure with UV light through a patterned optical mask; (d) After development and etching of resist, chemical or plasma etch of SiO2; (e) After etching; (f) Final result after removal of resist]

Figure 2
...


millions of patterns to the semiconductor surface simultaneously
...

The continued scaling of the minimum feature sizes in integrated circuits puts an
enormous burden on the developer of semiconductor manufacturing equipment
...
The dimensions of the features to be
transcribed are smaller than the wavelengths of the optical light sources, so that achieving the necessary resolution and accuracy becomes harder and harder
...
1 µm) process
generation
...
This adds substantially to the cost of mask making
...
These techniques, while fully functional, are currently less attractive from an economic viewpoint
...
...
2
...
The creation of the source and drain
regions, well and substrate contacts, the doping of the polysilicon, and the adjustments of
the device threshold are examples of such
...
In both techniques, the area to be doped is
exposed, while the rest of the wafer is coated with a layer of buffer material, typically
SiO2
...
A gas containing the dopant is introduced in the tube
...
The final dopant concentration is the
greatest at the surface and decreases in a gaussian profile deeper in the material
...
The ion
implantation system directs and sweeps a beam of purified ions over the semiconductor
surface
...
The ion implantation
method allows for an independent control of depth and dosage
...

Ion implantation has some unfortunate side effects however, the most important one
being lattice damage
...
This problem is largely
resolved by applying a subsequent annealing step, in which the wafer is heated to around
1000°C for 15 to 30 minutes, and then allowed to cool slowly
...

Deposition
Any CMOS process requires the repetitive deposition of layers of a material over
the complete wafer, to either act as buffers for a processing step, or as insulating or conducting layers
...
Other materials require different techniques
...
This silicon nitride is deposited everywhere using a process called chemical vapor deposition or CVD, which uses a gas-phase
reaction with energy supplied by heat at around 850°C
...
The resulting reaction produces a non-crystalline or amorphous material
called polysilicon
...

The Aluminum interconnect layers are typically deployed using a process known as
sputtering
...

delivered by electron-beam or ion-beam bombarding
...

Etching
Once a material has been deposited, etching is used to selectively form patterns such
as wires and contact holes
...
For instance, hydrofluoric acid buffered with ammonium fluoride is typically used to etch SiO2
...
A wafer is placed
into the etch tool's processing chamber and given a negative electrical charge
...
5 Pa, then filled with a positively charged plasma (usually a mix of nitrogen, chlorine and boron trichloride)
...
Plasma etching has the advantage of offering
a well-defined directionality to the etching action, creating patterns with sharp vertical
contours
...
If no special steps were taken, this would definitely
not be the case in modern CMOS processes, where multiple patterned metal interconnect
layers are superimposed onto each other
...
This process uses a slurry compound—a liquid carrier with a suspended
abrasive component such as aluminum oxide or silica—to microscopically plane a device
layer and to reduce the step heights
...
2
...
6
...
All other areas of the die will be covered with a thick layer of silicon dioxide
(SiO2), called the field oxide
...
1), or deposited in etched
trenches (Figure 2
...
Further insulation is provided
by the addition of a reverse-biased np-diode, formed by adding an extra p+ region, called
the channel-stop implant (or field implant) underneath the field oxide
...
To construct an NMOS transistor in a
p-well, heavily doped n-type source and drain regions are implanted (or diffused) into the
lightly doped p-type substrate
...
The conductive material forms the gate of the transistor
...
...
6 Simplified process sequence for the manufacturing of a dual-well CMOS circuit
...
Multiple insulated layers of metallic (most often Aluminum) wires are deposited on
top of these devices to provide for the necessary interconnections between the transistors
...
7
...
The process starts with a p-substrate surfaced with a lightly doped p-epitaxial layer (a)
...
A plasma etching step using the complementary of the
active area mask creates the trenches, used for insulating the devices (c)
...
At that point, the sacrificial nitride is removed (d)
...
This is followed by a second implant step to adjust the threshold voltages of the
PMOS transistors
...
Similar operations (using other dopants) are performed to create the p-wells,
and to adjust the thresholds of the NMOS transistors (f)
...
Polysilicon is
used both as gate electrode material for the transistors as well as an interconnect medium
(g)
...
The same implants are also used to dope
1

Most modern processes also include extra implants for the creation of the lightly-doped drain regions
(LDD), and the creation of gate spacers at this point
...


...
These
steps also dope the polysilicon
...


Al
(j) After deposition and
patterning of first Al layer
...


Figure 2
...
Be
aware that the drawings are stylized for understanding, and that the aspect ratios are not drawn to scale
...

THE MANUFACTURING PROCESS

Chapter 2

the polysilicon on the surface, reducing its resistivity
...
Note that the polysilicon gate, which is patterned before the doping, actually defines the precise location of the channel region, and hence the location of the source
and drain regions
...
The process continues
with the deposition of the metallic interconnect layers
...
Intermediate planarization steps ensure that the surface remains reasonably flat, even in the
presence of multiple interconnect layers
...
This layer would be CVD SiO2,
although often an additional layer of nitride is deposited as it is more impervious to moisture
...

A cross-section of the final artifact is shown in Figure 2
...
Observe how the transistors occupy only a small fraction of the total height of the structure
...


transistor
Figure 2
...


2
...
The goal of defining a set of design rules is to allow for a ready translation
of a circuit concept into an actual geometry in silicon
...

Circuit designers in general want tighter, smaller designs, which lead to higher performance and higher circuit density
...
Design rules are, consequently, a compromise that
attempts to satisfy both sides
...

Section 2
...
They consist of minimum-width and minimum-spacing
constraints and requirements between objects on the same or on different layers
...
It stands for the minimum mask dimension that can be safely transferred to the
semiconductor material
...
More advanced
approaches use electron-beam
...

Even for the same minimum dimension, design rules tend to differ from company to
company, and from process to process
...
One approach to address this issue is to use
advanced CAD techniques, which allow for migration between compatible processes
...
The latter approach, made popular by
Mead and Conway [Mead80], defines all rules as a function of a single parameter, most
often called λ
...
Scaling of the minimum dimension is accomplished by simply
changing the value of λ
...
For a given process, λ is set to a specific value, and all design dimensions are consequently translated into
absolute numbers
...
For
instance, for a 0
...
e
...
25 µm), λ
equals 0
...
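
As a rough illustration of scalable rules, the sketch below translates a rule set written in multiples of λ into absolute dimensions for two values of λ. The rule names, the multipliers, and both λ values are hypothetical; they are not taken from any real process or from the rule set used in this text.

```python
# Hypothetical lambda-based rule set: each value is a multiple of lambda.
# The names and multipliers are illustrative assumptions only.
LAMBDA_RULES = {
    "poly_min_width": 2,
    "metal1_min_width": 3,
    "metal1_min_spacing": 3,
}


def to_microns(rules_in_lambda, lambda_um):
    """Translate rules expressed in multiples of lambda into micrometers."""
    return {name: mult * lambda_um for name, mult in rules_in_lambda.items()}


# Two assumed lambda values (in micrometers) standing in for two processes.
rules_a = to_microns(LAMBDA_RULES, lambda_um=0.125)
rules_b = to_microns(LAMBDA_RULES, lambda_um=0.09)

for name in LAMBDA_RULES:
    print(f"{name}: {rules_a[name]:.3f} um -> {rules_b[name]:.3f} um")
```

Because every rule is a multiple of the same parameter, changing λ rescales all dimensions by one common factor. That is the strictly linear scaling the approach relies on.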

This approach, while attractive, suffers from some disadvantages:
1
...
25 µm and 0
...
When scaling over larger ranges, the relations
between the different layers tend to vary in a nonlinear way that cannot be adequately covered by the linear scaling rules
...
Scalable design rules are conservative
...
This
results in over-dimensioned and less-dense designs
...
2 As circuit density is a prime goal in industrial designs, most semiconductor companies tend to use
micron rules, which express the design rules in absolute dimensions and can therefore
exploit the features of a given process to a maximum degree
...

For this textbook, we have selected a “vanilla” 0
...
The rest of this section is devoted to a short introduction
and overview of the design rules of this process, which fall in the micron-rules class
...


...
We
discuss each of them in sequence
...

From a designer’s viewpoint, all CMOS designs are based on the following entities:
• Substrates and/or wells, being p-type (for NMOS devices) and n-type (for PMOS)
• Diffusion regions (n+ and p+) defining the areas where transistors can be formed
...
Diffusions of an inverse type are
needed to implement contacts to the wells or to the substrate
...

• One or more polysilicon layers, which are used to form the gate electrodes of the
transistors (but serve as interconnect layers as well)
...

• Contact and via layers to provide interlayer connections
...
The functionality of the circuit is determined by the choice of the layers, as well as
the interplay between objects on different layers
...
An interconnection between two metal layers is formed by a cross section between the two metal layers and an additional contact layer
...
The different layers used in our CMOS process are represented in Colorplate 1 (color insert)
...
All distances are expressed in µm
...

Interlayer Constraints
Interlayer rules tend to be more complex
...
Understanding layout requires the
capability of translating the two-dimensional picture of the layout drawing into the three-dimensional reality of the actual device
...

We present these rules in a set of separate groupings
...
Transistor Rules (Colorplate 3)
...
From the intralayer design rules, it is already clear that
the minimum length of a transistor equals 0
...
3 µm (the minimum width of diffusion)
...

Section 2
...

2
...
A contact (which forms an interconnection between metal and active or polysilicon) or a via (which connects two metal
layers) is formed by overlapping the two interconnecting layers and providing a
contact hole, filled with metal, between the two
...
3 µm, while the polysilicon and diffusion layers have to extend
at least over 0
...
This sets the minimum
area of a contact to 0
...
44 µm
...
The figure, furthermore, points out the minimum spacings
between contact and via holes, as well as their relationship with the surrounding layers
...
For robust digital circuit design, it is
important for the well and substrate regions to be adequately connected to the supply voltages
...
It is therefore advisable to provide numerous substrate (well) contacts spread over
the complete region
...
This is enabled by
the select layer, which reverses the type of diffusion
...

Consider an n-well process, which implements the PMOS transistors into an n-type
well diffused in a p-type material
...
To invert the polarity of the
diffusion, an n-select layer is provided that helps to establish the n+ diffusions for the well contacts in the n-region as well as the n+ source and drain regions for the NMOS transistors in the substrate
...
Failing to do so will almost surely lead to a nonfunctional design
...
While design teams in
the past used to spend numerous hours staring at room-size layout plots, most of this task
is now done by computers
...
A number of layout tools even perform on-line DRC and check the design in the background during the
time of conception
...
1 Layout Example

An example of a complete layout containing an inverter is shown in Figure 2
...
To help
the visualization process, a vertical cross section of the process along the design center is
included as well as a circuit schematic
...
[Figure: (a) Layout, with terminals VDD, GND, In, and Out and the cut line A-A′; (b) Cross section along A-A′, with labels n, p-substrate, n+, p+, and field oxide; (c) Circuit diagram, with terminals In, VDD, GND, and Out.]
Figure 2
...


It is left as an exercise for the reader to determine the sizes of both the NMOS and
the PMOS transistors
...
4

Packaging Integrated Circuits
The IC package plays a fundamental role in the operation and performance of a component
...

Finally, it also protects the die against environmental conditions such as humidity
...
This influence is getting more
pronounced as time progresses by the reduction in internal signal delays and on-chip
capacitance as a result of technology scaling
...



The search for higher-performance packages with fewer inductive or capacitive parasitics
has accelerated in recent years
...
This relationship
was first observed by E
...
This formula relates the number of input/output pins to the complexity of the circuit, as measured by the number of
gates
...
1)

where K is the average number of I/Os per gate, G the number of gates, β the Rent exponent, and P the number of I/O pins to the chip
...
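
The relationship can be sketched numerically. The equation itself is elided in this excerpt, so the snippet assumes the usual statement of Rent's rule, P = K · G^β, which matches the symbol definitions above. The values of K and β in the loop are illustrative assumptions, not entries copied from the table.

```python
def rent_pins(k, gates, beta):
    """Estimate the number of I/O pins P from Rent's rule, P = K * G**beta.

    k     -- average number of I/Os per gate (K)
    gates -- number of gates (G)
    beta  -- Rent exponent
    """
    return k * gates ** beta


# Illustrative (assumed) parameters; real values depend on the application class.
for gates in (1_000, 100_000, 10_000_000):
    pins = rent_pins(k=0.8, gates=gates, beta=0.45)
    print(f"G = {gates:>10,d}  ->  P is roughly {pins:.0f}")
```

Since β is below one in this sketch, the pin count grows much more slowly than the gate count, which is why the exponent has such a strong influence on packaging requirements.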
1 and 0
...
Its value
depends strongly upon the application area, architecture, and organization of the circuit, as
demonstrated in Table 2
...
Clearly, microprocessors display a very different input/output
behavior compared to memories
...
1

Rent’s constant for various classes of systems ([Bakoglu90])
Application

β

K

Static memory

0
...
45

0
...
5

1
...
63

1
...
25

82

The observed rate of pin-count increase for integrated circuits varies between 8% and
11% per year, and it has been projected that packages with more than 2000 pins will be
required by the year 2010
...
It is useful for the circuit designer to be
aware of the available options, and their pros and cons
...

• Electrical requirements—Pins should exhibit low capacitance (both interwire and
to the substrate), resistance, and inductance
...
Observe that intrinsic integrated-circuit impedances are high
...
Mechanical reliability requires a good matching between the thermal properties of the die and the chip carrier
...


...
While ceramics
have a superior performance over plastic packages, they are also substantially more
expensive
...
The least expensive plastic packaging can dissipate up to 1 W
...

Higher dissipation requires more expensive ceramic packaging
...
Even more extreme techniques
such as fans and blowers, liquid cooling hardware, or heat pipes, are needed for
higher dissipation levels
...
The increasing pin count
either requires an increase in the package size or a reduction in the pitch between the
pins
...

Packages can be classified in many different ways —by their main material, the
number of interconnection levels, and the means used to remove heat
...

2
...
1

Package Materials

The most common materials used for the package body are ceramic and polymers (plastics)
...
For instance, the ceramic Al2O3 (Alumina) conducts heat better than
SiO2 and the Polyimide plastic, by factors of 30 and 100 respectively
...

The disadvantage of alumina and other ceramics is their high dielectric constant, which
results in large interconnect capacitances
...
4
...
The die is
first attached to an individual chip carrier or substrate
...
These cavities provide ample room for many
connections to the chip leads (or pins)
...

Complex systems contain even more interconnect levels, since boards are connected
together using backplanes or ribbon cables
...
10
...

Interconnect Level 1 —Die-to-Package-Substrate
For a long time, wire bonding was the technique of choice to provide an electrical connection between die and package
...
Next, the chip pads are individually
connected to the lead frame with aluminum or gold wires
...

Section 2
...
10 Interconnect hierarchy in traditional IC
packaging
...
An example of wire bonding is
shown in Figure 2
...
Although the wire-bonding process is automated to a large degree,
it has some major disadvantages
...
11

Wire bonding
...
Wires must be attached serially, one after the other
...

2
...

Bonding wires have inferior electrical properties, such as a high individual inductance (5
nH or more) and mutual inductance with neighboring signals
...
Typical values of the parasitic inductances and
capacitances for a number of commonly used packages are summarized in Table 2
...

3
...

New attachment techniques are being explored as a result of these deficiencies
...
12a)
...
12b)
...

One possible approach is to use pressure connectors
...
[Figure: (a) Polymer tape with imprinted wiring pattern, with labels sprocket hole, film + pattern, test pads, lead frame, and polymer film; (b) Die attachment using solder bumps, with labels solder bump, die, and substrate.]

Figure 2
...


Table 2
...

Capacitance
(pF)

Inductance
(nH)

68-pin plastic DIP

4

35

68-pin ceramic DIP

7

20

256-pin grid array

1–5

2–15

Wire bond

0
...
1–0
...
01–0
...
The sprockets in
the film are used for automatic transport
...
The
printed approach helps to reduce the wiring pitch, which results in higher lead counts
...
For instance,
for a two-conductor layer, 48 mm TAB Circuit, the following electrical parameters hold: L
≈ 0
...
5 nH, C ≈ 0
...
3 pF, and R ≈ 50–200 Ω [Doane93, p
...

Another approach is to flip the die upside-down and attach it directly to the substrate
using solder bumps
...
13)
...
This can help address
the power- and clock-distribution problems, since the interconnect materials on the substrate (e
...
, Cu or Au) are typically of a better quality than the Al on the chip
...
[Figure: die, solder bumps, and interconnect layers.]
Figure 2
...

A PC board is manufactured by stacking layers of copper and insulating epoxy glass
...
The package pins are inserted and electrical connection is
made with solder (Figure 2
...
The favored package in this class was the dual-in-line
package or DIP (Figure 2
...
The packaging density of the DIP degrades rapidly when
the number of pins exceeds 64
...
15b)
...


(a) Through-hole mounting

(b) Surface mount
Figure 2
...


The through-hole mounting approach offers a mechanically reliable and sturdy connection
...
For mechanical reasons, a minimum pitch of 2
...
Even under
those circumstances, PGAs with large numbers of pins tend to substantially weaken the
board
...

PGAs with large pin counts hence require extra routing layers to connect to the multitudes
of pins
...

Many of the shortcomings of the through-hole mounting are solved by using the
surface-mount technique
...
14b)
...
In addition, the elimination of the through-holes improves the mechanical strength
of the board
...
Not only is it cumbersome to mount a component on a board, but also
more expensive equipment is needed, since a simple soldering iron will not do anymore
...
Signal probing becomes hard or even impossible
...
Three of these packages are shown in Figure 2
...
[Figure: package types shown: bare die, DIP, PGA, small-outline IC, quad flat pack, PLCC, and leadless carrier.]

Figure 2
...


leadless chip carrier
...
3
...
3

Parameters of various types of chip carriers
...
54 mm

64

Pin grid array

2
...
27 mm

28

Leaded chip carrier (PLCC)

1
...
75 mm

124

Even surface-mount packaging is unable to satisfy the quest for ever-higher
pin counts
...
An example of such a
packaging approach, called ceramic ball grid array (BGA), is shown in Figure 2
...
Solder bumps are used to connect both the die to the package substrate, and the package to the
board
...
A minimum pitch between solder balls
of as low as 0
...


[Figure: panels (a) and (b), with labels lid, thermal grease, chip, flip-chip solder joints, ceramic base, board, and solder ball.]

Figure 2
...
The trend is toward reducing the number of levels
...
Eliminating one layer in the packaging hierarchy by mounting the die
directly on the wiring backplanes—board or substrate—offers a substantial benefit when
performance or density is a major issue
...

A number of the previously mentioned die-mounting techniques can be adapted to
mount dies directly on the substrate
...
The substrate itself can vary over a wide range of
materials, depending upon the mechanical, electrical, thermal, and economic
requirements
...
Silicon has the advantage of presenting a perfect match in mechanical and thermal properties with respect to the die material
...
An example of an MCM module implemented using a silicon substrate
(commonly dubbed silicon-on-silicon) is shown in Figure 2
...
The module, which implements an avionics processor module and is fabricated by Rockwell International, contains
53 ICs and 40 discrete devices on a 2
...
2″ substrate with aluminum polyimide interconnect
...
The module itself has 180
I/O pins
...
For instance, a solder bump has an associated capacitance and inductance of only 0
...
01 nH respectively
...


17 Avionics processor module
...


The dynamic power associated with the switching of the large load capacitances is simultaneously reduced
...
This technology requires some advanced manufacturing steps that make the process expensive
...
In the near future, this argument might become obsolete as
MCM approaches proliferate
...
4
...
A large number of failure mechanisms in ICs are accentuated by increased temperatures
...
To prevent failure, the temperature of the die must be kept within certain ranges
...
Military parts are more demanding and require a temperature range varying from –55° to 125°C
...

Standard packaging approaches use still or circulating air as the cooling medium
...

More expensive packaging approaches, such as those used in mainframes or supercomputers, force air, liquids, or inert gases through tiny ducts in the package to achieve even
greater cooling efficiencies
...
As an example, a 40-pin DIP has a thermal resistance of 38 °C/W and 25 °C/W for
natural and forced convection of air, respectively
...
For comparison, the thermal resistance
of a ceramic PGA ranges from 15 to 30 °C/W
...
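
To make these thermal-resistance figures concrete, the sketch below assumes the standard definition ΔT = θ · Q (temperature rise equals thermal resistance times dissipated power), which the excerpt does not spell out. The 50 °C temperature budget is a hypothetical number chosen only for illustration. The 38 and 25 °C/W values are the DIP figures quoted above.

```python
def temperature_rise(theta_c_per_w, power_w):
    """Temperature rise (deg C) for a given thermal resistance and dissipated power."""
    return theta_c_per_w * power_w


def max_power(theta_c_per_w, delta_t_max_c):
    """Largest power (W) that keeps the temperature rise within delta_t_max_c."""
    return delta_t_max_c / theta_c_per_w


DELTA_T_BUDGET_C = 50.0  # hypothetical allowed rise above ambient

for label, theta in (("natural convection", 38.0), ("forced convection", 25.0)):
    print(f"40-pin DIP, {label}: at most {max_power(theta, DELTA_T_BUDGET_C):.2f} W")
```

Under these assumptions, forced air allows noticeably more dissipation from the same package, which is the point of the comparison in the text.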
The increasing integration levels and circuit performance make this task
nontrivial
...
It provides a bound on the integration complexity and performance as a function of the thermal parameters
$$\frac{N_G}{t_p} \le \frac{\Delta T}{\theta E}$$

(2
...

Example 2
...
5 °C/W and E = 0
...
In
other words, the maximum number of gates on a chip, when all gates are operating simultaneously, must be less than 400,000 if the switching speed of each gate is 1 nsec
...
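
The bound can be rearranged as N_G ≤ ΔT · t_p / (θ · E) and evaluated directly. In the sketch below all four parameter values are assumptions, chosen so that the result reproduces the 400,000-gate figure quoted in the example; the example's own parameter values are partly elided in this excerpt.

```python
def max_gates(delta_t_c, theta_c_per_w, energy_j, t_p_s):
    """Upper bound on simultaneously switching gates: N_G <= dT * t_p / (theta * E)."""
    return delta_t_c * t_p_s / (theta_c_per_w * energy_j)


# Assumed values (not confirmed by the excerpt):
limit = max_gates(
    delta_t_c=100.0,      # allowed temperature rise, deg C
    theta_c_per_w=2.5,    # package thermal resistance, deg C/W
    energy_j=0.1e-12,     # switching energy per gate, J
    t_p_s=1e-9,           # gate propagation delay, s
)
print(f"Maximum number of gates: about {limit:,.0f}")
```

Halving either E or the fraction of gates that switch at once doubles the allowed gate count, which is why low-power design techniques target exactly those two quantities.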


Fortunately, not all gates are operating simultaneously in real systems
...
For
instance, it was experimentally derived that the ratio between the average switching period
and the propagation delay ranges from 20 to 200 in mini- and large-scale computers
[Masaki92]
...
(2
...
Design approaches for low power
that reduce either E or the activity factor are rapidly gaining importance
...
5

Perspective — Trends in Process Technology
Modern CMOS processes pretty much track the flow described in the previous sections
although a number of the steps might be reversed, a single well approach might be followed, a grown field oxide instead of the trench approach might be used, or extra steps
such as LDD (Lightly Doped Drain) might be introduced
...
2)
...

inserted between steps i and j of our process
...
Beyond these, it is our belief that no dramatic changes, breaking away from the
described CMOS technology, must be expected in the next decade
...
5
...
Process engineers are continuously evaluating alternative
options for the traditional ‘Aluminum conductor—SiO2 insulator’ combination that has
been the norm for the last decades
...
Copper has the advantage of having a resistivity that is substantially lower than that of Aluminum
...
Coating the copper with a buffer material such as Titanium Nitride, preventing the diffusion, addresses this problem, but
requires a special deposition process
...
18) uses a metallization approach that fills trenches etched into the insulator, followed by a chemical-mechanical polishing step
...

In addition to the lower resistivity interconnections, insulator materials with a lower
dielectric constant than SiO2 —and hence lower capacitance— have also found their way
into the production process starting with the 0
...

(a)

(b)

Figure 2
...

Section 2
...
The main difference lies in
the start material: the transistors are constructed in a very thin layer of silicon, deposited
on top of a thick layer of insulating SiO2 (Figure 2
...
The primary advantages of the SOI
process are reduced parasitics and better transistor on-off characteristics
...
Preparing a high quality SOI substrate at an economical cost was long the main
hindrance against a large-scale introduction of the process
...


Gate
tox
oxide

n+

tSi

p

Buried Oxide (BOX)

n+

oxide

tBox

p-substrate
(a)

(b)

Figure 2
...


2
...
2

In the Longer Term

Extending the life of CMOS technology beyond the next decade, and deep below the
100 nm channel length region, will however require re-engineering of both the process
technology and the device structure
...
While projecting what approaches will dominate in that era equals resorting to
crystal-ball gazing, one interesting development is worth mentioning
...
One way to
address this problem is to introduce extra active layers, and to sandwich them in-between
the metal interconnect layers (Figure 2
...
This enables us to position high density memory on top of the logic processors implemented in the bulk CMOS, reducing the distance
between computation and storage, and hence also the delay [Souri00]
...

different layers
...

[Figure: stacked active layers T1 - Logic (bulk), T2 - High Density Memory, and T3 - Optical I/O; MEMS (with an optical device), with n+/p+ regions and metal layers M1 to M6.]

Figure 2
...
Extra
active layers (T*), implementing high density memory
and I/O, are sandwiched between the metal
interconnect layers (M*)
...
How to remove the dissipated heat
is one of the compelling questions
...
Yet, researchers are
demonstrating major progress, and 3D integration might very well be on the horizon
...

One alternative, called 2
...
Vias are
etched to electrically connect both chips after metallization
...
The major limitation of this technique is its lack of precision (best case alignment
+/- 2 µm), which restricts the inter-chip communication to global metal lines
...


2
...

• The manufacturing process of integrated circuits requires a large number of steps,
each of which consists of a sequence of basic operations
...


chapter2
...
7

To Probe Further

69

• The optical masks form the central interface between the intrinsics of the manufacturing process and the design that the user wants to see transferred to the silicon fabric
...

These design rules act as the contract between the circuit designer and the process
engineer
...


2
...
An excellent overview of the state-of-the-art in CMOS manufacturing can be
found in the “Silicon VLSI Technology” book by J
...
Deal, and P
...
A visual overview of the different steps in the manufacturing process can be
found on the web at [Fullman99]
...


REFERENCES
[Allen99] D
...
, “A 0
...
8 V SOI 550 MHz PowerPC Microprocessor with Copper
Interconnects,” Proceedings IEEE ISSCC Conference, vol
...
438-439, February 1999
...
Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison-Wesley,
1990
...
Doane, ed
...

[Franzon93] P
...

[Fullman99] Fullman Kinetics, “The Semiconductor Manufacturing Process”, http://www
...
com/semiconductors/semiconductors
...

[Geppert98] L
...
35, No
1, pp
...

[Landman71] B
...
Russo, “On a Pin versus Block Relationship for Partitions of
Logic Graphs,” IEEE Trans
...
C-20, pp
...

[Masaki92] A
...
1992
...
Mead and L
...

[Nagata92] M
...
27, no
...
465–472,
April 1992
...

[Plummer00] J
...
Deal, and P
...

[Steidel83] C
...
551–598, 1983
...
J
...
Banerjee, A
...
C
...
213-220, June 2000
...
fm Page 103 Monday, September 6, 1999 1:44 PM

CHAPTER

4

THE WIRE
Determining and quantifying interconnect parameters
n
Introducing circuit models for interconnect wires
n
Detailed wire models for SPICE
n
Technology scaling and its impact on interconnect

4
...
2

A First Glance

4
...
3
...
3
...
5

The Transmission Line

SPICE Wire Models
4
...
1
4
...
2

4
...
3
...
4

Capacitance

4
...
5

Perspective: A Look into the Future

Inductance

Electrical Wire Models
4
...
1

The Ideal Wire

4
...
2

The Lumped Model

4
...
3

The Lumped RC model

4
...
4

The Distributed rc Line

...
1 Introduction
Throughout most of the past history of integrated circuits, on-chip interconnect wires were
considered to be second class citizens that had only to be considered in special cases or
when performing high-precision analysis
...
The parasitic effects
introduced by the wires display a scaling behavior that differs from that of active devices such
as transistors, and tend to gain in importance as device dimensions are reduced and circuit
speed is increased
...
This situation is
aggravated by the fact that improvements in technology make the production of ever-larger die sizes economically feasible, which results in an increase in the average length of
an interconnect wire and in the associated parasitic effects
...


4
...
State-of-the-art processes offer multiple layers of Aluminum, and at least one layer of polysilicon
...
These wires appear in the schematic diagrams of electronic
circuits as simple lines with no apparent impact on the circuit performance
...
All three have multiple
effects on the circuit behavior
...
An increase in propagation delay, or, equivalently, a drop in performance
...
An impact on the energy dissipation and the power distribution
...
An introduction of extra noise sources, which affects the reliability of the circuit
...
This conservative approach is non-constructive and even
unfeasible
...
It is hence totally useless for today's integrated circuits with their
millions of circuit nodes
...
The circuit behavior at a given circuit node is only determined
by a few dominant parameters
...

To achieve the latter, it is important that the designer has a clear insight into the parasitic wiring effects, their relative importance, and their models
...
1
...

Section 4
...
1 Schematic and physical views of wiring of bus-network
...


ter (or transmitters) to a set of receivers and is implemented as a link of wire segments of
various lengths and geometries
...
Be aware that the reality may be far more complex
...
2a
...
This is a necessity when the length of the wire becomes significantly larger than its width
...

Analyzing the behavior of this schematic, which only models a small part of the circuit, is slow and cumbersome
...

• Inductive effects can be ignored if the resistance of the wire is substantial — this is
for instance the case for long Aluminum wires with a small cross-section — or if the
rise and fall times of the applied signals are slow
...
2b)
...

Obviously, the latter problems are the easiest to model, analyze, and optimize
...
The
goal of this chapter is to present the reader the basic techniques to estimate the values of

...
2 Wire models for the circuit of Figure 4
...
Model (a) considers most of the wire parasitics (with the
exception of interwire resistance and mutual inductance), while model (b) only considers capacitance
...


4
...
3
...
The task is complicated by the fact that the interconnect structure of contemporary integrated circuits is
three-dimensional, as was clearly demonstrated in the process cross-section of FIGURE
(CHAPTER 2)
...
Rather than getting lost
in complex equations and models, a designer typically will use an advanced extraction
tool to get precise values of the interconnect capacitances of a completed layout
...
Yet, some simple first-order models
come in handy to provide a basic understanding of the nature of interconnect capacitance
and its parameters, and of how wire capacitance will evolve with future technologies
...
3
...

Section 4
...
Under those circumstances, the total capacitance
of the wire can be approximated as¹
$$c_{int} = \frac{\varepsilon_{di}}{t_{di}} W L$$

(4
...
SiO2 is the dielectric material of
choice in integrated circuits, although some materials with lower permittivity, and hence
lower capacitance, are coming into use
...
ε is typically expressed as the product of two terms, or ε = εr ε0
...
854 × 10⁻¹²
F/m is the permittivity of free space, and εr the relative permittivity of the insulating
material
...
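
The parallel-plate expression is easy to evaluate numerically. The sketch below computes c_int = (ε_di / t_di) · W · L with ε_di = εr · ε0. The wire geometry and the relative permittivity of 3.9 (a commonly quoted value for SiO2) are assumptions chosen for illustration, not data from this text.

```python
EPS0 = 8.854e-12  # permittivity of free space, F/m


def parallel_plate_cap(eps_r, t_di, width, length):
    """Parallel-plate wire capacitance in farads; all lengths in meters."""
    eps_di = eps_r * EPS0
    return eps_di / t_di * width * length


# Assumed example: 1 um wide, 1 mm long wire over 1 um of oxide (eps_r = 3.9).
c_int = parallel_plate_cap(eps_r=3.9, t_di=1e-6, width=1e-6, length=1e-3)
print(f"c_int = {c_int * 1e15:.1f} fF")
```

The capacitance scales linearly with W and L and inversely with the dielectric thickness, so a lower-permittivity dielectric reduces it in direct proportion.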
1 presents the relative permittivity of several dielectrics used in integrated circuits
...
(4
...

Table 4
...


εr

Material
Free space

1

Aerogels

~1
...
9

Glass-epoxy (PC board)

5

Silicon Nitride (Si3N4)

7
...
5

Silicon

11
...
3 Parallel-plate capacitance
model of interconnect wire
...


...

while scaling technology, it is desirable to keep the cross-section of the wire (W×H) as
large as possible — as will become apparent in a later section
...
As a result, we have over the
years witnessed a steady reduction in the W/H-ratio, such that it has even dropped below
unity in advanced processes
...
Under those circumstances, the parallel-plate model assumed above becomes inaccurate
...

This effect is illustrated in Figure 4
...
[Figure: (a) Fringing fields; model components cfringe and cpp, with wire width w and thickness H.]
Figure 4
...
The model decomposes the
capacitance into two contributions: a
parallel-plate capacitance, and a fringing
capacitance, modeled by a cylindrical wire
with a diameter equal to the thickness of
the wire
...

Presenting an exact model for this difficult geometry is hard
...
4b): a parallel-plate
capacitance determined by the orthogonal field between a wire of width w and the ground
plane, in parallel with the fringing capacitance modeled by a cylindrical wire with a
dimension equal to the interconnect thickness H
...

$$c_{wire} = c_{pp} + c_{fringe} = \frac{w\,\varepsilon_{di}}{t_{di}} + \frac{2\pi\,\varepsilon_{di}}{\log(t_{di}/H)}$$

(4
...
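
A small numerical sketch of this two-term model follows. It makes several assumptions that the excerpt does not confirm: the effective width is taken as w = W − H/2, the logarithm is taken as the natural logarithm, and the geometry and εr = 3.9 are illustrative. The function only accepts t_di > H, where the logarithmic term is positive.

```python
import math

EPS0 = 8.854e-12  # permittivity of free space, F/m


def wire_cap_per_length(eps_r, t_di, W, H):
    """Return (c_pp, c_fringe) in F/m for the parallel-plate plus fringe model.

    Assumes w = W - H/2 and a natural logarithm; both are assumptions.
    """
    if t_di <= H or W <= H / 2:
        raise ValueError("sketch only covers t_di > H and W > H/2")
    eps_di = eps_r * EPS0
    w = W - H / 2
    c_pp = w * eps_di / t_di
    c_fringe = 2 * math.pi * eps_di / math.log(t_di / H)
    return c_pp, c_fringe


# Assumed geometry: W = 1 um, H = 0.5 um, t_di = 1 um.
c_pp, c_fringe = wire_cap_per_length(eps_r=3.9, t_di=1e-6, W=1e-6, H=0.5e-6)
print(f"c_pp     = {c_pp * 1e12:.1f} pF/m")
print(f"c_fringe = {c_fringe * 1e12:.1f} pF/m")
```

For this assumed geometry the fringe term is larger than the parallel-plate term, which illustrates why the simple parallel-plate estimate becomes inaccurate for narrow, tall wires.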

Numerous more accurate models (e
...
[Vdmeijs84]) have been developed over time, but
these tend to be substantially more complex, and defeat our goal of developing a conceptual understanding
...
5 plots the
value of the wiring capacitance as a function of (W/H)
...
For (W/H) smaller than 1
...

Section 4
...
The fringing capacitance can
increase the overall capacitance by a factor of more than 10 for small line widths
...
In other words, the
capacitance is no longer a function of the width
...
5 Capacitance of interconnect wire as a function of (W/H), including fringing-field effects (from
[Schaper83])
...


So far, we have restricted our analysis to the case of a single rectangular conductor
placed over a ground plane
...
Today's processes offer many more layers of interconnect, which are, in addition,
packed quite densely
...
This is illustrated in Figure 4
...
Each wire is not only coupled
to the grounded substrate, but also to the neighboring wires on the same layer and on adjacent layers
...
The main difference is that not all of its capacitive components terminate at the
grounded substrate; a large number of them connect to other wires, which have
dynamically varying voltage levels
...

In summary, interwire capacitances become a dominant factor in multi-layer interconnect structures
...
The increasing contribution of the
interwire capacitance to the total capacitance with decreasing feature sizes is best illustrated by Figure 4
...
In this graph, which plots the capacitive components of a set of parallel wires routed above a ground plane, it is assumed that dielectric and wire thickness are

...
6 Capacitive coupling
between wires in interconnect
hierarchy
...
When W becomes smaller than 1
...


Figure 4
...
It consists of a
capacitance to ground and an inter-wire
capacitance (from [Schaper83])
...
25µm CMOS process are given in
Table 4
...
The process supports 1 layer of polysilicon and 5 layers of Aluminum
...
When placing
the wires over the thick field oxide that is used to isolate different transistors, use the “Field”
column in the table, while wires routed over the active area see a higher capacitance as seen in
the “Active” column
...
To obtain more accurate results for actual structures, complex 3-dimensional models should be used that take
the environment of the wire into account
...
2 Wire area and fringe capacitance values for typical 0
...
The table
rows represent the top plate of the capacitor, the columns the bottom plate
...

Field

Active

Poly

41

Al1

Al2

Al3

Al4

57

Poly

88

Al1

30
40

47

54

Al2

13

15

17

36

25

27

29

45

Al3

8
...
4

10

15

41

18

19

20

27

49

Al4

6
...
8

7

8
...
2

5
...
4

6
...
1

14

38

12

12

12

14

19

27

52

54

Table 4
...
Observe that these
numbers include both the parallel plate and fringing components
...
For instance, a ground plane placed on a neighboring
layer terminates a large fraction of the fringing field, and effectively reduces the interwire
capacitance
...
On the other hand, the thick Al5 wires display the highest interwire capacitance
...
The supply rails are an example of the latter
...
3 Interwire capacitance per unit wire length for different interconnect layers of typical 0
...
The capacitances are expressed in aF/µm, and are for minimally-spaced wires
...
fm Page 112 Monday, September 6, 1999 1:44 PM

112

THE WIRE

Chapter 4

Example 4
...
The length of those
wires can be substantial
...
Consider an aluminum wire 10 cm long and 1 µm wide, routed on the first Aluminum layer
...
2
...
1 × 10^6 µm^2) × 30 aF/µm^2 = 3 pF
Fringing capacitance: 2 × (0
...

Suppose now that a second wire is routed alongside the first one, separated by only the
minimum allowed distance
...
3, we can determine that this wire will couple to
the first with a capacitance equal to
Cinter =(0
...
5 pF
which is almost as large as the total capacitance to ground!
A similar exercise shows that moving the wire to Al4 would reduce the capacitance to
ground to 3
...
65 pF area and 2
...
5 pF
...
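The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration: the function name and argument names are hypothetical, the 30 aF/µm^2 area value is the one used in the example, and the 40 aF/µm fringe value is assumed from the per-unit-length expression c = (W×30 + 2×40) aF/µm used for the same layer later in the chapter.

```python
AF = 1e-18  # one attofarad, expressed in farads


def capacitance_to_ground(length_um, width_um, area_af_per_um2, fringe_af_per_um):
    """Return (area, fringe) capacitance in farads for a wire over a ground plane.

    The fringe term is counted twice, once for each edge of the wire.
    """
    area = length_um * width_um * area_af_per_um2 * AF
    fringe = 2 * length_um * fringe_af_per_um * AF
    return area, fringe


# 10 cm long (1e5 um), 1 um wide wire; layer data assumed as stated above.
area_cap, fringe_cap = capacitance_to_ground(1e5, 1.0, 30.0, 40.0)
total_cap = area_cap + fringe_cap
print(f"area   = {area_cap * 1e12:.1f} pF")
print(f"fringe = {fringe_cap * 1e12:.1f} pF")
print(f"total  = {total_cap * 1e12:.1f} pF")
```

Under these assumptions the total comes out at 11 pF, the value that the lumped-capacitance example later in the chapter uses for the same wire.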
3
...
The resistance of a rectangular conductor in the style of Figure 4
...
3)

where the constant ρ is the resistivity of the material (in Ω-m)
...
4
...
Unfortunately, it has a
large resistivity compared to materials such as Copper
...

Table 4
...


Material

ρ (Ω -m)

Silver (Ag)

1
...
fm Page 113 Monday, September 6, 1999 1:44 PM

Section 4
...
4

113

Resistivity of commonly-used conductors (at 20 C)
...
7 ×10 − 8

Gold (Au)

2
...
7 ×10 − 8

Tungsten (W)

5
...
(4
...
4)

$$R_\square = \frac{\rho}{H}$$

(4
...
This expresses that the resistance of a square conductor is independent of its absolute size, as is apparent from Eq
...
4)
...

Interconnect Resistance Design Data
Typical values of the sheet resistance of various interconnect materials are given in Table 4
...

Table 4
...
25µm CMOS process
...
05 – 0
...
Polysilicon should only be used for local interconnect
...

RC

chapter4
...
A silicide is a compound material formed using silicon and a refractory metal
...

Examples of silicides are WSi2, TiSi2, PtSi2, and TaSi
...
The silicides
are most often used in a configuration called a polycide, which is a simple layered combination of polysilicon and a silicide
...
A MOSFET fabricated with a polycide gate is shown in Figure 4
...
The advantage of the silicided gate is a reduced gate resistance
...

Silicide
Polysilicon
SiO2
n+

n+
p

Figure 4
...


A polycide-gate

Transitions between routing layers add extra resistance to a wire, called the contact resistance
...
It is possible to reduce the contact
resistance by making the contact holes larger
...
This effect, called current crowding, puts a
practical upper limit on the size of the contact
...
25 µm process: 5-20 Ω for metal or polysilicon to
n+, p+, and metal to polysilicon; 1-5 Ω for vias (metal-to-metal contacts)
...
2 Resistance of a Metal Wire
Consider again the aluminum wire of Example 4
...
Assuming a sheet resistance for Al1 of 0
...
075 Ω/□ × (0
...
5 kΩ
Implementing the wire in polysilicon with a sheet resistance of 175 Ω/□ raises the overall
resistance to 17
...
Silicided polysilicon with a sheet resistance of 4 Ω/□ offers a better alternative, but still translates into a wire with a 400 kΩ resistance
...
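The resistance figures of this example follow directly from counting squares. A minimal sketch, assuming the same 10 cm by 1 µm wire and the sheet resistances quoted in the example (the helper name is hypothetical):

```python
def wire_resistance(sheet_resistance_ohm_per_sq, length_um, width_um):
    """Resistance in ohms: sheet resistance times the number of squares (L / W)."""
    return sheet_resistance_ohm_per_sq * length_um / width_um


LENGTH_UM = 1e5  # 10 cm
WIDTH_UM = 1.0

for material, r_sq in [("Al1", 0.075),
                       ("polysilicon", 175.0),
                       ("silicided polysilicon", 4.0)]:
    r = wire_resistance(r_sq, LENGTH_UM, WIDTH_UM)
    print(f"{material:>22}: {r / 1e3:10.1f} kOhm")
```

The wire contains 10^5 squares, so each sheet resistance is simply multiplied by that count.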
This is definitely the case for most semiconductor circuits
...
High-frequency currents tend to flow
primarily on the surface of a conductor with the current density falling off exponentially

with depth into the conductor
...
6)

with f the frequency of the signal and µ the permeability of the surrounding dielectric (typically equal to the permeability of free space, or µ = 4π × 10^-7 H/m)
...
6 µm
...
9 for a rectangular wire
...
9 The skin-effect reduces the flow of the current to the
surface of the wire
...
7)

The increased resistance at higher frequencies may cause an extra attenuation — and
hence distortion — of the signal being transmitted over the wire
...
Below fs the whole wire is conducting current, and the resistance is equal to the (constant) low-frequency resistance of the wire
...
(4
...
8)

Example 4
...
7 × 10^-8 Ω-m, embedded in a SiO2 dielectric
with a permeability of 4π × 10^-7 H/m
...
(4
...
2 µm for the effect to be noticeable at 1 GHz
...
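As a rough check on the skin-depth numbers in this section, the sketch below evaluates the standard skin-depth expression δ = sqrt(ρ / (π f µ)). That this is the form of the elided equation is an assumption; the resistivity and permeability are the Aluminum and free-space values quoted in the text, and the function name is hypothetical.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # permeability of free space, H/m


def skin_depth(resistivity_ohm_m, frequency_hz, permeability=MU_0):
    """Skin depth in metres, using delta = sqrt(rho / (pi * f * mu))."""
    return math.sqrt(resistivity_ohm_m / (math.pi * frequency_hz * permeability))


delta_al_1ghz = skin_depth(2.7e-8, 1e9)  # Aluminum at 1 GHz
print(f"skin depth of Al at 1 GHz: {delta_al_1ghz * 1e6:.2f} um")
```

The result is about 2.6 µm, so only conductors whose cross-section is several microns across see a noticeable change in resistance at 1 GHz.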
10, which plots the increase in resistance
due to skin effects for different width Aluminum conductors
...
can be observed at 1 GHz for a 20 µm wire, while the increase for a 1 µm wire is less than
1%
...
10 Skin-effect induced
increase in resistance as a function
of frequency and wire width
...
7 µm
[Sylvester97]
...
Since clocks tend to
carry the highest-frequency signals on a chip and also are fairly wide to limit resistance,
the skin effect is likely to have its first impact on these lines
...
Another major design concern is that the adoption of better
conductors such as Copper may move the onset of skin effects to lower frequencies
...
3
...
This was definitely the case in
the first decades of integrated digital circuit design
...
Consequences of on-chip inductance
include ringing and overshoot effects, reflections of signals due to impedance mismatch,
inductive coupling between lines, and switching noise due to
Ldi/dt voltage drops
...
9)

It is possible to compute the inductance of a wire directly from its geometry and its
environment
...
10)

with ε and µ respectively the permittivity and permeability of the surrounding dielectric
...
This is most often not the case
...
(4
...

Some other interesting relations, obtained from Maxwell's laws, can be pointed out
...
11)

c0 equals the speed of light (30 cm/nsec) in a vacuum
...
6
...

Table 4
...

The relative permeability µr of most dielectrics is approximately equal to 1
...
9

15

PC board (epoxy glass)

5
...
5

10

Dielectric

Example 4
...
25 micron CMOS technology and routed on top
of the field oxide
...
2, we can derive the capacitance of the wire per unit length:
c = (W×30 + 2×40) aF/µm
From Eq
...
10), we can derive the inductance per unit length of the wire, assuming
SiO2 as the dielectric and assuming a uniform dielectric (make sure to use the correct units!)
l = (3
...
854 × 10^-12) × (4π × 10^-7) / c

For wire widths of 0
...
W = 0
...
47 pH/µm
W = 1 µm: c = 110 aF/µm; l = 0
...
11 pH/µm
Assuming a sheet resistance of 0
...
075/W Ω/µm
It is interesting to observe that the inductive part of the wire impedance becomes equal
in value to the resistive component at a frequency of 27
...
For wires with
a smaller capacitance and resistance (such as the thicker wires located at the upper interconnect layers), this frequency can become as low as 500 MHz, especially when better interconnect materials such as Copper are being used
...
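The unit conversion in this example is easy to get wrong, so a short sketch may help. It assumes the relation cl = εµ for a uniform dielectric, SiO2 with a relative permittivity of 3.9, and the Al1-over-field capacitance expression c = (W×30 + 2×40) aF/µm from the example. The function names are illustrative only.

```python
import math

EPS_0 = 8.854e-12          # permittivity of free space, F/m
MU_0 = 4 * math.pi * 1e-7  # permeability of free space, H/m


def capacitance_af_per_um(width_um):
    """Al1 over field oxide: area term plus two fringe terms, in aF/um."""
    return width_um * 30.0 + 2 * 40.0


def inductance_ph_per_um(c_af_per_um, eps_r=3.9, mu_r=1.0):
    """Inductance per unit length from c * l = eps * mu, returned in pH/um."""
    c_f_per_m = c_af_per_um * 1e-18 / 1e-6          # aF/um -> F/m
    l_h_per_m = (eps_r * EPS_0) * (mu_r * MU_0) / c_f_per_m
    return l_h_per_m * 1e12 / 1e6                   # H/m -> pH/um


c_1um = capacitance_af_per_um(1.0)
l_1um = inductance_ph_per_um(c_1um)
print(f"W = 1 um: c = {c_1um:.0f} aF/um, l = {l_1um:.2f} pH/um")
```

For W = 1 µm this gives c = 110 aF/µm, as in the example, and an inductance of roughly 0.39 pH/µm. Because l is inversely proportional to c, wider wires have a lower inductance per unit length under this uniform-dielectric assumption.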


4
...
These
parasitic elements have an impact on the electrical behavior of the circuit and influence its
delay, power dissipation, and reliability
...
These models vary from very simple to very complex depending upon
the effects that are being studied and the required accuracy
...

4
...
1

The Ideal Wire

In schematics, wires occur as simple lines with no attached parameters or parasitics
...
A voltage change at
one end of the wire propagates immediately to its other ends, even if those are some distance away
...

While this ideal-wire model is simplistic, it has its value, especially in the early phases of
the design process when the designer wants to concentrate on the properties and the
behavior of the transistors that are being connected
...

Taking these into account would just make the analysis unnecessarily complex
...


4
...
2

The Lumped Model

The circuit parasitics of a wire are distributed along its length and are not lumped into a
single position
...
The
advantage of this approach is that the effects of the parasitic then can be described by an
ordinary differential equation
...

As long as the resistive component of the wire is small and the switching frequencies are in the low to medium range, it is meaningful to consider only the capacitive component of the wire, and to lump the distributed capacitance into a single capacitor as
shown in Figure 4
...
Observe that in this model the wire still represents an equipotential
region, and that the wire itself does not introduce any delay
...
This capacitive lumped model is simple, yet effective, and is the model of choice for the analysis of
most interconnect wires in digital integrated circuits
...
11 Distributed versus lumped capacitance model of wire
...
The driver is modeled as a voltage source and a source resistance Rdriver
...
5 Lumped capacitance model of wire
For the circuit of Figure 4
...
In Example 4
...

The operation of this simple RC network is described by the following ordinary differential
equation:

$$C_{lumped} \frac{dV_{out}}{dt} + \frac{V_{out} - V_{in}}{R_{driver}} = 0$$

When applying a step input (with Vin going from 0 to V), the transient response of this circuit is
known to be an exponential function, and is given by the following expression (where τ =
Rdriver × Clumped is the time constant of the network):
Vout(t) = (1 − e^(−t/τ)) V
The time to reach the 50% point is easily computed as t = ln(2)τ = 0
...
Similarly, it takes t
= ln(9)τ = 2
...
It is worth memorizing these numbers, as they are extensively used in the rest of the text
...
t50% = 0
...
2 × 10 kΩ × 11 pF = 242 nsec
These numbers are not even acceptable for the lowest performance digital circuits
...
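The two delay numbers of this example can be reproduced with a short sketch of the first-order step response. It assumes the 10 kΩ driver and 11 pF lumped wire capacitance used above; the function names are hypothetical.

```python
import math


def step_response(t, tau, v_final=1.0):
    """Vout(t) = (1 - exp(-t / tau)) * V for a single-pole RC network."""
    return v_final * (1.0 - math.exp(-t / tau))


def lumped_rc_delays(r_driver, c_lumped):
    """Return (t50, t90): time to the 50% point and the 10%-to-90% rise time."""
    tau = r_driver * c_lumped
    return math.log(2) * tau, math.log(9) * tau


t50, t90 = lumped_rc_delays(10e3, 11e-12)
print(f"t50% = {t50 * 1e9:.0f} nsec, t90% = {t90 * 1e9:.0f} nsec")
```

The factors ln(2) ≈ 0.69 and ln(9) ≈ 2.2 are the ones the text recommends memorizing; evaluating step_response at t50 returns half of the final voltage.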


While the lumped capacitor model is the most popular, sometimes it is also useful to
present lumped models of a wire with respect to either resistance or inductance
...
Both the resistance and
inductance of the supply wires can be interpreted as parasitic noise sources that introduce
voltage drops and bounces on the supply rails
...
4
...
The equipotential assumption, presented in the lumped-capacitor model, is no longer adequate, and a
resistive-capacitive model has to be adopted
...
This simple
model, called the lumped RC model, is pessimistic and inaccurate for long interconnect
wires, which are more adequately represented by a distributed rc-model
...
The behavior of the distributed rc-line can be adequately modeled by a simple RC network
...
Having a means to analyze such
a network effectively and to predict its first-order response would be a great asset
in the designer's toolbox
...
5, we analyzed a single resistor-single capacitor network
...
Unfortunately, deriving the correct waveforms for a network with a larger number of capacitors
and resistors rapidly becomes hopelessly complex: describing its behavior requires a set of
ordinary differential equations, and the network now contains many time-constants (or
poles and zeros)
...

Consider the resistor-capacitor network of Figure 4
...
This circuit is called an RC tree and has the following properties:
• the network has a single input node (called s in Figure 4
...
fm Page 121 Monday, September 6, 1999 1:44 PM

Section 4
...
12

Tree-structured RC network
...
The total resistance along
this path is called the path resistance Rii
...
12 equals
R44 = R1 + R3 + R4
The definition of the path resistance can be extended to address the shared path
resistance Rik, which represents the resistance shared among the paths from the root node s
to nodes k and i:
$$R_{ik} = \sum R_j \Rightarrow \left( R_j \in [\mathrm{path}(s \to i) \cap \mathrm{path}(s \to k)] \right)$$

(4
...
12, Ri4 = R1 + R3 while Ri2 = R1
...
The Elmore delay at node i is then
given by the following expression:
$$\tau_{Di} = \sum_{k=1}^{N} C_k R_{ik}$$

(4
...
The designer should be aware that this time-constant
represents a simple approximation of the actual delay between source node and node i. Yet
...
It
offers the designer a powerful mechanism for providing a quick estimate of the delay of a
complex network
...
6 RC delay of a tree-structured network
Using Eq
...
13), we can compute the Elmore delay for node i in the network of Figure 4
...

τDi = R1C1 + R1C2 + (R1 + R3)C3 + (R1 + R3)C4 + (R1 + R3 + Ri)Ci
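The same computation can be sketched programmatically. The tree topology below is an assumption reconstructed from the path resistances quoted in the text (R1 feeds node 1; R2 and R3 branch from node 1 to nodes 2 and 3; R4 and Ri branch from node 3 to nodes 4 and i). The component values are arbitrary illustrative numbers, not data from the book.

```python
def elmore_delay(parent, resistance, capacitance, target):
    """Elmore delay at `target`: sum over all nodes k of C_k * R_ik.

    parent[n]      -> upstream node of n, or None when n hangs off the source
    resistance[n]  -> resistor between parent[n] and n
    capacitance[n] -> capacitor from n to ground
    """
    def path(node):
        nodes = set()
        while node is not None:
            nodes.add(node)
            node = parent[node]
        return nodes

    target_path = path(target)
    delay = 0.0
    for k, c_k in capacitance.items():
        # Shared path resistance: resistors common to both source-to-node paths.
        r_shared = sum(resistance[n] for n in target_path & path(k))
        delay += c_k * r_shared
    return delay


parent = {"1": None, "2": "1", "3": "1", "4": "3", "i": "3"}
resistance = {"1": 1.0, "2": 2.0, "3": 3.0, "4": 4.0, "i": 5.0}   # R1, R2, R3, R4, Ri
capacitance = {"1": 1.0, "2": 2.0, "3": 3.0, "4": 4.0, "i": 5.0}  # C1, C2, C3, C4, Ci

tau_i = elmore_delay(parent, resistance, capacitance, "i")
print("Elmore delay at node i:", tau_i)
```

With these values the closed-form expression above evaluates to 1·1 + 1·2 + 4·3 + 4·4 + 9·5 = 76, which is what the function returns.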

chapter4
...
13
...
The Elmore delay of this chain network can be derived with the aid of Eq
...
13):
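[Figure residue: RC chain driven by Vin, with series resistors R1, R2, ..., Ri, ..., RN, nodes 1 through N, node capacitors starting at C1, and output VN]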
Figure 4
...


$$\tau_{DN} = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j = \sum_{i=1}^{N} C_i R_{ii}$$

(4
...
As an example,
consider node 2 in the RC chain of Figure 4
...
Its time-constant consists of two components contributed by nodes 1 and 2
...
The equivalent time constant at node 2 equals C1R1 + C2(R1 + R2)
...

τDi = C1R1 + C2(R1 + R2) + … +Ci(R1 + R2 + … + Ri)
Example 4
...
13 can be used as an approximation of a resistive-capacitive wire
...

The resistance and capacitance of each segment are hence given by rL/N and cL/N, respectively
...
15)

with R (= rL) and C (= cL) the total lumped resistance and capacitance of the wire
...
Eq
...
15) then simplifies
to the following expression:
$$\tau_{DN} = \frac{RC}{2} = \frac{rcL^2}{2}$$

(4
...
(4
...


• The delay of the distributed rc-line is one half of the delay that would have been
predicted by the lumped RC model
...
(4
...
This confirms the observation made earlier
that the lumped model presents a pessimistic view on the delay of resistive wire
...
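The factor of one half can be checked numerically by applying the chain form of the Elmore formula to a wire cut into N equal segments. The sketch below normalizes the total resistance and capacitance to 1; the function name is illustrative.

```python
def chain_elmore_delay(total_r, total_c, n_segments):
    """Elmore delay at the end of an N-segment RC chain with equal segments."""
    seg_r = total_r / n_segments
    seg_c = total_c / n_segments
    delay = 0.0
    path_r = 0.0
    for _ in range(n_segments):
        path_r += seg_r            # resistance from the source to this node
        delay += seg_c * path_r    # C_i * R_ii term of the chain formula
    return delay


for n in (1, 2, 10, 100, 1000):
    print(f"N = {n:5d}: delay = {chain_elmore_delay(1.0, 1.0, n):.4f} x RC")
```

A single segment (the lumped RC model) gives RC, while the sum equals RC (N + 1) / (2N) in general and therefore tends to RC/2 as N grows.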

The Elmore expression determines the value of only the dominant one, and presents thus a
first-order approximation
...
Besides making it
possible to analyze wires, the formula can also be used to approximate the propagation
delay of complex transistor networks
...
The evaluation of the propagation delay is then
reduced to the analysis of the resulting RC network
...
These bounds have formed the base for most computer-aided timing analyzers at the switch and functional level [Horowitz83]
...

4
...
4

The Distributed rc Line

In the previous paragraphs, we have shown that the lumped RC model is a pessimistic
model for a resistive-capacitive wire, and that a distributed rc model (Figure 4
...
As before, L represents the total length of the wire, while r and c stand
for the resistance and capacitance per unit length
...
14b
...
17)

The correct behavior of the distributed rc line is then obtained by reducing ∆L asymptotically to 0
...
(4
...
18)

where V is the voltage at a particular point in the wire, and x is the distance between this
point and the signal source
...
(4
...
These equations are difficult to use for ordinary circuit analysis
...
[Figure residue: (a) distributed model of a wire of length L, drawn as a ladder of r∆L series resistors and c∆L shunt capacitors between Vin and Vout, with internal nodes Vi-1, Vi, and Vi+1; (b) schematic symbol for the distributed RC line, labeled (r, c, L)]
Figure 4
...


which can be easily used in computer-aided analysis
...

RC
V out ( t ) = 2erfc ( ------- )
4t
= 1
...
366e

t
–2
...
19)
+ 0
...
4641 -------RC

t » RC

Figure 4
...
Observe how the step waveform “diffuses” from the start to the end of the wire, and the waveform rapidly degrades, resulting
in a considerable delay for long wires
...
fm Page 125 Monday, September 6, 1999 1:44 PM

Section 4
...
It hence will receive considerable attention in later chapters
...
[Plot residue: voltage (V) versus time (nsec), with one curve per position along the wire, including x = L/10 and x = L/4]

Figure 4
...
Simulated step response of resistive-capacitive wire as a function of time and place
...
7
...
69 RC
...
38
RC, with R
and C the total resistance and capacitance of the wire
...

(4
...

Table 4
...


Voltage range

Lumped RC network

Distributed RC network

0 → 50% (tp)

0
...
38 RC

0 → 63% (τ)

RC

0
...
2 RC

0
...
3 RC

1
...
8 RC delay of Aluminum Wire
Let us consider again the 10 cm long, 1 µm wide Al1 wire of Example 4
...
In Example 4
...
075 Ω /µm;
Using the entry of Table 4
...
38 RC = 0
...
075 Ω /µm) × ( 110 aF/µm) × (105 µm)2 = 31
...
The values of the capacitances are obtained from Table 4
...
0375 Ω /µm for Poly and Al5:

chapter4
...
38 × ( 150 Ω /µm) × ( 88 + 2 × 54 aF/µm) × (105 µm)2 = 112 µsec!
Al5: tp = 0
...
0375 Ω/µm) × ( 5
...
2 nsec
Obviously, the choice of the interconnect material and layer has a dramatic impact on the delay of
the wire
...
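The delay estimates of this example all use the same expression, tp = 0.38 r c L^2. A small sketch, using only the per-unit-length values visible in the example for Al1 and polysilicon (the function name is hypothetical):

```python
def distributed_rc_delay(r_ohm_per_um, c_af_per_um, length_um):
    """Propagation delay of a distributed rc line: tp = 0.38 * r * c * L^2 (seconds)."""
    c_f_per_um = c_af_per_um * 1e-18
    return 0.38 * r_ohm_per_um * c_f_per_um * length_um ** 2


LENGTH_UM = 1e5  # 10 cm

tp_al1 = distributed_rc_delay(0.075, 110.0, LENGTH_UM)
tp_poly = distributed_rc_delay(150.0, 88.0 + 2 * 54.0, LENGTH_UM)
print(f"Al1:  tp = {tp_al1 * 1e9:.1f} nsec")
print(f"Poly: tp = {tp_poly * 1e6:.0f} usec")
```

Because the delay grows with the square of the length, halving the wire length reduces the delay by a factor of four, whatever the layer.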
A simple rule of thumb proves to be very useful here
...

This translates into Eq
...
20), which determines the critical length L of the interconnect
wire where RC delays become dominant
...
38rc

(4
...


• rc delays should only be considered when the rise (fall) time at the line input is
smaller than RC, the rise (fall) time of the line
...
21)

with R and C the total resistance and capacitance of the wire
...


Example 4
...
16
...
The total propagation delay of the network can be approximated
by the following expression, obtained by applying the Elmore formula:
τD = Rs Cw + (Rw Cw)/2 = Rs Cw + 0
...
13 and apply the Elmore equation on the
resulting network
...
fm Page 127 Monday, September 6, 1999 1:44 PM

Section 4
...
16 rc-line of length L driven by source with
resistance equal to Rs
...
69 R s C w + 0
...
The delay introduced by the wire resistance becomes dominant
when (RwCw)/2 ≥ RsCw, or L ≥ 2Rs/r
...
075 Ω/µm)
...
67
cm
...
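The trade-off between driver and wire resistance can be explored with the sketch below. It assumes the delay form tp = 0.69 Rs Cw + 0.38 Rw Cw (lumped and distributed coefficients from the table above) and the criterion L ≥ 2Rs/r. The 1 kΩ source resistance is a hypothetical value chosen for illustration; the wire data is the Al1 data used throughout the chapter.

```python
def driver_wire_delay(r_s, r_ohm_per_um, c_af_per_um, length_um):
    """tp = 0.69 * Rs * Cw + 0.38 * Rw * Cw for a driver feeding an rc line."""
    r_w = r_ohm_per_um * length_um
    c_w = c_af_per_um * 1e-18 * length_um
    return 0.69 * r_s * c_w + 0.38 * r_w * c_w


def critical_length_um(r_s, r_ohm_per_um):
    """Length beyond which (Rw * Cw) / 2 exceeds Rs * Cw, i.e. L = 2 * Rs / r."""
    return 2.0 * r_s / r_ohm_per_um


R_S = 1e3  # hypothetical driver resistance, ohms
l_crit = critical_length_um(R_S, 0.075)
print(f"critical length = {l_crit / 1e4:.2f} cm")
print(f"delay at that length = {driver_wire_delay(R_S, 0.075, 110.0, l_crit) * 1e9:.2f} nsec")
```

With this assumed driver the critical length is about 2.67 cm. A weaker driver (larger Rs) pushes the point where wire resistance dominates out to longer wires.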
4
...
This is more precisely the case when the rise and fall times
of the signal become comparable to the time of flight of the signal waveform across the
line as determined by the speed of light
...

In this section, we first analyze the transmission line model
...

Transmission Line Model
Similar to the resistance and capacitance of an interconnect line, the inductance is distributed over the wire
...
The transmission
line has the prime property that a signal propagates over the interconnection medium as a
wave
...
(4
...
In the wave
mode, a signal propagates by alternately transferring energy from the electric to the
magnetic fields, or equivalently from the capacitive to the inductive modes
...
17 at time t
...
[Figure residue: transmission line segment with labels Vin, r, and g]
Figure 4
...


$$\frac{\partial v}{\partial x} = -ri - l\frac{\partial i}{\partial t}$$

(4
...
(4
...

$$\frac{\partial^2 v}{\partial x^2} = rc\frac{\partial v}{\partial t} + lc\frac{\partial^2 v}{\partial t^2}$$

(4
...

To understand the behavior of the transmission line, we will first assume that the
resistance of the line is small
...
This model is applicable for wires at the
printed-circuit board level
...
On the other hand,
resistance plays an important role in integrated circuits, and a more complex model, called
the lossy transmission line should be considered
...

The Lossless Transmission Line
For the lossless line, Eq
...
23) simplifies to the ideal wave equation:
$$\frac{\partial^2 v}{\partial x^2} = lc\frac{\partial^2 v}{\partial t^2} = \frac{1}{\nu^2}\frac{\partial^2 v}{\partial t^2}$$

(4
...
(4
...

$$\nu = \frac{1}{\sqrt{lc}} = \frac{1}{\sqrt{\varepsilon\mu}} = \frac{c_0}{\sqrt{\varepsilon_r \mu_r}}$$

(4
...
The propagation
delay per unit wire length (tp) of a transmission line is the inverse of the speed:

$$t_p = \sqrt{lc}$$

(4
...
Suppose
that a voltage step V has been applied at the input and has propagated to point x of the line
(Figure 4
...
All currents are equal to 0 at the right side of x, while the voltage over the
line equals V at the left side
...
This requires the following current:
$$I = \frac{dQ}{dt} = c\frac{dx}{dt}V = c\nu V = \sqrt{\frac{c}{l}}\,V$$

(4
...
18 Propagation of
voltage step along a lossless
transmission line
...
This means that the signal sees
the remainder of the line as a real impedance,
$$Z_0 = \frac{V}{I} = \sqrt{\frac{l}{c}} = \frac{\sqrt{\varepsilon\mu}}{c} = \frac{1}{c\nu}$$
...
28)

This impedance, called the characteristic impedance of the line, is a function of the
dielectric medium and the geometry of the conducting wire and isolator (Eq
...
28)), and
is independent of the length of the wire and the frequency
...
Typical values of the characteristic impedance of wires in semiconductor circuits
range from 10 to 200 Ω
...
10

Propagation Speeds of Signal Waveforms

The information of Table 4
...
5 nsec for a signal wave to propagate from
source-to-destination on a 20 cm wire deposited on an epoxy printed-circuit board
...
67 nsec for the
signal to reach the end of a 10 cm wire
...
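The propagation speeds quoted here follow from ν = c0 / sqrt(εr µr) with c0 = 30 cm/nsec. The sketch below assumes µr = 1 and relative permittivities of 3.9 for SiO2 and 5.0 for an epoxy-glass board; treat both permittivities as assumed inputs rather than values read from the table.

```python
import math

C0_CM_PER_NSEC = 30.0  # speed of light in vacuum, as quoted in the text


def propagation_speed(eps_r, mu_r=1.0):
    """Wave speed in the dielectric, in cm/nsec."""
    return C0_CM_PER_NSEC / math.sqrt(eps_r * mu_r)


def time_of_flight(length_cm, eps_r, mu_r=1.0):
    """Time in nsec for a wave to traverse a lossless line of the given length."""
    return length_cm / propagation_speed(eps_r, mu_r)


v_sio2 = propagation_speed(3.9)
print(f"SiO2: {v_sio2:.1f} cm/nsec, 10 cm takes {time_of_flight(10.0, 3.9):.2f} nsec")
print(f"PCB:  {propagation_speed(5.0):.1f} cm/nsec, 20 cm takes {time_of_flight(20.0, 5.0):.2f} nsec")
```

The results are close to the rounded figures in the text (about 15 cm/nsec in SiO2 and about 1.5 nsec for the 20 cm board trace).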
The electro-magnetic fields in complex interconnect structures tend to be
irregular, and are strongly influenced by issues such as the current return path
...
analytical solutions are typically available
...
For some simplified structures,
approximate expressions have been derived
...
(4
...
(4
...

$$Z_0(\text{triplate}) \approx 94\,\Omega \sqrt{\frac{\mu_r}{\varepsilon_r}} \ln\left(\frac{2t + W}{H + W}\right)$$

(4
...
475ε r + 0
...
536W + 0
...
30)

and

Termination
The behavior of the transmission line is strongly influenced by the termination of the line
...
This is expressed by the reflection coefficient ρ that determines the relationship
between the voltages and currents of the incident and reflected waveforms
...
31)

where R is the value of the termination resistance
...

V = V inc ( 1 + ρ )
I = I inc ( 1 – ρ )

(4
...
19
...
The termination appears as an infinite extension of the line, and no waveform is reflected
...
In case (b), the line termination is
an open circuit (R = ∞), and ρ = 1
...
(4
...
Finally, in case (c) where the line termination
is a short circuit, R = 0, and ρ = –1
...

The transient behavior of a complete transmission line can now be examined
...
20
...
An incoming wave is completely reflected without phase reversal
...
21: RS = 5 Z0, RS = Z0, and RS = 1/5 Z0
...
fm Page 131 Monday, September 6, 1999 1:44 PM

Section 4
...
19

Behavior of various transmission line terminations
...
20 Transmission line
with terminating impedances
...
Large source resistance—RS = 5 Z0 (Figure 4
...
The amount injected is determined by the resistive divider formed by the source
resistance and the characteristic impedance Z0
...
83 V

(4
...
67 V)
...
Approximately the same happens when
the wave reaches the source node again
...

$$\rho_S = \frac{5Z_0 - Z_0}{5Z_0 + Z_0} = \frac{2}{3}$$

(4
...
The overall rise time is, however, many times L/ν
...
[Figure residue: three panels plotting Vsource and Vdest (V) versus time: (a) RS = 5Z0, RL = ∞; (b) RS = Z0, RL = ∞; (c) RS = Z0/5, RL = ∞]
Figure 4
...

When multiple reflections are present, as in the above case, keeping track of waves
on the line and total voltage levels rapidly becomes cumbersome
...
22)
...
The line voltage at a termination point equals the sum of the previous voltage, the incident, and reflected waves
...
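The bookkeeping behind a lattice diagram is mechanical enough to sketch in code. The example below assumes an ideal lossless line, a 5 V step, a source resistance of 5 Z0, and an open-circuit load; the function names and the normalization Z0 = 1 are illustrative choices.

```python
import math


def reflection_coefficient(r_term, z0):
    """rho = (R - Z0) / (R + Z0); an open circuit (R = inf) gives rho = 1."""
    if math.isinf(r_term):
        return 1.0
    return (r_term - z0) / (r_term + z0)


def lattice(v_step, r_s, r_l, z0, n_round_trips):
    """Return the successive voltage levels at the source and destination ends."""
    rho_s = reflection_coefficient(r_s, z0)
    rho_l = reflection_coefficient(r_l, z0)
    wave = v_step * z0 / (r_s + z0)   # amplitude injected by the resistive divider
    v_source, v_dest = wave, 0.0
    source_levels, dest_levels = [v_source], []
    for _ in range(n_round_trips):
        reflected = rho_l * wave      # wave arrives at the destination
        v_dest += wave + reflected    # previous voltage + incident + reflected
        dest_levels.append(v_dest)
        wave = reflected
        reflected = rho_s * wave      # wave arrives back at the source
        v_source += wave + reflected
        source_levels.append(v_source)
        wave = reflected
    return source_levels, dest_levels


src, dst = lattice(5.0, 5.0, math.inf, 1.0, 4)
print("Vsource:", [round(v, 4) for v in src])
print("Vdest:  ", [round(v, 4) for v in dst])
```

Each new level adds the incident and reflected waves to the previous voltage, so the source and destination voltages climb toward 5 V in ever smaller steps. Changing r_s to Z0 or Z0/5 reproduces the matched and the ringing cases respectively.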
Small source resistance—RS = Z0/5 (Figure 4
...
Its value is doubled at the destination end, which causes a severe overshoot
...
The signal bounces back and forth and exhibits severe ringing
...

3
...
21b)
Half of the input signal is injected at the source
...
It is obvious that this is

[Lattice diagram residue: successive voltage levels at Vsource and Vdest as the wave bounces between the two ends of the line, plotted against time t]

...
22 Lattice diagram for RS =
5 Z0 and RL = ∞
...
21a)
...
Matching the line impedance at the source end is called
series termination
...
In real conditions the signals are substantially smoother, as demonstrated
in the simulated response of Figure 4
...

[Plot residue: Vin, Vsource, and Vdest, V (V) versus t (psec), over 0 to 500 psec]

Figure 4
...


Problem 4
...
Also try the reverse picture— assume that the series resistance of the source equals zero,
and consider different load impedances
...
Matching the load impedance to the characteristic impedance
of the line once again results in the fastest response
...


chapter4
...

Example 4
...
One might wonder how this
influences the transmission line behavior and when the load capacitance should be taken into
account
...
From the load's point of view, the line behaves as a
resistance with value Z0
...
This is illustrated in Figure 4
...
The response shows how the output rises to its final value
with a time-constant of 100 psec (= 50 Ω × 2 pF) after a delay equal to the time-of-flight of
the line
...
After 2 tflight, an unexpected
voltage dip occurs at the source node that can be explained as follows
...
This reflected wave also approaches its final
value asymptotically
...
5 V rather than the expected 2
...
This forces the transmission line temporarily to 0 V, as shown in the simulation
...

[Plot residue: voltage V versus time]
...
24 Capacitively terminated
transmission line: RS = 50 Ω, RL = ∞, CL
= 2 pF, Z0 = 50 Ω, tflight = 50 psec
...
69Z0 CL = 69 psec)
...
In general, we can say that the capacitive load should only be considered in the analysis when its value is comparable to or larger than the total capacitance of
the transmission line [Bakoglu90]
...
fm Page 135 Monday, September 6, 1999 1:44 PM

Section 4
...
The lossy transmission-line model should be applied
instead
...
We
therefore only discuss the effects of resistive loss on the transmission line behavior in a
qualitative fashion
...
This is demonstrated in Figure 4
...
The step input still propagates as a wave through the line
...
35)

The arrival of the wave is followed by a diffusive relaxation to the steady-state value
at point x
...
In fact, the resistive effect becomes dominant, and the line
behaves as a distributed RC line when R ( = rL, the total resistance of the line) >> 2Z0
...
At that point, the
line is more appropriately modeled as a distributed rc line
...
25

Dest

x

t

t

Step response of lossy transmission line
...
For instance, branches on wires, often
called transmission line taps, cause extra reflections and can affect both signal shape and
delay
...
For a more extensive discussion of
these effects, we would like to refer the reader to [Bakoglu90] and [Dally98]
...
Design Rules of Thumb
Once again, we have to ask ourselves the question when it is appropriate to consider
transmission line effects
...

(
This leads to the following rule of thumb, which determines when transmission line
effects should be considered:

L
t r ( t f ) < 2
...
5 -ν

(4
...
At the board level, where wires can reach a length of
up to 50 cm, we should account for the delay of the transmission line when tr < 8 nsec
...
Ignoring the inductive component of the propagation delay can easily result in overly optimistic
delay predictions
...
37)

If this is not the case, the distributed RC model is more appropriate
...
5 lc

(4
...
12

(4
...
Using the data from Example 4
...
(4
...
1 µm: c = 92 aF/µm; Z0 = 74 Ω
W = 1
...
fm Page 137 Monday, September 6, 1999 1:44 PM

Section 4
...
(4
...
075Ω ⁄ µm
From Eq
...
36), we find a corresponding maximum rise (or fall) time of the input signal
equal to
trmax = 2
...
For these wires, a lumped capacitance
model is more appropriate
...
For a
10 µm wide wire, we find a maximum length of 11
...

Assume now a Copper wire, implemented on level 5, with a characteristic impedance
of 200 Ω and a resistance of 0
...
The resulting maximum wire length equals 40 mm
...

Be aware however that the values for Z0, derived in this example, are only approximations
...


Example 4
...
5 SPICE Wire Models
In previous sections, we have discussed the
4
...
1

Distributed rc Lines in SPICE

Because of the importance of the distributed rc-line in today's design, most circuit simulators have built-in distributed rc-models of high accuracy
...
This model
approximates the rc-line as a network of lumped RC segments with internally generated
nodes
...

Example 4
...
N1 and
N2 represent the terminal nodes of the line, while N3 is the node the capacitances are connected to
...

U1 N1=1 N2=2 N3=0 URCMOD L=50m N=6

...
If your simulator does not support a distributed rc-model, or if the computational
complexity of these models slows down your simulation too much, you can construct a
simple yet accurate model yourself by approximating the distributed rc by a lumped RC
network with a limited number of elements
...
26 shows some of these approximations ordered along increasing precision and complexity
...
For instance, the error of the π3 model is less than
3%, which is generally sufficient
...
5
...
26

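[Figure residue: lumped approximations of the distributed rc line, including the π2 and T3 models, built from elements such as R/2, R/3, R/4, R/6, C/2, and C/3]

Simulation models for distributed RC line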
...
The line characteristics are defined by
the characteristic impedance Z0, while the length of the line can be defined in either of two
forms
...
Alternatively, a frequency F may be given together with NL, the
dimensionless, normalized electrical length of the transmission line, which is measured
with respect to the wavelength in the line at the frequency F
...

NL = F ⋅ TD

(4
...
fm Page 139 Monday, September 6, 1999 1:44 PM

Section 4
...
When necessary, loss can be
added by breaking up a long transmission line into shorter sections and adding a small
series resistance in each section to model the transmission line loss
...
First of all, the accuracy is still limited
...
For small transmission lines, this time step might be much
smaller than what is needed for transistor analysis
...
6 Perspective: A Look into the Future
Similar to the approach we followed for the MOS transistor, it is worthwhile to explore
how the wire parameters will evolve with further scaling of the technology
...

A straightforward approach is to scale all dimensions of the wire by the same factor
S as the transistors (ideal scaling)
...
It can be surmised that the length of local interconnections —
wires that connect closely grouped transistors — scales in the same way as these transistors
...
Examples of such wires are clock signals, and data and instruction buses
...
27 contains a
histogram showing the distribution of the wire lengths in an actual microprocessor design
(containing approximately 90,000 gates) [Davis98]
...


Figure 4
...


The average length of these long wires is proportional to the die size (or complexity)
of the circuit
...
In fact, the size of the
typical die (which is the square root of the die area) is increasing by 6% per year, doubling
about every decade
...
They are projected to reach 4 cm on the side by 2010!

chapter4
...
In our subsequent analysis, we will therefore
consider three models: local wires (SL = S > 1), constant length wires (SL = 1), and global
wires (SL = SC < 1)
...
This leads to the scaling behavior illustrated in Table
4
...
Be aware that this is only a first-order analysis, intended to look at overall trends
...

Table 4
...
A constant delay is predicted for local wires, while the delay of the global wires goes up with 50% per year (for
S = 1
...
94)
...
This explains why wire delays are starting to play a predominant role in
today's digital integrated circuit design
...
This explains why other interconnect scaling techniques are attractive
...
The “constant resistance” model of
Table 4
...
While this approach
seemingly has a positive impact on the performance, it causes the fringing and interwire
capacitance components to come to the foreground
...

Table 4
...
fm Page 141 Monday, September 6, 1999 1:44 PM

Section 4
...
9

141

“Constant Resistance” Scaling of Wire Properties

Parameter   Relation   Local Wire   Constant Length   Global Wire
R           L/WH       1            S                 S/SC
CR          L^2/Ht     εc/S         εcS               εcS/SC^2


This scaling scenario offers a slightly more optimistic perspective, assuming of
course that εc < S
...
To keep these delays from becoming excessive, interconnect technology has to be drastically improved
...
The other option is to differentiate between local and global wires
...
To address these
conflicting demands, modern interconnect topologies combine a dense and thin wiring
grid at the lower metal layers with fat, widely spaced wires at the higher levels, as is illustrated in Figure 4
...
Even with these advances, it is obvious that interconnect will play a
dominant role in both high-performance and low-energy circuits for years to come
...
28 Interconnect hierarchy of 0
...


substrate

4
...
The main goal is to identify
the dominant parameters that set the values of the wire parasitics (being capacitance, resistance, and inductance), and to present adequate wire models that will aid us in the further
analysis and optimization of complex digital circuits
...

4
...
A number of textbooks and reprint volumes have been published
...


REFERENCES
[Antognetti88] P
...
Masobrio (eds
...

[Banzhaf92] W
...
, Prentice Hall, 1992
...
Chen, CMOS Devices and Technology for VLSI , Prentice Hall, 1990
...
Getreu, “Modeling the Bipolar Transistor,” Tektronix Inc
...

[Gray69] P
...
Searle, Electronic Principles, John Wiley and Sons, 1969
...
Gray and R
...
, John
Wiley and Sons, 1993
...
Haznedar, Digital Microelectronics, Benjamin/Cummings, 1991
...
Hodges and H
...
,
McGraw-Hill, 1988
...
Howe and S
...

[Hu92] C
...
27, no
...

241–246, March 1992
...
Hu, “Future CMOS Scaling and Reliability,” IEEE Proceedings, vol
...
5, May
1993
...
Jensen et al
...
1–61, August 1991
...
Ko, “Approaches to Scaling,” in VLSI Electronics: Microstructure Science, vol
...
1–37, Academic Press, 1989
...
Muller and T
...
, John
Wiley and Sons, 1986
...
Nagel, “SPICE2: a Computer Program to Simulate Semiconductor Circuits,” Memo
ERL-M520, Dept
...
and Computer Science, University of California at Berkeley, 1975
...
Sedra and K
...
, Holt, Rinehart and Winston,
1987
...
Sheu, D
...
Ko, and M
...
SC-22, no
...

558–565, August 1987
...
Sze, Physics of Semiconductor Devices, 2nd ed
...

[Thorpe92] T
...

[Toh88] K
...
Koh, and R
...
23
...
4, pp 950–957, August 1988
...

Section 4
...
Tsividis, Operation and Modeling of the MOS Transistor, McGraw-Hill, 1987
...
Yamaguchi et al
...
Electron
...
35, no 8, pp
...

[Weste93] N
...
Eshragian, Principles of CMOS VLSI Design: A Systems Perspective,
Addison-Wesley, 1993
...
9 Exercises and Design Problems

chapter5
...
1

Introduction

5
...
2

5
...
4
...
3

Evaluating the Robustness of the CMOS
Inverter: The Static Behavior

5
...
5
...
3
...
5
...
5
...
3
...
4

Switching Threshold

5
...
2

Robustness Revisited

5
...
4

Analyzing Power Consumption Using
SPICE

Performance of CMOS Inverter: The Dynamic
Behavior
5
...
1


Computing the Capacitances

5
...

Section 5
...
1 Introduction
The inverter is truly the nucleus of all digital designs
...
The electrical behavior of these complex circuits can be almost completely derived by extrapolating the results obtained for
inverters
...

In this chapter, we focus on one single incarnation of the inverter gate, being the
static CMOS inverter — or the CMOS inverter, in short
...
We analyze the gate with respect
to the different design metrics that were outlined in Chapter 1:
• cost, expressed by the complexity and area
• integrity and robustness, expressed by the static (or steady-state) behavior
• performance, determined by the dynamic (or transient) response
• energy efficiency, set by the energy and power consumption
From this analysis arises a model of the gate that will help us to identify the parameters of the gate and to choose their values so that the resulting design meets desired specifications
...

While this chapter focuses exclusively on the CMOS inverter, we will see in the following chapter that the same methodology also applies to other gate topologies
...
2 The Static CMOS Inverter — An Intuitive Perspective
Figure 5
...
Its operation is readily
understood with the aid of the simple switch model of the MOS transistor, introduced in
Chapter 3 (Figure 3
...
Figure 5
...
(Schematic labels: VDD, Vin, Vout, CL.) VDD stands for the supply voltage
...

This leads to the following interpretation of the inverter
...
This yields the equivalent circuit of Figure 5
...
A
direct path exists between Vout and the ground node, resulting in a steady-state value of 0 V
...
The equivalent circuit of Figure 5
...
The gate clearly functions as an
inverter
...
2
inverter
...
This results in high noise margins
...
Gates with this property are called
ratioless
...

• In steady state, there always exists a path with finite resistance between the output
and either VDD or GND
...
Typical values of the output resistance are in the kΩ range
...
Since the
input node of the inverter only connects to transistor gates, the steady-state input
current is nearly zero
...
So, although fan-out does not have any effect on the steady-state behavior, it
degrades the transient response
...

Section 5
...
The absence of
current flow (ignoring leakage currents) means that the gate does not consume any
static power
...
The
situation was very different in the 1970s and early 1980s
...
The lack of complementary devices (such as the NMOS and PMOS transistor) in such a technology makes
the realization of inverters with zero static power non-trivial
...

The nature and the form of the voltage-transfer characteristic (VTC) can be graphically deduced by superimposing the current characteristics of the NMOS and the PMOS
devices
...
It requires that the I-V curves of the NMOS and PMOS devices are transformed onto a common coordinate set
...
The PMOS I-V relations can be translated into this variable space by the following relations (the subscripts n and p denote the NMOS and PMOS devices, respectively):
    I_{DSp} = -I_{DSn}
    V_{GSn} = V_{in}; \quad V_{GSp} = V_{in} - V_{DD}

(5
...
This procedure is outlined in Figure 5
...

(Plot axes: IDp and IDn versus VDSp; curves labeled by Vin and VGSp.)
...
3 Transforming PMOS I-V characteristic to a common coordinate set
(assuming VDD = 2
...


...
(Plot: load curves versus Vout, labeled by Vin; one family is marked PMOS.)
Figure 5
...
5 V)
...


The resulting load lines are plotted in Figure 5
...
For a dc operating point to be
valid, the currents through the NMOS and PMOS devices must be equal
...
A
number of those points (for Vin = 0, 0
...
5, 2, and 2
...
As
can be observed, all operating points are located either at the high or low output levels
...
This results from
the high gain during the switching transient, when both NMOS and PMOS are simultaneously on, and in saturation
...
All these observations translate into the VTC of Figure
5
...

(Plot: VTC versus Vin, with operating regions annotated, for example "NMOS off, PMOS res" and "NMOS sat, PMOS sat".)

Figure 5
...
4 (VDD = 2
...
For each operation region, the modes of the transistors are
annotated: off, res(istive), or sat(urated)
...
Figure 5
...
(Switch-model panels: (a) Low-to-high, Vin = 0, with Rp and CL; (b) High-to-low, Vin = VDD, with Rn and CL.)

This response is dominated mainly by the output capacitance of the gate, CL, which is composed of the drain diffusion capacitances of the NMOS and PMOS transistors, the capacitance of the connecting wires, and the input capacitance of the fan-out gates
...
6)
...
6a)
...
In Example
4
...
Hence, a fast gate is built either by keeping the output capacitance
small or by decreasing the on-resistance of the transistor
...
Similar considerations are valid for the high-to-low
transition (Figure 5
...
The reader should be aware that the on-resistance of the NMOS and PMOS transistor is not constant, but is a
nonlinear function of the voltage across the transistor
...
An in-depth discussion of how to analyze and optimize the
performance of the static CMOS inverter is offered in Section 5
...


5
...
It remains to determine the precise values of VM, VIH, and VIL as well
as the noise margins
...
3
...
Its value can be
obtained graphically from the intersection of the VTC with the line given by Vin = Vout
(see Figure 5
...
In this region, both PMOS and NMOS are always saturated, since VDS =
VGS
...

sistors
...
We furthermore ignore the channel-length modulation effects
...
2)

Solving for VM yields

    V_M = \frac{\left( V_{Tn} + \frac{V_{DSATn}}{2} \right) + r \left( V_{DD} + V_{Tp} + \frac{V_{DSATp}}{2} \right)}{1 + r} \quad \text{with} \quad r = \frac{k_p V_{DSATp}}{k_n V_{DSATn}} = \frac{\upsilon_{satp} W_p}{\upsilon_{satn} W_n}    (5
...
For large values
of VDD (compared to threshold and saturation voltages), Eq
...
3) can be simplified:
    V_M \approx \frac{r V_{DD}}{1 + r}

(5
...
(5
...
It is generally considered to be
desirable for VM to be located around the middle of the available voltage swing (or at
VDD/2), since this results in comparable values for the low and high noise margins
...
To move VM upwards, a larger value of r is
required, which means making the PMOS wider
...
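
The switching-threshold expression and its large-VDD simplification are easy to evaluate numerically. The sketch below is an illustration only: the function names are hypothetical, and the device parameters in the demonstration (thresholds, saturation voltages, transconductances) are assumed values for a generic process rather than numbers quoted from this excerpt. PMOS quantities are entered with their negative signs, so that r comes out positive.

```python
def switching_threshold(vdd, vtn, vtp, vdsatn, vdsatp, kn, kp):
    """Velocity-saturated switching threshold VM and the ratio r.

    kn and kp are the device transconductances k' * (W/L).
    PMOS values (vtp, vdsatp, kp) are negative numbers.
    """
    r = (kp * vdsatp) / (kn * vdsatn)
    vm = ((vtn + vdsatn / 2) + r * (vdd + vtp + vdsatp / 2)) / (1 + r)
    return vm, r


def switching_threshold_large_vdd(vdd, r):
    """Simplified form valid when VDD is large: VM ~ r * VDD / (1 + r)."""
    return r * vdd / (1 + r)


if __name__ == "__main__":
    # Assumed, illustrative parameters (not taken from the excerpt).
    kn = 115e-6 * 1.5          # NMOS with W/L = 1.5
    kp = -30e-6 * 1.5 * 3.5    # PMOS 3.5 times wider
    vm, r = switching_threshold(2.5, 0.43, -0.4, 0.63, -1.0, kn, kp)
    print(f"r = {r:.3f}, VM = {vm:.3f} V")
    print(f"large-VDD estimate: {switching_threshold_large_vdd(2.5, r):.3f} V")
```

Increasing the PMOS width raises r and therefore moves VM upwards, as the text states.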

From Eq
...
2), we can derive the required ratio of PMOS versus NMOS transistor
sizes such that the switching threshold is set to a desired value VM
...

    \frac{(W/L)_p}{(W/L)_n} = \frac{k'_n V_{DSATn} (V_M - V_{Tn} - V_{DSATn}/2)}{k'_p V_{DSATp} (V_{DD} - V_M + V_{Tp} + V_{DSATp}/2)}

(5
...
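
The sizing ratio can be computed directly from the relation above. This is a minimal sketch under stated assumptions: the function name `pmos_nmos_ratio` is hypothetical, and the process parameters in the demonstration are assumed, illustrative values (PMOS quantities negative).

```python
def pmos_nmos_ratio(vm, vdd, vtn, vtp, vdsatn, vdsatp, kn_prime, kp_prime):
    """Required (W/L)p / (W/L)n for a desired switching threshold vm.

    kn_prime and kp_prime are process transconductances k'; PMOS values
    (vtp, vdsatp, kp_prime) are negative numbers.
    """
    numerator = kn_prime * vdsatn * (vm - vtn - vdsatn / 2)
    denominator = kp_prime * vdsatp * (vdd - vm + vtp + vdsatp / 2)
    return numerator / denominator


if __name__ == "__main__":
    # Assumed parameters; a mid-rail threshold is requested for VDD = 2.5 V.
    ratio = pmos_nmos_ratio(1.25, 2.5, 0.43, -0.4, 0.63, -1.0, 115e-6, -30e-6)
    print(f"(W/L)p / (W/L)n = {ratio:.2f}")
```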
1 Inverter switching threshold for long-channel devices, or low supply-voltages
...
When the PMOS and NMOS are long-channel devices, or when the supply voltage is low, velocity saturation does not occur (VM − VT < VDSAT)
...
(5
...
Derive
...
6)


Design Technique — Maximizing the noise margins
When designing static CMOS circuits, it is advisable to balance the driving strengths of the
transistors by making the PMOS section wider than the NMOS section, if one wants to maximize the noise margins and obtain symmetrical characteristics
...
(5
...


Example 5
...
25µm CMOS process, is located in the middle
between the supply rails
...
7, and
assume a supply voltage of 2
...
The minimum size device has a width/length ratio of 1
...

With the aid of Eq
...
5), we find
–6
( W ⁄ L)p
115 × 10 ×
-------------------= ------------------------- 0
...
25 – 0
...
63 ⁄ 2 ) = 3
...
0
( 1
...
4 – 1
...
7 plots the values of switching threshold as a function of the PMOS/NMOS
ratio, as obtained by circuit simulation
...
4 for a 1
...
(5
...


An analysis of the curve of Figure 5
...
VM is relatively insensitive to variations in the device ratio
...
g
...
5) do not disturb the transfer characteristic that much
...
For the above example, setting the ratio to 3, 2
...
22 V, 1
...
13 V, respectively
...
(Plot axis: VM (V).)
...
7 Simulated inverter switching
threshold versus PMOS/NMOS ratio (0
...
5 V)

chapter5
...
8 Changing the inverter threshold can improve the circuit reliability
...
The effect of changing the Wp/Wn ratio is to shift the transient region of the VTC
...
This property can be very useful, as asymmetrical transfer characteristics are actually desirable in some designs
...
8
...
Passing this signal
through a symmetrical inverter would lead to erroneous values (Figure 5
...
This
can be addressed by raising the threshold of the inverter, which results in a correct
response (Figure 5
...
Further in the text, we will see other circuit instances where
inverters with asymmetrical switching thresholds are desirable
...
5/0
...
To move the threshold to 1
...
Observe that
Figure 5
...

5
...
2

Noise Margins

By definition, VIH and VIL are the operational points of the inverter where dVout/dVin = −1
...

lytical expressions for VIH and VIL, these tend to be unwieldy and provide little insight into
which parameters are instrumental in setting the noise margins
...
9
...
The crossover with the VOH and the
VOL lines is used to define VIH and VIL points
...

Section 5
...
9 A piece-wise linear
approximation of the VTC simplifies the
derivation of VIL and VIH
...
This approach yields the following expressions for the width of the transition region VIH − VIL, VIH, VIL, and the noise margins NMH and NML
...
7)

NM L = V IL

These expressions make it increasingly clear that a high gain in the transition region is
very desirable
...
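
The piecewise-linear construction can be sketched numerically. The relations used below are the ones that follow from extending the tangent at VM, with slope g, until it crosses the VOH and VOL lines; since the expressions themselves are not fully legible in this excerpt, they are derived here under the assumptions VOH = VDD and VOL = 0. The function name is hypothetical.

```python
def piecewise_linear_noise_margins(vm, vdd, gain):
    """Noise margins from a tangent of slope `gain` (negative) through (VM, VM).

    Assumes VOH = VDD and VOL = 0.
    """
    if gain >= 0:
        raise ValueError("the inverter gain g must be negative")
    vih = vm - vm / gain            # tangent reaches VOL = 0
    vil = vm + (vdd - vm) / gain    # tangent reaches VOH = VDD
    return {
        "VIL": vil,
        "VIH": vih,
        "NML": vil,
        "NMH": vdd - vih,
        "width": vih - vil,         # equals -VDD / g
    }


if __name__ == "__main__":
    # Illustrative inputs: mid-rail threshold and a gain of -27.5.
    for name, value in piecewise_linear_noise_margins(1.25, 2.5, -27.5).items():
        print(f"{name:5s} = {value:.3f} V")
```

A larger gain magnitude narrows the transition region and pushes both noise margins towards VDD/2.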

It remains for us to determine the midpoint gain of the static CMOS inverter
...
It is apparent from Figure
5
...

The channel-length modulation factor hence cannot be ignored in this analysis — doing so
would lead to an infinite gain
...
8), valid around the switching threshold, with respect to Vin
...
8)

    k_p V_{DSATp} \left( V_{in} - V_{DD} - V_{Tp} - \frac{V_{DSATp}}{2} \right) (1 + \lambda_p V_{out} - \lambda_p V_{DD}) = 0

Differentiation and solving for dVout/dVin yields
    \frac{dV_{out}}{dV_{in}} = - \frac{k_n V_{DSATn} (1 + \lambda_n V_{out}) + k_p V_{DSATp} (1 + \lambda_p V_{out} - \lambda_p V_{DD})}{\lambda_n k_n V_{DSATn} (V_{in} - V_{Tn} - V_{DSATn}/2) + \lambda_p k_p V_{DSATp} (V_{in} - V_{DD} - V_{Tp} - V_{DSATp}/2)}    (5
...
    g = - \frac{1}{I_D(V_M)} \, \frac{k_n V_{DSATn} + k_p V_{DSATp}}{\lambda_n - \lambda_p}

(5
...
The gain is almost
purely determined by technology parameters, especially the channel length modulation
...
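
The gain expression can be evaluated with a few lines of code. The sketch below is illustrative: the function names are hypothetical, the drain current at VM is computed from the velocity-saturated NMOS current with channel-length modulation, and all device parameters are assumed values (PMOS quantities negative), not numbers confirmed by this excerpt.

```python
def saturation_current(kn, vdsatn, vtn, vm, lambda_n):
    """NMOS drain current at Vin = Vout = VM in velocity saturation."""
    return kn * vdsatn * (vm - vtn - vdsatn / 2) * (1 + lambda_n * vm)


def midpoint_gain(kn, kp, vdsatn, vdsatp, lambda_n, lambda_p, i_d_vm):
    """First-order gain at the switching threshold."""
    return -(kn * vdsatn + kp * vdsatp) / (i_d_vm * (lambda_n - lambda_p))


if __name__ == "__main__":
    # Assumed parameters: NMOS W/L = 1.5, PMOS 3.5 times wider.
    kn = 1.5 * 115e-6
    kp = -1.5 * 3.5 * 30e-6
    i_d = saturation_current(kn, 0.63, 0.43, 1.25, 0.06)
    g = midpoint_gain(kn, kp, 0.63, -1.0, 0.06, -0.1, i_d)
    print(f"ID(VM) = {i_d * 1e6:.1f} uA, g = {g:.1f}")
```

With these assumed inputs the result is a gain of roughly −28, the same order as the figure quoted in the example that follows. Note that the device widths cancel to first order, which is why the gain is set mostly by technology parameters.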

Example 5
...
25µm CMOS technology designed with a PMOS/NMOS
ratio of 3
...
375 µm, L = 0
...
5)
...
25 V),
I D(V M) = 1
...
63 × ( 1
...
43 – 0
...
06 × 1
...
5
g = – ---------------------- × 115 × 10 × 0
...
5 × 3
...
0 = –27
...
5
...
06 + 0
...
2 V, VIH = 1
...
2
...
10 plots the simulated VTC of the inverter, as well as its derivative, the gain
...
The actual values of VIL and VIH are 1
...
45 V,
respectively, which leads to noise margins of 1
...
05 V
...
(5
...
As observed in Figure 5
...
This reduced gain would yield values for VIL and VIH of 1
...
33 V, respectively
...

The obtained expressions are however perfectly useful as first-order estimations as
well as means of identifying the relevant parameters and their impact
...
Low values of 2
...
3 kΩ were observed, respectively
...


SIDELINE: Surprisingly (or not so surprisingly), the static CMOS inverter can also be
used as an analog amplifier, as it has a fairly high gain in its transition region
...
10b
...
Yet, this observation
can be used to demonstrate one of the major differences between analog and digital
design
...
(Plot panels: the simulated VTC and its derivative, the gain, both versus Vin (V).)

Figure 5
...
5 V)
...
25 µm CMOS, VDD

device in the regions of extreme nonlinearity, resulting in well-defined and well-separated
high and low signals
...
2 Inverter noise margins for long-channel devices

Derive expressions for the gain and noise margins assuming that PMOS and NMOS are
long-channel devices (or that the supply voltage is low), so that velocity saturation does
not occur
...
3
...
Fortunately, the dc-characteristics
of the static CMOS inverter turn out to be rather insensitive to these variations, and the
gate remains functional over a wide range of operating conditions
...
7, which shows that variations in the device sizes have only a minor
impact on the switching threshold of the inverter
...
Two corner-cases are plotted in
Figure 5
...
Comparing the resulting curves with the nominal response shows that
the variations mostly cause a shift in the switching threshold, but that the operation of the

(Plot: VTC curves labeled "Good PMOS, Bad NMOS" and "Good NMOS, Bad PMOS"; vertical axis in V.)
...
11 Impact of device variations on static CMOS
inverter VTC
...
The opposite
is true for the “bad” transistor
...
This robust behavior that ensures functionality of the gate
over a wide range of conditions has contributed in a big way to the popularity of the static
CMOS gate
...
At the same time, device threshold voltages are virtually kept constant
...
Do inverters keep on working
when the voltages are scaled and are there potential limits to the supply scaling?
A first hint on what might happen was offered in Eq
...
10), which indicates that the
gain of the inverter in the transition region actually increases with a reduction of the supply voltage! Note that for a fixed transistor ratio r, VM is approximately proportional to
VDD
...
12a)
...
5 V —
which is just 100 mV above the threshold of the transistors — the width of the transition
region measures only 10% of the supply voltage (for a maximum gain of 35), while it widens to 17% for 2
...
So, given this improvement in dc characteristics, why do we not
choose to operate all our digital circuits at these low supply voltages? Three important
arguments come to mind:
• In the following sections, we will learn that reducing the supply voltage indiscriminately has a positive impact on the energy dissipation, but is absolutely detrimental
to the performance of the gate
...

• Scaling the supply voltage means reducing the signal swing
...


(Plot panels, with Vin (V) on the horizontal axis: (a) Reducing VDD improves the gain; (b) but it deteriorates for very low supply voltages.)
...
12


VTC of CMOS inverter as a function of supply voltage (0
...


To provide an insight into the question on potential limits to the voltage scaling, we
have plotted in Figure 5
...
Amazingly enough, we still obtain an inverter characteristic,
even though the supply voltage is not even large enough to turn the transistors on! The explanation can be found in the sub-threshold operation of the transistors
...
The very low value of the switching currents ensures a
very slow operation but this might be acceptable for some applications (such as watches,
for example)
...
VOL and VOH are no longer at the supply rails and the transition-region gain approaches
1
...
To achieve sufficient gain for
use in a digital circuit, the supply must be at least a couple of times φT =
kT/q (= 25 mV at room temperature), the thermal voltage introduced in Chapter 3
[Swanson72]
...

    V_{DDmin} > 2 \ldots 4 \, \frac{kT}{q}

(5
...
(5
...
It suggests that the only way to
get CMOS inverters to operate below 100 mV is to reduce the ambient temperature, or in
other words to cool the circuit
...
3 Minimum supply voltage of CMOS inverter

Once the supply voltage drops below the threshold voltage, the transistors operate in the subthreshold region, and display an exponential current-voltage relationship (as expressed in
Eq
...
40))
...
(assume symmetrical NMOS and PMOS transistors, and a maximum gain at VM = VDD/2)
...

    g = - \left( \frac{1}{n} \right) \left( e^{V_{DD} / 2\phi_T} - 1 \right)

(5
...
5 and φT = 25 mV)
...
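
The subthreshold gain expression above can be inverted to estimate the supply voltage at which a target gain is reached. The sketch below is illustrative only: the function names are hypothetical and the slope factor n used in the demonstration is an assumed value.

```python
import math


def subthreshold_gain(vdd, n, phi_t=0.025):
    """Gain g = -(1/n) * (exp(VDD / (2 * phi_T)) - 1) at VM = VDD/2."""
    return -(1.0 / n) * (math.exp(vdd / (2.0 * phi_t)) - 1.0)


def vdd_for_gain(target_gain, n, phi_t=0.025):
    """Supply voltage at which |g| equals |target_gain| (inverse of the above)."""
    return 2.0 * phi_t * math.log(1.0 + n * abs(target_gain))


if __name__ == "__main__":
    n = 1.5  # assumed slope factor
    v_min = vdd_for_gain(-1.0, n)
    print(f"VDD for g = -1: {v_min * 1e3:.1f} mV")
    print(f"gain at 100 mV: {subthreshold_gain(0.1, n):.2f}")
```

The result for g = −1 is a couple of thermal voltages, in line with the bound on VDDmin given above.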
4 Performance of CMOS Inverter: The Dynamic Behavior
The qualitative analysis presented earlier concluded that the propagation delay of the
CMOS inverter is determined by the time it takes to charge and discharge the load capacitor CL through the PMOS and NMOS transistors, respectively
...
It is hence worthwhile to first study the major components of the load
capacitance before embarking on an in-depth analysis of the propagation delay of the
gate
...

5
...
1

Computing the Capacitances

Manual analysis of MOS circuits where each capacitor is considered individually is virtually impossible and is exacerbated by the many nonlinear capacitances in the MOS transistor model
...
Be aware that this is
a considerable simplification of the actual situation, even in the case of a simple inverter
...
13
(Schematic labels: M3, M4, Cw, Cg3.)

Parasitic capacitances, influencing the transient behavior of the cascaded inverter pair
...

Section 5
...
13 shows the schematic of a cascaded inverter pair
...
It is initially assumed that the
input Vin is driven by an ideal voltage source with zero rise and fall times
...

Gate-Drain Capacitance Cgd12
M1 and M2 are either in cut-off or in the saturation mode during the first half (up to 50%
point) of the output transient
...
The channel capacitance of the MOS
transistors does not play a role here, as it is located either completely between gate and
bulk (cut-off) or gate and source (saturation) (see Chapter 3)
...
This is accomplished by taking the so-called Miller
effect into account
...
14)
...
To present an identical load to the output node, the capacitance-to-ground must have a value that is twice as
large as the floating capacitance
...
For an in-depth
discussion of the Miller effect, please refer to textbooks such as Sedra and Smith
([Sedra87], p
...
1
(Schematic labels: M1, Vin, Vout, ∆V; the floating capacitance Cgd1 is replaced by a capacitance 2Cgd1 to ground.)

Figure 5
...


Diffusion Capacitances Cdb1 and Cdb2
The capacitance between drain and bulk is due to the reverse-biased
pn-junction
...

We argued in Chapter 3 that the best approach towards simplifying the analysis is to
replace the nonlinear capacitor by a linear one with the same change in charge for the voltage range of interest
...

1

The Miller effect discussed in this context is a simplified version of the general analog case
...


chapter5
...
13)

with Cj0 the junction capacitance per unit area under zero-bias conditions
...
(3
...
14)

with φ0 the built-in junction potential and m the grading coefficient of the junction
...

Example 5
...
5 V CMOS Inverter
Consider the inverter of Figure 5
...
25 CMOS technology
...
5
...
13)
...
For the CMOS
inverter, this is the time-instance where Vout reaches 1
...
5 V
...
5 V, 1
...
25 V} for the low-to-high
transition
...
5 V
...
5 V
over the drain junction or Vhigh = −2
...
At the 50% point, Vout = 1
...
25 V
...
(5
...
5, φ0 = 0
...
57,
Sidewall: Keqsw (m = 0
...
9) = 0
...
25 V, respectively,
resulting in higher values for Keq,
Bottom plate: Keq (m = 0
...
9) = 0
...
44, φ0 = 0
...
81
The PMOS transistor displays a reverse behavior, as its substrate is connected to 2
...

Hence, for the high-to-low transition (Vlow = 0, Vhigh = −1
...
48, φ0 = 0
...
79,
Sidewall: Keqsw (m = 0
...
9) = 0
...
25 V, Vhigh = −2
...
48, φ0 = 0
...
59,
Sidewall: Keqsw (m = 0
...
9) = 0
...
The result of the linearization is a minor distortion of the voltage waveforms
...
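
The linearization multipliers used in this example can be reproduced numerically. The expression for Keq comes from Chapter 3 and is not reproduced in this excerpt; the form coded below is the usual equal-charge linearization of a junction capacitance and should be read as an assumption, as should the function name and the sample inputs.

```python
def k_eq(v_high, v_low, phi0, m):
    """Multiplier that turns the zero-bias junction capacitance into an
    equivalent linear capacitance over the swing [v_low, v_high].

    Voltages are junction voltages (negative under reverse bias);
    phi0 is the built-in potential and m the grading coefficient (m != 1).
    """
    scale = -(phi0 ** m) / ((v_high - v_low) * (1.0 - m))
    return scale * ((phi0 - v_high) ** (1.0 - m) - (phi0 - v_low) ** (1.0 - m))


if __name__ == "__main__":
    # Assumed junction parameters for illustration.
    print(f"Keq(-2.5 V .. -1.25 V) = {k_eq(-2.5, -1.25, 0.9, 0.5):.2f}")
    print(f"Keq(-1.25 V .. 0 V)    = {k_eq(-1.25, 0.0, 0.9, 0.5):.2f}")
```

A smaller reverse bias gives a larger multiplier, which matches the observation in the example that the second swing results in higher values for Keq.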



Wiring Capacitance Cw
The capacitance due to the wiring depends upon the length and width of the connecting
wires, and is a function of the distance of the fanout from the driving gate and the number
of fanout gates
...

Gate Capacitance of Fanout Cg3 and Cg4
We assume that the fanout capacitance equals the total gate capacitance of the loading
gates M3 and M4
...
15)

This expression simplifies the actual situation in two ways:
• It assumes that all components of the gate capacitance are connected between Vout
and GND (or VDD), and ignores the Miller effect on the gate-drain capacitances
...

• A second approximation is that the channel capacitance of the connecting gate is
constant over the interval of interest
...
The total channel capacitance is a function of the operation mode of the
device, and varies from approximately 1/3 of
WLCox (cut-off) over 2/3 WLCox (saturation) to the full WLCox (linear)
...
Ignoring the capacitance variation
results in a pessimistic estimation with an error of approximately 10%, which is
acceptable for a first order analysis
...
4 Capacitances of a 0
...
25 µm CMOS technology
...
15
...
5 V
...
This data is summarized
in Table 5
...
As an example, we will derive the drain area and perimeter for the NMOS transistor
...
This results in a
total area of 19 λ2, or 0
...
125 µm)
...
125 = 1
...
Notice that the gate side of the drain perimeter is not
included, as this is not considered a part of the side-wall
...
7 µm2; PD = 5 + 9 + 5 = 19 λ, or 2
...


chapter5
...
25 µm = 2λ
(Layout labels: In, Out, GND, Metal 1, Polysilicon, NMOS (3λ/2λ).)

Figure 5
...


Table 5
...

        W/L     AD (µm2)     PD (µm)     AS (µm2)     PS (µm)

NMOS

0
...
25

PMOS

1
...
25

0
...
875 (15λ)

0
...
875 (15λ)

0
...
375 (19λ)

0
...
375 (19λ)

2

2

This physical information can be combined with the approximations derived above to
come up with an estimation of CL
...
5, and repeated here for convenience:
Overlap capacitance: CGD0(NMOS) = 0
...
27 fF/µm
Bottom junction capacitance: CJ(NMOS) = 2 fF/µm2; CJ(PMOS) = 1
...
28 fF/µm; CJSW(PMOS) = 0
...
A layout extraction program typically will deliver us precise values for this parasitic capacitance
...
With the aid of the interconnect parameters of Table 4
...
Due to the short length of the wire, this contribution is ignorable compared to the other parasitics
...
12 fF
Bringing all the components together results in Table 5
...
We use the values of Keq
derived in Example 5
...
Notice that the load
capacitance is almost evenly split between its two major components: the intrinsic capacitance, composed of diffusion and overlap capacitances, and the extrinsic load capacitance,
contributed by wire and connecting gate
...
2

Components of CL (for high-to-low and low-to-high transitions)
...
23

0
...
61

0
...
66

0
...
5

1
...
76

0
...
28

2
...
12

0
...
4
...
1

6
...
This results in the expression of Eq
...
16)
...
16)

v1

with i the (dis)charging current, v the voltage over the capacitor, and v1 and v2 the initial
and final voltage
...
We rather fall back to the simplified switch-model of the
inverter introduced in Figure 5
...
The voltage-dependencies of the on-resistance and the
load capacitor are addressed by replacing both by a constant linear element with a value
averaged over the interval of interest
...

for the load capacitance
...
8, and is repeated here for convenience
...
17)

    \text{with} \quad I_{DSAT} = k' \frac{W}{L} \left( (V_{DD} - V_T) V_{DSAT} - \frac{V_{DSAT}^2}{2} \right)

Deriving the propagation delay of the resulting circuit is now straightforward, and is
nothing more than the analysis of a first-order linear
RC-network, identical to the exercise
of Example 4
...
There, we learned that the propagation delay of such a network for a voltage step at the input is proportional to the time-constant of the network, formed by pulldown resistor and load capacitance
...
69 R eqn C L

(5
...
69R eqp C L

(5
...

This analysis assumes that the equivalent load-capacitance is identical for both the high-to-low and low-to-high transitions
...
The overall propagation delay of the inverter is
defined as the average of the two values, or
    t_p = \frac{t_{pHL} + t_{pLH}}{2} = 0.69 \, C_L \left( \frac{R_{eqn} + R_{eqp}}{2} \right)    (5.20)

Very often, it is desirable for a gate to have identical propagation delays for both rising
and falling inputs
...
Remember that this condition is identical to the
requirement for a symmetrical VTC
...
5 Propagation Delay of a 0
...
15, we make use of Eq
...
18) and Eq
...
19)
...
4, while
the equivalent on-resistances of the transistors for the generic 0
...
3
...
5 V, the normalized on-resistances of NMOS
and PMOS transistors equal 13 kΩ and 31 kΩ, respectively
...
5 for the NMOS, and 4
...
We
assume that the difference between drawn and effective dimensions is small enough to be
ignorable
...
(Plot axis: Vout (V).)

Figure 5
...
15
...
5

0

0
...
5

2

2
...
69 ×  ------------- × 6
...
5 31k Ω
t pLH = 0
...
0fF = 29 psec
 4
...
5 psec
 -----------------2 
The accuracy of this analysis is checked by performing a SPICE transient simulation
on the circuit schematic, extracted from the layout of Figure 5
...
The computed transient
response of the circuit is plotted in Figure 5
...
9 psec and 31
...
The manual results are good
considering the many simplifications made during their derivation
...
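
The manual delay estimate amounts to multiplying an equivalent resistance by a load capacitance. The sketch below repeats that arithmetic; the function name is hypothetical, and the width-to-length ratios and load capacitances in the demonstration are assumptions chosen to be in the range discussed in this example, since the exact table entries are not legible in this excerpt.

```python
def propagation_delays(r_eqn, r_eqp, c_l_hl, c_l_lh):
    """First-order RC delays: t_pHL = 0.69*Reqn*CL, t_pLH = 0.69*Reqp*CL.

    Returns (t_pHL, t_pLH, t_p) with t_p the average of the two.
    """
    t_phl = 0.69 * r_eqn * c_l_hl
    t_plh = 0.69 * r_eqp * c_l_lh
    return t_phl, t_plh, (t_phl + t_plh) / 2.0


if __name__ == "__main__":
    # Normalized on-resistances of 13 kOhm (NMOS) and 31 kOhm (PMOS) are
    # divided by assumed W/L ratios; the capacitances are assumed values.
    r_eqn = 13e3 / 1.5
    r_eqp = 31e3 / 4.5
    t_phl, t_plh, t_p = propagation_delays(r_eqn, r_eqp, 6.1e-15, 6.0e-15)
    print(f"t_pHL = {t_phl * 1e12:.1f} ps")
    print(f"t_pLH = {t_plh * 1e12:.1f} ps")
    print(f"t_p   = {t_p * 1e12:.1f} ps")
```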
These are caused by the gate-drain capacitances
of the inverter transistors, which couple the steep voltage step at the input node directly to the
output before the transistors can even start to react to the changes at the input
...


WARNING: This example might give the impression that manual analysis always leads
to close approximations of the actual response
...
Large
deviations can often be observed between first- and higher-order models
...
A detailed simulation is indispensable when quantitative data is
required
...


chapter5
...
To provide an answer to this question, it is necessary
to make the parameters governing the delay explicit by expanding Req in the delay equation
...
(5
...
(5
...
    t_{pHL} = 0.69 \cdot \frac{3}{4} \frac{C_L V_{DD}}{I_{DSATn}} = 0.52 \frac{C_L V_{DD}}{(W/L)_n k'_n V_{DSATn} (V_{DD} - V_{Tn} - V_{DSATn}/2)}    (5.21)
In the majority of designs, the supply voltage is chosen high enough so that VDD >> VTn +
VDSATn/2
...
(5
...
Observe that this is a first-order approximation, and that increasing
the supply voltage yields an observable, albeit small, improvement in performance due to
a non-zero channel-length modulation factor
...
    t_{pHL} \approx 0.52 \frac{C_L}{(W/L)_n k'_n V_{DSATn}}

(5
...
17, which plots the propagation delay of the
inverter as a function of the supply voltage
...
27, which charts the equivalent on-resistance
of the MOS transistor as a function of VDD
...
(Plot: normalized delay versus VDD (V).)
...
17 Propagation delay of CMOS
inverter as a function of supply voltage (
normalized with respect to the delay at 2
...
The dots indicate the delay values
predicted by Eq
...
21)
...
Hence, the deviation at
low supply voltages
...
This operation region should clearly be avoided if achieving high performance is a
premier design goal
...
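
The supply-voltage dependence of the delay can be illustrated with the first-order delay expression, in which the delay is proportional to VDD / (VDD − VTn − VDSATn/2) for a fixed load. The sketch below is an approximation under that assumption only; the function name and device parameters are illustrative, and the text notes that this model deviates at low supply voltages.

```python
def normalized_delay(vdd, vtn, vdsatn, vdd_ref=2.5):
    """Delay at vdd relative to the delay at vdd_ref (fixed load, first order)."""
    def shape(v):
        headroom = v - vtn - vdsatn / 2.0
        if headroom <= 0:
            raise ValueError("supply too low for the velocity-saturated model")
        return v / headroom
    return shape(vdd) / shape(vdd_ref)


if __name__ == "__main__":
    # Assumed NMOS parameters for illustration.
    for vdd in (1.0, 1.5, 2.0, 2.5):
        print(f"VDD = {vdd:.1f} V -> t_p / t_p(2.5 V) = {normalized_delay(vdd, 0.43, 0.63):.2f}")
```

The curve flattens at high supply voltages and rises steeply as VDD approaches VTn + VDSATn/2.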

Section 5
...
Remember that three major factors contribute to the load capacitance: the
internal diffusion capacitance of the gate itself, the interconnect capacitance, and the fanout
...

Good
design practice requires keeping the drain diffusion areas as small as possible
...
This is the most powerful and effective performance optimization tool in the hands of the designer
...
Increasing the transistor size also raises the diffusion
capacitance and hence CL
...
e
...
This effect is called “self-loading”
...

• Increase VDD
...
17, the delay of a gate can be modulated by
modifying the supply voltage
...
However, increasing the supply voltage above a certain level yields only very minimal improvement and hence
should be avoided
...


Example 5
...
5
...
Insight into the potential improvement can be obtained by partitioning the load
capacitance into an intrinsic (diffusion and Miller) and an extrinsic (wiring and fanout) component, or

C L = C int + C ext = C int ( 1 + α )

(5
...
Widening both NMOS and
PMOS of the driving inverter by a factor S reduces their equivalent resistance by an identical factor, but also raises the intrinsic capacitance of the gate by approximately the same ratio
...
    t_p = 0.69 \, (S + \alpha) \, C_{int} \left( \frac{R_{eqn} + R_{eqp}}{2S} \right) = \left( 1 + \frac{\alpha}{S} \right) t_{p0}

(5
...
Making S infinitely large yields the maximum obtainable performance gain, equal to 1/(1+α). Yet, any siz
...
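
The diminishing return of sizing is easy to see numerically from the relation tp = (1 + α/S) tp0. The sketch below is illustrative: the function names are hypothetical, and α is treated as a free parameter whose demonstration value is an assumption.

```python
def sized_delay_ratio(S, alpha):
    """t_p / t_p0 for a sizing factor S and extrinsic-to-intrinsic ratio alpha."""
    if S <= 0:
        raise ValueError("the sizing factor S must be positive")
    return 1.0 + alpha / S


def speedup_over_unsized(S, alpha):
    """Delay of the unsized gate (S = 1) divided by the delay at sizing factor S."""
    return sized_delay_ratio(1.0, alpha) / sized_delay_ratio(S, alpha)


if __name__ == "__main__":
    alpha = 1.05  # assumed ratio of extrinsic to intrinsic capacitance
    for S in (1, 2, 5, 10, 100):
        print(f"S = {S:3d}: speedup = {speedup_over_unsized(S, alpha):.2f}")
```

The speedup saturates at 1 + α, so most of the benefit is collected once S is a few times larger than α.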

For the example in question, we find from Table 5
...
05 (Cint = 3
...
15 fF)
...
05
...


(Plot axis: tp (sec).)
...
18 Increasing inverter performance by
sizing the NMOS and PMOS transistor with an
identical factor S for a fixed fanout (inverter of
Figure 5
...


mance improvement of 1
...
3 psec)
...
18, we observe that
the bulk of the improvement is already obtained for S = 5, and that sizing factors larger than
10 barely yield any extra gain
...
4 Propagation Delay as a Function of (dis)charge Current
So far, we have expressed the propagation delay as a function of the equivalent resistance of
the transistors
...
Derive an expression of the propagation delay using this alternative approach
...
4
...

Impact of Fanout
Eq
...
23) states that the load capacitance of the inverter can be divided into an intrinsic
and an extrinsic component
...
Assuming that each fanout gate presents an identical load, and that the wiring capacitance is proportional to the fanout (see footnote 2),
we can rewrite the delay equation as a function of the fanout N
...
25)

The linear relationship between fanout and wiring capacitance has been confirmed by a number of heuristic studies [REF]
...
...
Large fanout factors should hence be avoided if performance is an issue
...

NMOS/PMOS Ratio
So far, we have consistently widened the PMOS transistor so that its resistance matches
that of the pull-down NMOS device
...
5 between
PMOS and NMOS width
...
However, this does not imply that this ratio also yields the minimum overall propagation delay
...
When two contradictory effects are present, there must exist
a transistor ratio that optimizes the propagation delay of the inverter
...
Consider
two identical, cascaded CMOS inverters
...
26)

where Cdp1 and Cdn1 are the equivalent drain diffusion capacitances of PMOS and NMOS
transistors of the first inverter, while Cgp2 and Cgn2 are the gate capacitances of the second
gate
...

When the PMOS devices are made β times larger than the NMOS ones (β = (W/L)p / (W/L)n),
all transistor capacitances will scale in approximately the same way, or Cdp1 ≈ β Cdn1, and Cgp2 ≈ β Cgn2
...
(5
...
27)

An expression for the propagation delay can be derived, based on Eq
...
20)
...
69 ( ( 1 + β ) ( C dn 1 + C gn 2 ) + C W )  R eqn + ------------------
β 
2
r
= 0
...
28)

r (= Reqp/Reqn) represents the resistance ratio of identically-sized PMOS and NMOS transistors
...
29)

chapter5
...
If the wiring
capacitance dominates, larger values of β should be used
...

Example 5
...
From the values of the equivalent resistances
(Table 3
...
4 (= 31 kΩ / 13 kΩ) would yield a symmetrical transient response
...
(5
...
6
...
19, which plots the simulated propagation delay as a function of the transistor ratio β
...
The optimum point occurs around β = 1
...
Observe also that the rising and falling delays are identical at the
predicted point of β equal to 2
...
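The trade-off between the two contradictory effects can be checked numerically. The sketch below assumes that the delay is proportional to ((1 + β)(Cdn1 + Cgn2) + CW)(1 + r/β), with r = Reqp/Reqn, and compares a brute-force search with the root obtained by setting the derivative with respect to β to zero. The function names and the normalized capacitance values are assumptions made for illustration; r = 31/13 uses the resistances quoted in the example.

```python
import math


def relative_delay(beta, r, c_dev, c_wire):
    """Delay up to a constant factor: ((1 + beta) * c_dev + c_wire) * (1 + r / beta).

    c_dev stands for Cdn1 + Cgn2, c_wire for CW.
    """
    return ((1.0 + beta) * c_dev + c_wire) * (1.0 + r / beta)


def beta_opt_closed_form(r, c_dev, c_wire):
    """Root of d(delay)/d(beta) = 0 for the expression above."""
    return math.sqrt(r * (1.0 + c_wire / c_dev))


def beta_opt_numeric(r, c_dev, c_wire, lo=0.5, hi=10.0, steps=95_000):
    """Brute-force search for the delay minimum on a uniform grid."""
    best_beta, best_delay = lo, relative_delay(lo, r, c_dev, c_wire)
    for k in range(1, steps + 1):
        beta = lo + (hi - lo) * k / steps
        delay = relative_delay(beta, r, c_dev, c_wire)
        if delay < best_delay:
            best_beta, best_delay = beta, delay
    return best_beta


if __name__ == "__main__":
    R_RATIO = 31.0 / 13.0          # Reqp / Reqn from the quoted resistances
    for c_wire in (0.0, 1.0, 4.0):  # wire capacitance in units of (Cdn1 + Cgn2), assumed
        print(c_wire,
              round(beta_opt_numeric(R_RATIO, 1.0, c_wire), 3),
              round(beta_opt_closed_form(R_RATIO, 1.0, c_wire), 3))
```

With negligible wiring capacitance the minimum sits near the square root of r, well below the ratio that equalizes the rising and falling delays; a larger wire capacitance pushes the optimum β upward.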

[Plot: tpLH, tpHL, and tp (sec) as a function of β]

Figure 5
...


β


...
Only one of the devices is assumed
to be on during the (dis)charging process
...
This affects the
total current available for (dis)charging and impacts the propagation delay
...
20
plots the propagation delay of a minimum-size inverter as a function of the input signal
slope— as obtained from SPICE
...

While it is possible to derive an analytical expression describing the relationship
between input signal slope and propagation delay, the result tends to be complex and of
limited value
...
If the latter would be infinitely strong, its output slope would be
zero, and the performance of the gate under examination would be unaffected
...
...
20 tp as a function of the
input signal slope (10-90% rise or
fall time) for minimum-size
inverter with fan-out of a single
gate
...
This leads to a revised expression for the propagation delay of an
inverter i in a chain of inverters [Hedenstierna87]:
$$t_p^{i} = t_{step}^{i} + \eta\, t_{step}^{i-1}$$

(5
...
(5
...
e
...
The fraction η is an empirical constant
...

Example 5
...
All inverters in this example are assumed to be identical,
and to have an intrinsic propagation delay tp0
...
(5
...
(5
...

$$t_p^{i} = t_{p0}(1 + \alpha N) + \eta\, t_{p0}(1 + \alpha M) = t_{p0}\left(1 + \eta + \alpha (N + \eta M)\right)$$

(5
...
Typical values for the
parameters α and η are around 1 and 0
...
Experiments have demonstrated that
the model of Eq
...
31) forms a good approximation of the actual dependencies, although
some important deviations can be observed for small values of N and M
...
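The slope-corrected delay model can be written down in a few lines. This Python sketch evaluates tp = tp0 (1 + αN) + η tp0 (1 + αM) and its collapsed form; the function names are hypothetical, and η = 0.25 is an assumed value chosen only for illustration.

```python
def step_delay(tp0: float, alpha: float, fanout: float) -> float:
    """Delay for a step input with the given fanout: tp0 * (1 + alpha * fanout)."""
    return tp0 * (1.0 + alpha * fanout)


def chain_delay(tp0: float, alpha: float, eta: float, n: float, m: float) -> float:
    """Delay of inverter i (fanout n) driven by inverter i-1 (fanout m)."""
    return step_delay(tp0, alpha, n) + eta * step_delay(tp0, alpha, m)


def chain_delay_collapsed(tp0: float, alpha: float, eta: float, n: float, m: float) -> float:
    """Same quantity written as tp0 * (1 + eta + alpha * (n + eta * m))."""
    return tp0 * (1.0 + eta + alpha * (n + eta * m))


if __name__ == "__main__":
    # alpha = 1 follows the typical value quoted in the text; eta = 0.25 is assumed.
    print(chain_delay(1.0, 1.0, 0.25, n=4, m=2))            # in units of tp0
    print(chain_delay_collapsed(1.0, 1.0, 0.25, n=4, m=2))  # same value
```

The two functions agree term by term, which makes the contribution of the driving stage's fanout M explicit: it enters with the weight η.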

…M

i-1

…N

Figure 5
...
M and N denote the fanout factors of
inverter i-1 and i, respectively
...
This proves to be true not only for performance, but also for power consumption considerations as will be discussed later
...


Problem 5
...
Explain your answer
...
When gates get farther apart, the wire capacitance and resistance can no longer be ignored, and may even
dominate the transient response
...
The analysis detailed in Example 4
...
Consider the circuit of Figure 5
...
The driver is represented by a single resistance Rdr, which is the
average between Reqn and Reqp
...

(rw,cw,L)

Vout
Cint

Vout

Cfan

Figure 5
...


The propagation delay of the circuit can be obtained by applying the Elmore delay
expression
...
...
69R dr C int + ( 0
...
38 R w )C w + 0
...
69R dr ( C int + C fan ) + 0
...
38 rw cw L^2

(5
...
38 factor accounts for the fact that the wire represents a distributed delay
...
The delay
expression contains a component that is linear in the wire length, as well as a quadratic
one
...
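The linear and quadratic dependence on wire length can be made concrete with a short calculation. The sketch below assumes the usual Elmore form for a lumped driver followed by a distributed wire, 0.69 Rdr Cint + (0.69 Rdr + 0.38 Rw) Cw + 0.69 (Rdr + Rw) Cfan, with Rw = rw L and Cw = cw L; this full form is an assumption here, since the equation is only partly legible in the text. All numerical values other than cw = 92 aF/µm are placeholders.

```python
import math


def wire_delay(r_dr, c_int, c_fan, r_w, c_w, length):
    """Elmore-style delay of driver + distributed wire + fanout load.

    r_w and c_w are per-unit-length values; length uses the same length unit.
    """
    rw_total = r_w * length
    cw_total = c_w * length
    return (0.69 * r_dr * c_int
            + (0.69 * r_dr + 0.38 * rw_total) * cw_total
            + 0.69 * (r_dr + rw_total) * c_fan)


def wire_delay_polynomial(r_dr, c_int, c_fan, r_w, c_w, length):
    """Same delay grouped as constant + linear * L + quadratic * L**2."""
    constant = 0.69 * r_dr * (c_int + c_fan)
    linear = 0.69 * (r_dr * c_w + r_w * c_fan)
    quadratic = 0.38 * r_w * c_w
    return constant + linear * length + quadratic * length ** 2


if __name__ == "__main__":
    R_DR = 10e3      # assumed driver resistance (ohm)
    C_INT = 3e-15    # assumed intrinsic capacitance (F)
    C_FAN = 3e-15    # assumed fanout capacitance (F)
    R_W = 0.1        # assumed wire resistance (ohm per um)
    C_W = 92e-18     # wire capacitance (F per um), as quoted in the example
    for length_um in (10, 100, 1000):
        tp = wire_delay(R_DR, C_INT, C_FAN, R_W, C_W, length_um)
        print(f"L = {length_um:5d} um: tp = {tp * 1e12:.2f} ps")
```

For a high-resistance driver the linear term dominates over a wide range of lengths, which is why the extra delay in such a case is mainly a capacitive effect of the wire.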

Example 5
...
22, and assume the device parameters of Example 5
...
5(13/1
...
5) = 7
...
The wire is implemented in metal1
and has a width of 0
...
This yields the following parameters: cw = 92 aF/µm, and rw = 0
...
4)
...
(5
...
Solving the following quadratic equation yields a single useful
solution
...
6 × 10^-18 L^2 + 0
...
29 × 10^-12

or
L = 65 µm
Observe that the extra delay is solely due to the linear factor in the equation, and more specifically due to the extra capacitance introduced by the wire
...
This is
due to the high resistance of the (minimum-size) driver transistors
...
Analyze, for instance, the same problem with the
driver transistors 100 times wider, as is typical in high-speed, large fan-out drivers
...
5 Power, Energy, and Energy-Delay
So far, we have seen that the static CMOS inverter with its almost ideal VTC— symmetrical shape, full logic swing, and high noise margins— offers a superior robustness, which
simplifies the design process considerably and opens the door for design automation
...
It is this combination of robustness and low
static power that has made static CMOS the technology of choice of most contemporary
digital designs
...


chapter5
...
5
...
Part of this
energy is dissipated in the PMOS device, while the remainder is stored on the load capacitor
...

A precise measure for this energy consumption can be derived
...
We assume, initially, that the input waveform has zero rise and fall times, or, in other
words, that the NMOS and PMOS devices are never on simultaneously
...
23 is valid
...
23 Equivalent circuit
the instantaneous power over the period of interest
...

The corresponding waveforms of vout(t) and iVDD(t)
are pictured in Figure 5
...









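$$E_{VDD} = \int_0^{\infty} i_{VDD}(t)\, V_{DD}\, dt = V_{DD} \int_0^{\infty} C_L \frac{dv_{out}}{dt}\, dt = C_L V_{DD} \int_0^{V_{DD}} dv_{out} = C_L V_{DD}^2$$

$$E_C = \int_0^{\infty} i_{VDD}(t)\, v_{out}\, dt = \int_0^{\infty} C_L \frac{dv_{out}}{dt}\, v_{out}\, dt = C_L \int_0^{V_{DD}} v_{out}\, dv_{out} = \frac{C_L V_{DD}^2}{2} \qquad (5.34)$$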

These results can also be derived by observing that during the low-to-high transition, CL is loaded with a charge CLVDD
...
The energy stored on the capacitor equals CL VDD^2 / 2
...
The other half has been dissipated by the PMOS transistor
...
Once again, there is no dependence on the size of the device
...
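That the energy split does not depend on the device size can be verified with a small numerical experiment. The sketch below models the charging PMOS as a fixed resistor (a simplifying assumption) and integrates the supply and capacitor power over the transition. The function name, the resistance values, and VDD = 2.5 V are illustrative assumptions; CL = 6 fF is the load value quoted elsewhere in the chapter.

```python
import math


def charging_energies(r_on, c_load, vdd, n_steps=50_000, n_tau=25.0):
    """Return (energy drawn from the supply, energy stored on the capacitor).

    The capacitor is charged from 0 V through a resistor r_on; both integrals
    are evaluated with the trapezoidal rule over n_tau time constants.
    """
    tau = r_on * c_load
    dt = n_tau * tau / n_steps
    e_supply = 0.0
    e_cap = 0.0
    prev_i, prev_v = vdd / r_on, 0.0
    for k in range(1, n_steps + 1):
        decay = math.exp(-k * dt / tau)
        i = (vdd / r_on) * decay          # supply current
        v = vdd * (1.0 - decay)           # output voltage
        e_supply += 0.5 * (prev_i + i) * vdd * dt
        e_cap += 0.5 * (prev_i * prev_v + i * v) * dt
        prev_i, prev_v = i, v
    return e_supply, e_cap


if __name__ == "__main__":
    C_LOAD, VDD = 6e-15, 2.5   # 6 fF load; VDD assumed to be 2.5 V
    for r_on in (1e3, 13e3, 31e3):   # assumed on-resistances
        e_supply, e_cap = charging_energies(r_on, C_LOAD, VDD)
        print(f"R = {r_on:7.0f} ohm: E_VDD = {e_supply * 1e15:.2f} fJ, "
              f"E_C = {e_cap * 1e15:.2f} fJ")
```

Every resistance yields the same two numbers, CL VDD^2 and half of it; only the duration of the transition changes.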
In order to compute the power consumption, we have to take
3
Observe that this model is a simplification of the actual circuit
...
The latter experience a charge-discharge cycle that is out of phase with the capacitances to GND,
i
...
they get charged when Vout goes low and discharged when Vout rises
...


chapter5
...
5

[Waveforms: iVDD versus t, with the charge and discharge phases marked]

Figure 5
...


into account how often the device is switched
...
35)

f0→1 represents the frequency of energy-consuming transitions, that is, 0 → 1 transitions
for static CMOS
...
At the same time, the total capacitance on the chip (CL) increases as more and more gates are
placed on a single die
...
25 µm CMOS chip with a clock rate of
500 MHz and an average load capacitance of 15 fF/gate, assuming a fanout of 4
...
5 V supply then equals approximately 50
µW
...
In reality, not all gates in the complete IC switch at the full rate of
500 MHz
...

Example 5
...
4 is now easily computed
...
2, the value of the load capacitance was determined to equal 6 fF
...
5 V, the amount of energy needed to charge and discharge that capacitance equals
Edyn = CL VDD^2 = 37
...
For a tp of 32
...
5), we find that the dynamic power dissipation of
the circuit is
P dyn = E dyn ⁄ ( 2t p ) = 580 µW
Of course, an inverter in an actual circuit is rarely switched at this maximum rate, and
even if done so, the output does not swing from rail-to-rail
...
For a rate of 4 GHz (T = 250 psec), the dissipation reduces to 150 µW
...
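The arithmetic behind these numbers is a one-line formula, Pdyn = CL VDD^2 f. The Python sketch below reproduces the quoted figures under the assumption that the supply voltage is 2.5 V; the function names are made up for this illustration.

```python
def switching_energy(c_load: float, vdd: float) -> float:
    """Energy drawn from the supply per charge/discharge cycle: CL * VDD**2."""
    return c_load * vdd ** 2


def dynamic_power(c_load: float, vdd: float, f_switch: float) -> float:
    """Dynamic power for f_switch energy-consuming transitions per second."""
    return switching_energy(c_load, vdd) * f_switch


if __name__ == "__main__":
    VDD = 2.5  # assumed supply voltage (V)
    # 6 fF inverter load switched at 4 GHz
    print(f"{dynamic_power(6e-15, VDD, 4e9) * 1e6:.1f} uW")
    # 15 fF average gate load clocked at 500 MHz
    print(f"{dynamic_power(15e-15, VDD, 500e6) * 1e6:.1f} uW")
```

Under these assumptions the first case gives 150 µW and the second about 47 µW per gate, in line with the rounded values in the text.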


chapter5
...
While the switching activity is easily computed for an
inverter, it turns out to be far more complex in the case of higher-order gates and circuits
...
Other factors influencing the
activity are the overall network topology and the function to be implemented
...
36)

where f now represents the maximum possible event rate of the inputs (which is often the
clock rate) and P0→1 the probability that a clock event results in a 0 → 1 (or power-consuming) event at the output of the gate
...
For our example, an
activity factor of 10% (P0→1 = 0
...

Example 5
...

Power consuming transitions Output signal
occur 2 out of 8 times, which is
equivalent to a transition proba- Figure 5
...
25 (or 25%)
...
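Counting power-consuming transitions is straightforward to automate. The sketch below computes the fraction of clock events that produce a 0 → 1 output transition and scales the dynamic power accordingly. The sample sequence is hypothetical (it is not the waveform of the figure), as are the function names and the electrical values.

```python
def transition_probability(samples):
    """Fraction of clock events at which the sampled output goes from 0 to 1."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    rising = sum(1 for prev, cur in zip(samples, samples[1:])
                 if prev == 0 and cur == 1)
    return rising / (len(samples) - 1)


def average_dynamic_power(c_load, vdd, f_clock, p_0_to_1):
    """Pdyn = CL * VDD**2 * f * P(0 -> 1)."""
    return c_load * vdd ** 2 * f_clock * p_0_to_1


if __name__ == "__main__":
    # Hypothetical output values sampled once per clock event (9 samples, 8 events).
    output = [0, 1, 1, 0, 0, 1, 1, 1, 1]
    p = transition_probability(output)
    print(p)
    # Assumed values: 15 fF load, 2.5 V supply, 500 MHz clock.
    print(average_dynamic_power(15e-15, 2.5, 500e6, p))
```

For this sequence two of the eight events are power-consuming, so the effective power is a quarter of the value obtained when every clock event charges the load.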
This is one of the reasons that lower supply
voltages are becoming more and more attractive
...
For instance, reducing VDD from 2
...
25 V for our example drops the power dissipation from 5 W to 1
...
This assumes that the same clock rate can be sustained
...
17
demonstrates that this assumption is not that unrealistic as long as the supply voltage is substantially higher than the threshold voltage
...

When a lower bound on the supply voltage is set by external constraints (as often happens in real-world designs), or when the performance degradation due to lowering the supply
voltage is intolerable, the only means of reducing the dissipation is by lowering the effective
capacitance
...

A reduction in the switching activity can only be accomplished at the logic and architectural abstraction levels, and will be discussed in more detail in later chapters
...
...
As most of the capacitance in a combinational logic circuit is due to transistor capacitances (gate and diffusion), it makes sense to keep those contributions to a
minimum when designing for low power
...
This definitely affects the performance of the circuit, but
the effect can be offset by using logic or architectural speed-up techniques
...
This is contrary to common design practices used in cell libraries, where transistors are generally made large to accommodate a range
of loading and performance requirements
...
Assume we have to minimize the energy dissipation of a circuit with a specified lower-bound on the performance
...
Yet, the latter causes the capacitance to
increase
...

To analyze the transistor-sizing-for-minimum-energy problem, let us examine the simple
case of a static CMOS inverter driving a load capacitance consisting of an intrinsic (Cint) and
an extrinsic component (Cext) (Figure 5
...
While the former represents the diffusion capacitances, the latter stands for wiring capacitance and fan-out
...
The factor S stands for the inverter sizing factor,
where S is equal to 1 for an inverter constructed of minimum-size devices
...
Figure 5
...
The speed of all implementations is kept constant by appropriately adjusting the supply voltage: larger values of S normally mean lower values of the supply voltage
...
[Plot: normalized energy as a function of the scaling factor S, for several values of α (one curve is labeled α = 5)]
...
26 Normalized energy of a MOS inverter with load capacitance CL, as a function of the inverter
size S and the ratio α between the extrinsic and intrinsic capacitance (α = Cext/Cint)
...
5 V)
...

When α = 0 (or the load capacitance is zero), the lowest energy consumption is obtained
when using minimum-size devices
...
This result should come as no surprise: transistor sizing to increase performance— and reduce the energy by lowering the supply voltage— only
makes sense as long as performance is dominated by the extrinsic capacitance
...
For example, a sizing
factor S of 3
...
The energy-reduction— with a factor
of 4 with respect to the circuit instance with minimum-size devices— requires that the supply
voltage be reduced to 1
...


Example 5
...
26 as
a function of S and α
...

An expression for the propagation delay of the gate was already derived in Eq
...
24),
and is repeated here for convenience
...
69 (S + α) Cint (Reqn + Reqp) / (2S) = (1 + α/S) tp0
tp0 stands for the intrinsic delay of the gate at the reference voltage VDD
...
(5
...


$$t_{p0} \sim \frac{V_{DD}}{V_{DD} - V_{TE}} = \frac{1}{1 - V_{TE}/V_{DD}}$$

(5
...

Keeping the propagation delay of the scaled inverter constant with respect to the reference case means lowering the supply voltage:
$$\frac{V'_{DD}}{V_{TE}} = \frac{S(1 + \alpha)}{\alpha(S - 1) + (S + \alpha)(V_{TE}/V_{DD})}$$
where V′DD and VDD are the supply voltages of the scaled and reference inverters, respectively
...
5 V and VTE = 0
...
26
...
The reader should further be aware that
the presented model is somewhat optimistic, as it ignores the extra energy dissipation related
to the increased gate capacitance of the driving transistors
...
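The sizing-versus-supply trade-off can be tabulated directly from the constant-delay condition. The sketch below computes the scaled supply voltage V'DD and a normalized energy, assuming that the switched capacitance is (S + α) Cint for the scaled inverter and (1 + α) Cint for the reference. The function names and the values VDD = 2.5 V and VTE = 0.5 V are assumptions for illustration, so the printed numbers need not match the figure.

```python
def scaled_supply(s, alpha, vdd_ref, v_te):
    """Supply voltage that keeps the delay of the S-times wider inverter unchanged."""
    x = v_te / vdd_ref
    return v_te * s * (1.0 + alpha) / (alpha * (s - 1.0) + (s + alpha) * x)


def normalized_energy(s, alpha, vdd_ref, v_te):
    """Energy relative to the minimum-size inverter at the reference supply.

    Assumes E is proportional to (S + alpha) * V'DD**2 for the scaled inverter
    and to (1 + alpha) * VDD**2 for the reference (S = 1).
    """
    v_scaled = scaled_supply(s, alpha, vdd_ref, v_te)
    return (s + alpha) / (1.0 + alpha) * (v_scaled / vdd_ref) ** 2


if __name__ == "__main__":
    VDD_REF, V_TE = 2.5, 0.5   # assumed reference supply and effective threshold (V)
    for alpha in (0.0, 1.0, 5.0):
        row = [round(normalized_energy(s, alpha, VDD_REF, V_TE), 3)
               for s in (1, 2, 3, 4, 6)]
        print(f"alpha = {alpha}: {row}")
```

For α = 0 the energy only grows with S, while for a large extrinsic load the energy first drops steeply and then rises again once self-loading dominates.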
...
The finite slope of the input signal causes a direct current path between VDD
and GND for a short period of time during switching, while the NMOS and the PMOS
transistors are conducting simultaneously
...
27
...
38)

as well as the average power consumption
$$P_{dp} = t_{sc} V_{DD} I_{peak} f = C_{sc} V_{DD}^2 f$$

(5
...
27

Short-circuit currents during transients
...
tsc represents the time both devices are conducting
...
(5
...

tsc = ((VDD – 2VT) / VDD) ts ≈ ((VDD – 2VT) / VDD) × tr(f) / 0
...
40)

Ipeak is determined by the saturation current of the devices and is hence directly proportional to the sizes of the transistors
...
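A rough estimate of the direct-path power follows from Pdp = tsc VDD Ipeak f. The sketch below takes tsc = ((VDD – 2VT) / VDD) ts, where ts is the full transition time of the input, and assumes equal threshold magnitudes for both devices. The function names and all numerical values are placeholders chosen for illustration.

```python
def short_circuit_time(vdd: float, v_t: float, t_s: float) -> float:
    """Time during which both devices conduct, for a linear input ramp of duration t_s."""
    if vdd <= 2.0 * v_t:
        return 0.0          # the two devices are never on simultaneously
    return (vdd - 2.0 * v_t) / vdd * t_s


def short_circuit_power(vdd, v_t, t_s, i_peak, f):
    """Average direct-path power: tsc * VDD * Ipeak * f."""
    return short_circuit_time(vdd, v_t, t_s) * vdd * i_peak * f


if __name__ == "__main__":
    # Assumed values: 2.5 V supply, 0.5 V thresholds, 100 ps input transition,
    # 50 uA peak direct-path current, 500 MHz switching rate.
    print(short_circuit_time(2.5, 0.5, 100e-12))
    print(short_circuit_power(2.5, 0.5, 100e-12, 50e-6, 500e6))
```

The guard in `short_circuit_time` reflects the limiting case in which the supply is too low for both transistors to conduct at once, so the direct-path component vanishes.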
This relationship is best illustrated by the following simple analysis: Consider a static CMOS inverter with a 0 → 1 transition at the input
...
28a)
...
As the source-drain
voltage of the PMOS device is approximately 0 during that period, the device shuts off
without ever delivering any current
...


chapter5
...
28 Impact of load capacitance on short-circuit current
...
28b)
...
This clearly
represents the worst-case condition
...
29, which plots the short-circuit current through the NMOS transistor during a
low-to-high transition as a function of the load capacitance
...
[Plot: short-circuit current Isc (A) for different load capacitances; one curve is labeled CL = 20 fF]

Figure 5
...


0

-0
...
On the other hand,
making the output rise/fall time too large slows down the circuit and can cause short-circuit currents in the fan-out gates
...

Design Techniques
A more practical rule, which optimizes the power consumption in a global way, can be formulated ([Veendrick84]):


The power dissipation due to short-circuit currents is minimized by matching the rise/fall
times of the input and output signals
...

Making the input and output rise times of a gate identical is not the optimum solution for
that particular gate on its own, but keeps the overall short-circuit current within bounds
...
30, which plots the short-circuit energy dissipation of an inverter (normal8

W/L|P = 1
...
25 µm

7

W/L|N = 0
...
25 µm
CL = 30 fF

VDD = 3
...
5 V

2
1

VDD = 1
...
30 Power dissipation of a static CMOS
inverter as a function of the ratio between input
and output rise/fall times
...
At low values of the slope ratio, input-output coupling leads to some extra dissipation
...
When the load capacitance is too small for a given inverter size
(r > 2…3 for VDD = 5 V), the power is dominated by the short-circuit current
...
When the rise/fall times of inputs and outputs are equalized, most power dissipation is associated with the dynamic power and only a minor fraction (< 10%) is devoted to
short-circuit currents
...
(5
...
In the extreme case, when VDD < VTn + |VTp|,
short-circuit dissipation is completely eliminated, because both devices are never on
simultaneously
...

At a supply voltage of 2
...
5 V, an input/output slope ratio of 2 is
needed to cause a 10% degradation in dissipation
...
(5
...
The value of this short-circuit capacitance is a function of VDD, the transistor
sizes, and the input-output slope ratio
...

5
...
2

Static Consumption

The static (or steady-state) power dissipation of a circuit is expressed by Eq
...
41),
where Istat is the current that flows between the supply rails in the absence of switching
activity
P stat = I stat V DD

(5
...
There is, unfortunately, a leakage current flowing through the reverse-biased diode junctions of the transistors, located between the source or drain and the substrate as shown in Figure 5
...
This
contribution is, in general, very small and can be ignored
...
For a die with 1 million gates, each with a drain area of 0
...
5 V, the worst-case power consumption due to
diode leakage equals 0
...

VDD
VDD

Vout = V DD
Drain Leakage
Current
Subthreshold current

Figure 5
...


However, be aware that the junction leakage currents are caused by thermally generated carriers
...
At 85 °C (a common junction temperature limit for commercial
hardware), the leakage currents increase by a factor of 60 over their room-temperature values
...
As the temperature is a strong function of the dissipated heat and its removal mechanisms, this can only be accomplished by limiting the power dissipation of the circuit
and/or by using chip packages that support efficient heat removal
...

As discussed in Chapter 3, an MOS transistor can experience a drain-source current, even
when VGS is smaller than the threshold voltage (Figure 5
...
The closer the threshold
voltage is to zero volts, the larger the leakage current at VGS = 0 V and the larger the static
power consumption
...
Standard processes feature VT values that are never smaller than
0
...
6V and that in some cases are even substantially higher (~ 0
...

This approach is being challenged by the reduction in supply voltages that typically
goes with deep-submicron technology scaling as became apparent in Figure 3
...
We concluded earlier (Figure 5
...
One approach to address this performance issue is to scale the device
thresholds down as well
...
17 to the left, which means that
the performance penalty for lowering the supply voltage is reduced
...
32
...
The continued scaling
of the supply voltage predicted for the next generations of CMOS technologies will however force the threshold voltages ever downwards, and will make subthreshold conduction
a dominant source of power dissipation
...
2

VT = 0
...
32 Decreasing the threshold
increases the subthreshold current at VGS =
0
...
An example of the
latter is the SOI (Silicon-on-Insulator) technology whose MOS transistors have slope-factors that are close to the ideal 60 mV/decade
...
13 Impact of threshold reduction on performance and static power dissipation
Consider a minimum size NMOS transistor in the 0
...
In Chapter 3,
we derived that the slope factor S for this device equals 90 mV/decade
...
5V equals 10^-11 A (Figure 3
...
Reducing the threshold by 200 mV to 0
...
5 V, this translates into
a static power dissipation of 10^6 × 170 × 10^-11 × 1
...
6 mW
...
5 W! At that supply voltage,
the threshold reductions correspond to a performance improvement of 25% and 40%, respectively
...
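The leakage numbers of this example follow from the exponential subthreshold characteristic: every reduction of the threshold by one slope factor multiplies the off-current by ten. The Python sketch below reproduces the arithmetic with the 90 mV/decade slope factor and the 10^-11 A reference current quoted above; the supply voltage of 1.5 V, the function names, and the gate count handling are assumptions made for illustration.

```python
def leakage_scale(delta_vt_mv: float, slope_mv_per_decade: float) -> float:
    """Factor by which the off-current grows when VT is lowered by delta_vt_mv."""
    return 10.0 ** (delta_vt_mv / slope_mv_per_decade)


def static_power(n_gates: float, i_leak: float, vdd: float) -> float:
    """Static power of n_gates gates that each leak i_leak from a supply vdd."""
    return n_gates * i_leak * vdd


if __name__ == "__main__":
    I_REF = 1e-11     # off-current at the reference threshold (A)
    SLOPE = 90.0      # slope factor (mV/decade)
    VDD = 1.5         # assumed supply voltage (V)
    N_GATES = 1e6
    for delta in (200.0, 400.0):
        i_leak = I_REF * leakage_scale(delta, SLOPE)
        print(f"VT lowered by {delta:.0f} mV: "
              f"I = {i_leak:.3e} A, P = {static_power(N_GATES, i_leak, VDD):.3e} W")
```

A 200 mV reduction gives a factor of about 167 (rounded to 170 in the text) and a static power of a few milliwatts for a million gates; a 400 mV reduction pushes it to a few tenths of a watt.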
The idea that the leakage current in a static CMOS circuit has to be zero is a preconception
...
As long as the noise margins are within range, this is not a compelling issue
...
tion
...
For a
0
...
7 V VT; and 0
...
1 V VT
...
The optimal operation point
depends upon the activity of the circuit
...
Power-down (also called standby) can be accomplished by disconnecting the unit from the supply rails, or by lowering the supply voltage
...
5
...
42)

In typical CMOS circuits, the capacitive dissipation is by far the dominant factor
...
Leakage is ignorable at present, but this might change in the not too distant future
...

PDP = Pav t p

(5
...

Assuming that the gate is switched at its maximum possible rate of fmax = 1/(2tp), and
ignoring the contributions of the static and direct-path currents to the power consumption,
we find
$$PDP = C_L V_{DD}^2 f_{max} t_p = \frac{C_L V_{DD}^2}{2}$$

(5
...
Remember that earlier we had defined Eav as the average
energy per switching cycle (or per energy-consuming event)
...

Energy-Delay Product
The validity of the PDP as a quality metric for a process technology or gate topology is
questionable
...
...
Yet for a given structure, this number can be made arbitrarily low by
reducing the supply voltage
...
This comes at a
major expense in performance, as discussed earlier
...
The energy-delay product (EDP) does exactly
that
...
45)

It is worth analyzing the voltage dependence of the EDP
...
An optimum operation point should hence exist
...
(5
...

$$t_p \approx \frac{\alpha C_L V_{DD}}{V_{DD} - V_{TE}}$$

(5
...
Combining Eq
...
45) and Eq
...
46), 4
$$EDP = \frac{\alpha C_L^2 V_{DD}^3}{2 (V_{DD} - V_{TE})}$$

(5
...
(5
...

$$V_{DDopt} = \frac{3}{2} V_{TE}$$

(5
...
For sub-micron technologies with
thresholds in the range of 0
...
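The location of the energy-delay optimum can be confirmed with a brute-force search. The sketch below evaluates EDP = α CL^2 VDD^3 / (2 (VDD – VTE)) on a fine grid and compares the minimum with 3/2 VTE. The function names are hypothetical, and VTE = 0.8 V is an assumed value used only to make the example concrete.

```python
def edp(vdd: float, v_te: float, alpha: float = 1.0, c_load: float = 1.0) -> float:
    """Energy-delay product alpha * CL**2 * VDD**3 / (2 * (VDD - VTE))."""
    return alpha * c_load ** 2 * vdd ** 3 / (2.0 * (vdd - v_te))


def optimum_supply(v_te: float, v_max: float = 5.0, steps: int = 42_000) -> float:
    """Grid search for the supply voltage in (VTE, v_max] that minimizes the EDP."""
    best_v, best_value = None, float("inf")
    for k in range(1, steps + 1):
        vdd = v_te + (v_max - v_te) * k / steps
        value = edp(vdd, v_te)
        if value < best_value:
            best_v, best_value = vdd, value
    return best_v


if __name__ == "__main__":
    V_TE = 0.8   # assumed effective threshold (V)
    print(f"grid optimum:   {optimum_supply(V_TE):.3f} V")
    print(f"3/2 * VTE:      {1.5 * V_TE:.3f} V")
```

The constants α and CL scale the curve without moving its minimum, which is why they can be left at unity in the search.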

Example 5
...
25 µm CMOS inverter

From the technology parameters for our generic CMOS process presented in Chapter 3, the
value of VTE can be derived
...
43 V, VDsatn = 0
...
74 V
...
4 V, VDsatp = -1 V, VTEp = -0
...

VTE ≈ (VTEn+|VTEp|)/2 = 0
...
8 V = 1
...
The simulated graphs of Figure 5
...
The optimum supply volt4
This equation is only accurate as long as the devices remain in velocity saturation, which is probably
not the case for the lower supply voltages
...


chapter5
...
1 V
...

[Plot: normalized energy, delay, and energy-delay as a function of VDD (V)]

Figure 5
...
25 µm CMOS technology
...
5

DD

WARNING: While the above example demonstrates that there exists a supply voltage
that minimizes the energy-delay product of a gate, this voltage does not necessarily represent the optimum voltage for a given design problem
...
Similarly, a lower-energy design is possible by operating the circuit at a lower voltage and by
obtaining the overall system performance through the use of architectural techniques such
as pipelining or concurrency
...
5
...

$$P_{av} = \frac{1}{T} \int_0^T p(t)\, dt = \frac{V_{DD}}{T} \int_0^T i_{DD}(t)\, dt$$

(5
...

Some implementations of SPICE provide built-in functions to measure the average value
of a circuit signal
...
MEASURE TRAN I(VDD) AVG command
computes the area under a computed transient response (I(VDD)) and divides it by the
period of interest
...
(5
...
Other implementations of SPICE are, unfortunately, not as extensive
...
A small circuit can
easily be conceived that acts as an integrator and whose output signal is nothing but the
average power
...
...
34
...
The resistance R is only provided for DC-convergence reasons and should be chosen as high as possible to minimize leakage
...
The operation
of the circuit is summarized in Eq
...
50) under the assumption that the initial voltage on
the capacitor C is zero
...
50)
$$P_{av} = \frac{k}{C} \int_0^T i_{DD}\, dt$$

Equating Eq
...
49) and Eq
...
50) yields the necessary conditions for the equivalent
circuit parameters: k/C = VDD/T
...
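The same average can be computed offline from sampled simulation data. The Python sketch below integrates a supply-current waveform with the trapezoidal rule and multiplies by VDD/T, which is what the integrator circuit does when k/C = VDD/T. The current waveform here is synthetic (a single exponential charging pulse), and the function names, the 2 kΩ on-resistance, and VDD = 2.5 V are assumptions for illustration.

```python
import math


def average_power(times, currents, vdd):
    """Pav = (VDD / T) * integral of iDD dt, using the trapezoidal rule."""
    if len(times) != len(currents) or len(times) < 2:
        raise ValueError("need matching time and current samples")
    charge = 0.0
    for k in range(1, len(times)):
        charge += 0.5 * (currents[k - 1] + currents[k]) * (times[k] - times[k - 1])
    return vdd * charge / (times[-1] - times[0])


def synthetic_supply_current(c_load, vdd, r_on, period, n=5000):
    """Hypothetical iDD(t): one exponential charging pulse per period."""
    tau = r_on * c_load
    times = [period * k / n for k in range(n + 1)]
    currents = [(vdd / r_on) * math.exp(-t / tau) for t in times]
    return times, currents


if __name__ == "__main__":
    # Assumed values: 6 fF load, 2.5 V supply, 2 kOhm on-resistance, T = 250 ps.
    t, i = synthetic_supply_current(6e-15, 2.5, 2e3, 250e-12)
    print(f"Pav = {average_power(t, i, 2.5) * 1e6:.1f} uW")
```

For this idealized pulse the result is CL VDD^2 / T; a simulated waveform would add the direct-path and coupling contributions that the simple model leaves out.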

VDD
+
iDD


Pav
Circuit
under test

C
k iDD

R

Figure 5
...


Example 5
...
4 is analyzed using the above
technique for a toggle period of 250 psec (T = 250 psec, k = 1, VDD = 2
...
The resulting power consumption is plotted in Figure 5
...
3 µW
...
MEAS AVG command yields a value of
160
...
These numbers are equivalent to an energy of 39 fJ (which is close to the 37
...
10)
...
This is due to the
injection of current into the supply, when the output briefly overshoots VDD as a result of the
capacitive coupling between input and output (as is apparent from the transient response of
Figure 5
...


[Plot: average power (W), over one cycle]

Figure 5
...


x 10

5
...
5, we have explored the impact of the scaling of technology on some of
the important design parameters such as area, delay, and power
...
8)
...
3 Scaling scenarios for short-channel devices (S and U represent the technology and voltage
scaling parameters, respectively)
...
...
From , we can
derive that the gate delay indeed decreases
exponentially at a rate of 13%/year, or halving
every five years
...
3, since S averages
approximately 1
...
39
...
36 Scaling of the gate delay (from [Dally98])

is projected to be a few tens of picoseconds by
...

Reducing power dissipation has only been a second-order priority until recently
...
An interesting chart is shown in Figure 5
...
Although the variation is
large— even for a fixed technology— it shows the power density to increase approximately
with S2
...
3
...
Even under
these circumstances, power dissipation-per-chip will continue to increase due to the ever-larger die sizes
...
37 Evolution of power-density in
micro- and DSP processors, as a function of
the scaling factor S ([Sakurai])
...


S
The presented scaling model has one fatal flaw however: the performance and
power predictions produce purely “intrinsic” numbers that take only device parameters
into account
...

Similarly, charging and discharging the wire capacitances may dominate the energy bud-

chapter5
...
To get a crisper perspective, one has to construct a combined model that considers
device and wire scaling models simultaneously
...
4
...
We furthermore assume that the resistance of the driver dominates the
wire resistance, which is definitely the case for short to medium-long wires
...
4 Scaling scenarios for wire capacitance
...
εc represents the impact of fringing
and interwire capacitances
...
This impact is limited to an increase with εc for short
wires (S = SL), but it becomes increasingly more pronounced for medium-range and long
wires (SL < S)
...
38
...
38 Evolution of wire delay / gate delay ratio
with respect to technology (from [Fisher98])
...
The doomsday scenario that interconnect may cause CMOS performance to saturate in the very near future hence may be exaggerated
...
g
...


chapter5
...
7

Summary

5
...
The
key characteristics of the gate are summarized:
• The static CMOS inverter combines a pull-up PMOS section with a pull-down
NMOS device
...

• The gate has an almost ideal voltage-transfer characteristic
...
The noise margins
of a symmetrical inverter (where PMOS and NMOS transistors have equal current-driving strength) approach VDD/2
...

• Its propagation delay is dominated by the time it takes to charge or discharge the
load capacitor CL
...
69 CL (Reqn + Reqp) / 2
Keeping the load capacitance small is the most effective means of implementing
high-performance circuits
...

• The power dissipation is dominated by the dynamic power consumed in charging
and discharging the load capacitor
...
The dissipation is
proportional to the activity in the network. The dissipation due to the direct-path

...
The static dissipation can usually be ignored but might become a major factor in the future as a result of subthreshold currents
...
The impact is even more striking if the supply
voltage is scaled simultaneously
...


5
...
Virtually every book on digital design devotes a substantial number of pages to the
analysis of the basic inverter gate
...
Some references of particular interest that were explicitly quoted in this chapter are
given below
...

REFERENCES
[Baccarani84] G
...
Wordeman, and R
...
Electron Devices, ED-31(4):
p
...

[Brews89] J
...
, “The Submicrometer Silicon MOSFET,” in [Watts89]
...
, Computer Aided Design of Digital Integrated Circuits, Lecture Notes,
Katholieke Universiteit Leuven, Belgium
...
Dennard et al
...
256–258, 1974
...
Embabi, A
...
Elmasry, Digital BiCMOS Integrated Circuit Design,
Kluwer Academic Publishers, 1993
...
Hodges and H
...

[Jouppi93] N
...
, “A 300 MHz 115W 32b Bipolar ECL Microprocessor with On-Chip
Caches,” Proc
...
, pp
...

[Kakumu90] M
...
Kinugawa, “Power-Supply Voltage Impact on Circuit Performance for Half and Lower Submicrometer CMOS LSI,” IEEE Journal of Solid-State Circuits,
vol
...
8, pp
...

[Lohstroh81] J
...

69, pp
...

[Masaki92] A
...
18–24, November 1992
...
Schutz, “A 3
...
6 mm BiCMOS Superscaler Microprocessor,” ISSCC Digest of
Technical Papers, pp
...

[Sedra87] Sedra and Smith, MicroElectronic Circuits, Holt, Rinehart and Winston, 1987
...
, CMOS Digital Circuit Technology, Prentice Hall, 1988
...
Tang, “Scaling the Silicon Bipolar Transistor,” in [Watts89]
...
Veendrick, “Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on
the Design of Buffer Circuits,” IEEE Journal of Solid-State Circuits, vol
...
4,
pp
...

[Watts89] Watts R
...
, SubMicron Integrated Circuits, Wiley, 1989
...
9 Exercises and Design Problems
For all problems, use the device parameters provided in Chapter 3 (as well as the inside back cover),
unless otherwise mentioned
...
[M, SPICE, 3
...
2] The layout of a static CMOS inverter is given in Figure 5
...
(1 = 0
...

a
...

b
...

V
c
...
fm Page 176 Friday, January 18, 2002 9:01 AM

CHAPTER

5

THE CMOS INVERTER
Quantification of integrity, performance, and energy metrics of an inverter
Optimization of an inverter design

5
...
4
...
2

The Static CMOS Inverter — An Intuitive
Perspective

Propagation Delay: First-Order
Analysis

5
...
3

Propagation Delay from a Design
Perspective

5
...
5

Power, Energy, and Energy-Delay

5
...
1

Switching Threshold

5
...
1

Dynamic Power Consumption

5
...
2

Noise Margins

5
...
2

Static Consumption

Robustness Revisited

5
...
3

Putting It All Together

5
...
4

Analyzing Power Consumption Using
SPICE

5
...
3
5
...
4
...
6

Perspective: Technology Scaling and its
Impact on the Inverter Metrics

chapter5
...
1

5
...
Once its operation and properties are
clearly understood, designing more intricate structures such as NAND gates, adders, multipliers, and microprocessors is greatly simplified
...
The analysis of inverters can be extended to explain the behavior of more complex gates such as NAND, NOR, or XOR, which in turn form the building blocks for modules such as multipliers and processors
...
This is certainly the most popular
at present, and therefore deserves our special attention
...
While each of these parameters can be easily quantified for a given technology,
we also discuss how they are affected by scaling of the technology
...


5
...
1 shows the circuit diagram of a static CMOS inverter
...
25): the transistor is nothing more than a switch with an infinite off-resistance (for |VGS| < |VT|), and a finite on-resistance (for |VGS| > |VT|)
...
1 Static CMOS inverter
...


chapter5
...
When Vin is high and equal to VDD, the NMOS
transistor is on, while the PMOS is off
...
2a
...
On the other hand, when the input voltage is low (0 V), NMOS and PMOS transistors
are off and on, respectively
...
2b shows that a path exists
between VDD and Vout, yielding a high output voltage
...

(a) Model for high input (Vin = VDD): Rn between Vout and GND; (b) Model for low input (Vin = 0): Rp between VDD and Vout

Figure 5
...


Switch models of CMOS

A number of other important properties of static CMOS can be derived from this switch-level view:
• The high and low output levels equal VDD and GND, respectively; in other words,
the voltage swing is equal to the supply voltage
...

• The logic levels are not dependent upon the relative device sizes, so that the transistors can be minimum size
...
This is in
contrast with ratioed logic, where logic levels are determined by the relative dimensions of the composing transistors
...
A well-designed CMOS inverter, therefore, has a low output impedance, which makes it less sensitive to noise and disturbances
...

• The input resistance of the CMOS inverter is extremely high, as the gate of an MOS
transistor is a virtually perfect insulator and draws no dc input current
...
A single inverter can theoretically drive an infinite number of
gates (or have an infinite fan-out) and still be functionally operational; however,
increasing the fan-out also increases the propagation delay, as will become clear
below
...



• No direct path exists between the supply and ground rails under steady-state operating conditions (that is, when the input and output remain constant)
...

SIDELINE: The above observation, while seemingly obvious, is of crucial importance,
and is one of the primary reasons CMOS is the digital technology of choice at present
...
All early microprocessors, such
as the Intel 4004, were implemented in a pure NMOS technology
...
The resulting static power
consumption puts a firm upper bound on the number of gates that can be integrated on a
single die; hence the forced move to CMOS in the 1980s, when scaling of the technology
allowed for higher integration densities
...
Such a graphical construction is traditionally called a load-line plot
...
We have selected the input voltage Vin, the output voltage Vout and the NMOS
drain current IDN as the variables of choice
...
1)

V GSn = V in ; V GSp = V in – V DD
V DSn = V out ; V DSp = V out – V DD

The load-line curves of the PMOS device are obtained by mirroring around the x-axis and a horizontal shift over VDD
...
3, where the
subsequent steps to adjust the original PMOS I-V curves to the common coordinate set Vin,
Vout and IDn are illustrated
...
Vin = VDD + VGSp; IDn = –IDp; Vout = VDD + VDSp

Figure 5
...
5 V)
...

(Plot: IDn load curves, parameterized by Vin)
...
4 Load curves for NMOS and PMOS transistors of the static CMOS inverter (VDD = 2
...
The dots
represent the dc operation points for various input voltages
...
4
...
Graphically, this
means that the dc points must be located at the intersection of corresponding load lines
...
5, 1, 1
...
5 V) are marked on the graph
...

The VTC of the inverter hence exhibits a very narrow transition zone
...
In that operation region, a small change in the input voltage
results in a large output variation
...
5
...
5

(Plot of Vout, annotated with the operating regions: NMOS sat / PMOS res; NMOS res / PMOS sat; NMOS res / PMOS off)
...
5 VTC of static CMOS inverter,
derived from Figure 5
...
5 V)
...


Before going into the analytical details of the operation of the CMOS inverter, a
qualitative analysis of the transient behavior of the gate is appropriate as well
...
...
6 Switch model of
dynamic behavior of static CMOS
inverter
...
Assuming
temporarily that the transistors switch instantaneously, we can get an approximate idea of
the transient response by using the simplified switch model again (Figure 5
...
Let us consider the low-to-high transition first (Figure 5
...
The gate response time is simply determined by the time it takes to charge the capacitor CL through the resistor Rp
...
5, we learned that the propagation delay of such a network is proportional to its time
constant RpCL
...
The latter is achieved by
increasing the W/L ratio of the device
...
6b), which is dominated by the RnCL time-constant
...
This complicates the exact determination of the propagation delay
...
4
...
3

Evaluating the Robustness of the CMOS Inverter: The Static Behavior
In the qualitative discussion above, the overall shape of the voltage-transfer characteristic
of the static CMOS inverter was derived, as were the values of VOH and VOL (VDD and
GND, respectively)
...

5
...
1

Switching Threshold

The switching threshold, VM, is defined as the point where Vin = Vout
...
5)
...
An analytical expression for VM is obtained by equating the currents through the tran-

...
We solve the case where the supply voltage is high so that the devices can be
assumed to be velocity-saturated (or VDSAT < VM - VT)
...

$$k_n V_{DSATn}\left(V_M - V_{Tn} - \frac{V_{DSATn}}{2}\right) + k_p V_{DSATp}\left(V_M - V_{DD} - V_{Tp} - \frac{V_{DSATp}}{2}\right) = 0$$

(5
...
3)
k n V DSATn υ satn W n
1+r

assuming identical oxide thicknesses for PMOS and NMOS transistors
...
(5
...
4)

Eq
...
4) states that the switching threshold is set by the ratio r, which compares the relative driving strengths of the PMOS and NMOS transistors
...
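The current-balance equation for velocity-saturated devices is linear in VM, so it can be solved directly. The sketch below does just that. The function name and the numerical device parameters in the usage example are assumptions chosen for illustration (PMOS quantities are entered as negative numbers, following the sign convention of the equation); they should be replaced by the parameters of the process at hand.

```python
def switching_threshold(vdd, kn, kp, vtn, vtp, vdsat_n, vdsat_p):
    """Solve kn*VDSATn*(VM - VTn - VDSATn/2)
           + kp*VDSATp*(VM - VDD - VTp - VDSATp/2) = 0 for VM.

    kn and kp are device gain factors k' * (W/L).
    kp, vtp and vdsat_p are negative for the PMOS device.
    Only valid while both transistors are velocity-saturated.
    """
    a = kn * vdsat_n          # NMOS drive term (positive)
    b = kp * vdsat_p          # PMOS drive term (positive: two negatives)
    return (a * (vtn + vdsat_n / 2) + b * (vdd + vtp + vdsat_p / 2)) / (a + b)


# Illustrative (assumed) parameters for a 2.5 V supply.
kn = 115e-6 * 1.5             # k'n * (W/L)n
kp = -30e-6 * 1.5 * 3.5       # k'p * (W/L)p, PMOS 3.5 times wider
vm = switching_threshold(2.5, kn, kp, 0.43, -0.4, 0.63, -1.0)
print(f"VM = {vm:.3f} V")
```

Widening the PMOS (a more negative kp) raises the returned VM toward VDD, and strengthening the NMOS lowers it toward GND, in line with the discussion of the ratio r.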
This
requires r to be approximately 1, which is equivalent to sizing the PMOS device so that
(W/L)p = (W/L)n × (VDSATn k′n)/(VDSATp k′p)
...
Increasing the strength of the NMOS, on
the other hand, moves the switching threshold closer to GND
...
(5
...
When using this
expression, please make sure that the assumption that both devices are velocity-saturated
still holds for the chosen operation point
...
5)

Problem 5
...


The above expressions were derived under the assumption that the transistors are velocity-saturated
...
Under these circumstances,
Eq
...
6) holds for VM
...

$$V_M = \frac{V_{Tn} + r\,(V_{DD} + V_{Tp})}{1 + r} \quad \text{with} \quad r = \sqrt{\frac{-k_p}{k_n}}$$

(5
...
...
The required ratio is given by
Eq
...
5)
...
1 Switching threshold of CMOS inverter
We derive the sizes of PMOS and NMOS transistors such that the switching threshold of a
CMOS inverter, implemented in our generic 0
...
We use the process parameters presented in Example 3
...
5 V
...
5
...
(5
...
63 × ( 1
...
43 – 0
...
5
------------------------- --------- -----------------------------------------------------–
1
...
25 – 0
...
0 ⁄ 2 )
( W ⁄ L )n
30 × 10

Figure 5
...
The simulated PMOS/NMOS ratio of 3
...
25 V
switching threshold confirms the value predicted by Eq
...
5)
...
7 produces some interesting observations:
1
...
This means that small
variations of the ratio (e
...
, making it 3 or 2
...
It is therefore an accepted practice in industrial designs to set the
width of the PMOS transistor to values smaller than those required for exact symmetry
...
5, and 2 yields switching
thresholds of 1
...
18 V, and 1
...

(Plot; horizontal axis: Wp/Wn)

Figure 5
...
25 µm
CMOS, VDD = 2
...
(Waveforms of Vin and Vout versus t, with the thresholds Vma and Vmb marked.) (a) Response of standard inverter; (b) Response of inverter with modified threshold
Figure 5
...


t

2
...

Increasing the width of the PMOS or the NMOS moves VM towards VDD or GND
respectively
...
This is demonstrated by the example of
Figure 5
...
The incoming signal Vin has a very noisy zero value
...
8a)
...
8b)
...

Changing the switching threshold by a considerable amount is however not easy,
especially when the ratio of supply voltage to transistor threshold is relatively small
(2
...
4 = 6 for our particular example)
...
5 V requires a
transistor ratio of 11, and further increases are prohibitively expensive
...
7 is plotted in a semi-log format
...
3
...
In the terminology of the analog circuit designer, these are the points where the gain g of the amplifier, formed by the inverter, is equal to −1
...

A simpler approach is to use a piecewise linear approximation for the VTC, as
shown in Figure 5
...
The transition region is approximated by a straight line, the gain of
which equals the gain g at the switching threshold VM
...
(Figure: piecewise linear VTC, Vout versus Vin, with VOH, VOL, VM, VIL, and VIH marked)

The error introduced is small and well within the range of what is required for an initial design
...

$$V_{IH} - V_{IL} = -\frac{V_{OH} - V_{OL}}{g} = \frac{-V_{DD}}{g}$$

$$V_{IH} = V_M - \frac{V_M}{g} \qquad V_{IL} = V_M + \frac{V_{DD} - V_M}{g}$$

$$NM_H = V_{DD} - V_{IH}$$

(5
...
In the extreme case of an infinite gain, the noise margins simplify to VOH − VM and VM − VOL for NMH and NML, respectively, and span the complete voltage swing
...
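The piecewise linear estimate is easy to evaluate numerically. The following sketch implements the expressions for VIH and VIL given above; it assumes VOH = VDD and VOL = 0, so that NML = VIL − VOL reduces to VIL. The function name and the gain value used in the example (g = −27.5) are illustrative assumptions.

```python
def noise_margins(vdd, vm, g):
    """Noise margins from the piecewise linear VTC approximation.

    vdd : supply voltage (V)
    vm  : switching threshold (V)
    g   : gain at VM (negative for an inverter)
    Assumes VOH = VDD and VOL = 0.
    """
    vih = vm - vm / g                 # upper edge of the transition region
    vil = vm + (vdd - vm) / g         # lower edge of the transition region
    return {"VIL": vil, "VIH": vih, "NML": vil, "NMH": vdd - vih}


# Assumed example: 2.5 V supply, VM at mid-rail, gain of -27.5.
for name, value in noise_margins(2.5, 1.25, -27.5).items():
    print(f"{name} = {value:.2f} V")
```

As |g| grows, VIL and VIH both converge on VM, which is the infinite-gain limit described in the text.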
We assume
once again that both PMOS and NMOS are velocity-saturated
...
4 that the gain is a strong function of the slopes of the currents in the saturation region
...
The gain can now be derived by differentiating the current
equation (5
...

V DSATn
k n V DSATn  V in – V Tn – ----------------  ( 1 + λ n V out ) +

2 

(5
...
9)
d V in
λ n k n V DSATn ( V in – V Tn – V DSATn ⁄ 2 ) + λ p k p V DSATp ( Vin – V DD – V Tp – V DSATp ⁄ 2 )

Ignoring some second-order terms, and setting Vin = VM results in the gain expression,

...
10)

$$\approx \frac{1 + r}{(V_M - V_{Tn} - V_{DSATn}/2)(\lambda_n - \lambda_p)}$$
with ID(VM) the current flowing through the inverter for Vin = VM
...
It
can only in a minor way be influenced by the designer through the choice of supply and
switching threshold voltages
...
2 Voltage transfer characteristic and noise margins of CMOS Inverter
Assume an inverter in the generic 0
...
4 and with the NMOS transistor minimum size (W = 0
...
25 µm, W/L =
1
...
We first compute the gain at VM (= 1
...
5 × 115 × 10

–6

× 0
...
25 – 0
...
63 ⁄ 2 ) × ( 1 + 0
...
25 ) = 59 × 10
–6

–6

A

–6

1
g = – ---------------------- 1
...
63 + 1
...
4 × 30 × 10 × 1
...
5 (Eq
...
10)
-----------------------------------------------------------------------------------------------------------------------------–6
0
...
1
59 × 10
This yields the following values for VIL, VIH, NML, NMH:
VIL = 1
...
3 V, NML = NMH = 1
...

Figure 5
...
A close
to ideal characteristic is obtained
...
03 V and 1
...
03 V and 1
...
These values are lower than
those predicted for two reasons:
• Eq
...
10) overestimates the gain
...
10b, the maximum gain (at
VM) equals only 17
...
17 V, and 1
...

• The most important deviation is due to the piecewise linear approximation of the
VTC, which is optimistic with respect to the actual noise margins
...

To conclude this example, we also extracted from simulations the output resistance of
the inverter in the low- and high-output states
...
4 kΩ and 3
...
The output resistance is a good measure of the sensitivity of the gate
with respect to noise induced at the output, and is preferably as low as possible
...
This region
is very narrow however, as is apparent in the graph of Figure 5
...
It also receives poor
marks on other amplifier properties such as supply noise rejection
...
Where the analog designer would bias the amplifier in the middle of the transient
region, so that a maximum linearity is obtained, the digital designer will operate the

(Plots (a) and (b); horizontal axis: Vin (V))
Simulated Voltage Transfer Characteristic (a) and voltage gain (b) of CMOS inverter (0
...


Problem 5
...


5
...
3

Robustness Revisited

Device Variations
While we design a gate for nominal operation conditions and typical device parameters,
we should always be aware that the actual operating temperature might vary over a large
range, and that the device parameters after fabrication probably will deviate from the nominal values we used in our design optimization process
...
This already became
apparent in Figure 5
...
To further confirm the assumed robustness of the gate, we have re-simulated the voltage transfer characteristic by replacing the
nominal devices by their worst- or best-case incarnations
...
11: a better-than-expected NMOS combined with an inferior PMOS, and the
opposite scenario
...
(Plot of Vout, with curves labeled Nominal and Good NMOS / Bad PMOS)

Figure 5
...
The “good” device has a smaller oxide
thickness (−3 nm), a smaller length (−25 nm), a higher width
(+30 nm), and a smaller threshold (−60 mV)
...


(Horizontal axis: Vin (V))

gate is by no means affected
...

Scaling the Supply Voltage
In Chapter 3, we observed that continuing technology scaling forces the supply voltages to
reduce at rates similar to the device dimensions
...
The reader probably wonders about the impact of this
trend on the integrity parameters of the CMOS inverter
...
(5
...
Plotting the (normalized) VTC for different supply voltages not only confirms this
conjecture, but even shows that the inverter is alive and well for supply voltages close to
the threshold voltage of the composing transistors (Figure 5
...
At a voltage of 0
...
5 V
...

• The dc-characteristic becomes increasingly sensitive to variations in the device
parameters such as the transistor threshold, once supply voltages and intrinsic voltages become comparable
...
While this typically
helps to reduce the internal noise in the system (such as caused by crosstalk), it
makes the design more sensitive to external noise sources that do not scale
...
(Two plots of Vout (V); the gain = −1 point is marked)

Figure 5
...
05

0
...
15

0
...


...
25 µm CMOS technology)
...
12b the voltage transfer characteristic of the same inverter for the
even-lower supply voltages of 200 mV, 100 mV, and 50 mV (while keeping the transistor
thresholds at the same level)
...
The sub-threshold
currents are sufficient to switch the gate between low and high levels, and provide enough
gain to produce acceptable VTCs
...

At around 100 mV, we start observing a major deterioration of the gate characteristic
...
The latter turns out to be a fundamental show-stopper
...
It turns out that below this same voltage, thermal noise becomes an issue as
well, potentially resulting in unreliable operation
...
11)

Eq
...
11) presents a true lower bound on supply scaling
...

Problem 5
...
(3
...
Derive an expression for the gain of the inverter under these circumstances

...

The resulting expression demonstrates that the minimum voltage is a function of the slope
factor n of the transistor
...
12)

According to this expression, the gain drops to -1 at VDD = 48 mV (for n = 1
...


5
...
This observation suggests
that getting CL as small as possible is crucial to the realization of high-performance
CMOS circuits
...
In addition to this detailed analysis, the section also presents a summary of techniques that a designer might use to optimize the performance of the inverter
...
4
...
To make the analysis tractable, we assume that all capacitances are lumped
together into one single capacitor CL , located between Vout and GND
...

VDD

VDD

M2
Cg4

Cdb2
Vin

Cgd12

Vout

Cdb1
M1

Figure 5
...


...
4

Performance of CMOS Inverter: The Dynamic Behavior

191

Figure 5
...
It includes all the
capacitances influencing the transient response of node Vout
...
Accounting
only for capacitances connected to the output node, CL breaks down into the following
components
...
Under these circumstances, the only contributions to Cgd12
are the overlap capacitances of both M1 and M2
...

The lumped capacitor model now requires that this floating gate-drain capacitor be
replaced by a capacitance-to-ground
...
During a low-high or high-low transition, the terminals of the gate-drain capacitor are moving in opposite directions (Figure 5
...
The voltage change over
the floating capacitor is hence twice the actual output voltage swing
...

We use the following equation for the gate-drain capacitors: Cgd = 2 CGD0W (with
CGD0 the overlap capacitance per unit width as used in the SPICE model)
...
57)
...
14 The Miller effect—A capacitor experiencing identical but opposite voltage swings at both
its terminals can be replaced by a capacitor to ground, whose value is two times the original value
...
Such a
capacitor is, unfortunately, quite nonlinear and depends heavily on the applied voltage
...
A multiplication factor Keq is introduced to relate the linearized
capacitor to the value of the junction capacitance under zero-bias conditions
...
In a digital
inverter, the large scale gain between input and output always equals -1
...
C eq = K eq C j0

(5
...
An expression
for Keq was derived in Eq
...
11) and is repeated here for convenience
$$K_{eq} = \frac{-\phi_0^{\,m}}{(V_{high} - V_{low})(1 - m)}\left[(\phi_0 - V_{high})^{1-m} - (\phi_0 - V_{low})^{1-m}\right]$$

(5
...

Observe that the junction voltage is defined to be negative for reverse-biased junctions
...
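The expression for Keq translates directly into a few lines of code. The sketch below is a plain transcription of the formula; the function name is our own, and the grading coefficient and built-in potential used in the example (m = 0.5, φ0 = 0.9 V) are assumed values for illustration. Junction voltages are negative under reverse bias, as noted above.

```python
def k_eq(m, phi0, v_high, v_low):
    """Linearization factor for a junction capacitance.

    m      : junction grading coefficient
    phi0   : built-in potential (V)
    v_high : junction voltage at one end of the swing (V, negative = reverse bias)
    v_low  : junction voltage at the other end of the swing (V)
    Returns Keq such that Ceq = Keq * Cj0 over the interval.
    """
    prefactor = -(phi0 ** m) / ((v_high - v_low) * (1 - m))
    return prefactor * ((phi0 - v_high) ** (1 - m) - (phi0 - v_low) ** (1 - m))


# Assumed example: junction reverse-biased from 2.5 V down to 1.25 V,
# then from 1.25 V down to 0 V.
print(round(k_eq(0.5, 0.9, -2.5, -1.25), 2))
print(round(k_eq(0.5, 0.9, -1.25, 0.0), 2))
```

The factor is smaller for the interval with the larger reverse bias, reflecting the fact that the depletion capacitance shrinks as the junction is driven further into reverse bias.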
3 Keq for a 2
...
13 designed in the generic 0
...
The
relevant capacitance parameters for this process were summarized in Table 3
...

Let us first analyze the NMOS transistor (Cdb1 in Figure 5
...
The propagation delay
is defined by the time between the 50% transitions of the input and the output
...
25 V, as the output voltage swing goes
from rail to rail or equals 2
...
We, therefore, linearize the junction capacitance over the
interval {2
...
25 V} for the high-to-low transition, and {0, 1
...

During the high-to-low transition at the output, Vout initially equals 2
...
Because the
bulk of the NMOS device is connected to GND, this translates into a reverse voltage of 2
...
5 V
...
25 V or Vlow = −1
...

Evaluating Eq
...
14) for the bottom plate and sidewall components of the diffusion capacitance yields
Bottom plate: Keq (m = 0
...
9) = 0
...
44, φ0 = 0
...
61
During the low-to-high transition, Vlow and Vhigh equal 0 V and −1
...
5, φ0 = 0
...
79,
Sidewall: Keqsw (m = 0
...
9) = 0
...
5 V
...
25 V),
Bottom plate: Keq (m = 0
...
9) = 0
...
32, φ0 = 0
...
86
and for the low-to-high transition (Vlow = −1
...
5 V)
Bottom plate: Keq (m = 0
...
9) = 0
...
32, φ0 = 0
...
7

Using this approach, the junction capacitance can be replaced by a linear component
and treated as any other device capacitance
...
The logic delays are not significantly influenced by this
simplification
...
...
As argued in Chapter 4, this component is growing in importance with the
scaling of the technology
...
Hence,
C fanout = C gate ( NMOS ) + C gate ( PMOS )
= ( C GSOn + C GDOn + W n L n C ox ) + ( C GSOp + C GDOp + W p L p C ox )

(5
...
This
has a relatively minor effect on the accuracy, since we can safely assume that the
connecting gate does not switch before the 50% point is reached, and Vout2, therefore, remains constant in the interval of interest
...
This is not exactly the case as we discovered in
Chapter 3
...
A drop in overall gate capacitance also occurs just before the transistor turns on (Figure 3
...
During the first half of the transient, it may be assumed
that one of the load devices is always in linear mode, while the other transistor
evolves from the off-mode to saturation
...

Example 5
...
25 µm CMOS Inverter
A minimum-size, symmetrical CMOS inverter has been designed in the 0
...
The layout is shown in Figure 5
...
The supply voltage VDD is set to 2
...
From the
layout, we derive the transistor sizes, diffusion areas, and perimeters
...
1
...
The drain area is formed by the metal-diffusion contact, which has an area of 4 × 4 λ²,
and the rectangle between contact and gate, which has an area of 3 × 1 λ²
...
30 µm2 (as λ = 0
...
The perimeter of the drain area is rather
involved and consists of the following components (going counterclockwise): 5 + 4 + 4 + 1 +
1 = 15 λ or PD = 15 × 0
...
875 µm
...
The drain area and perimeter of the
PMOS transistor are derived similarly (the rectangular shape makes the exercise considerably
simpler): AD = 5 × 9 λ² = 45 λ², or 0
...
375 µm
...

VDD
PMOS
(9λ/2λ)

0
...
15 Layout of two chained, minimum-size inverters using SCMOS Design Rules (see also
Color-plate 6)
...
1

Inverter transistor data
...
375/0
...
125/0
...
3 (19 λ )

1
...
3 (19 λ )

1
...
7 (45 λ2)

2
...
7 (45 λ2)

2
...
The capacitor parameters for our generic process were
summarized in Table 3
...
31 fF/µm; CGDO(PMOS) = 0
...
9 fF/µm2
Side-wall junction capacitance: CJSW(NMOS) = 0
...
22
fF/µm
Gate capacitance: Cox(NMOS) = Cox(PMOS) = 6 fF/µm2
Finally, we should also consider the capacitance contributed by the wire, connecting
the gates and implemented in metal 1 and polysilicon
...
...
Inspection of the layout helps us
to form a first-order estimate and yields that the metal-1 and polysilicon areas of the wire, that
are not over active diffusion, equal 42 λ2 and 72 λ2, respectively
...
2, we find the wire capacitance — observe that we ignore the
fringing capacitance in this simple exercise
...

Cwire = 42/8² µm² × 30 aF/µm² + 72/8² µm² × 88 aF/µm² = 0
...
2
...
3 for the computation of the diffusion capacitances
...

Table 5
...


Capacitor

Expression

Value (fF) (H→L)

Value (fF) (L→H)

Cgd1

2 CGD0n Wn

0
...
23

Cgd2

2 CGD0p Wp

0
...
61

Cdb1

Keqn ADn CJ + Keqswn PDn CJSW

0
...
90

Cdb2

1
...
15

(CGD0n+CGSOn) Wn + Cox Wn Ln

0
...
76

Cg4

(CGD0p+CGSOp) Wp + Cox Wp Lp

2
...
28

Cw

From Extraction

0
...
12

CL

5
...
2

Keqp ADp CJ + Keqswp PDp CJSW)

Cg3



6
...
0

Propagation Delay: First-Order Analysis

One way to compute the propagation delay of the inverter is to integrate the capacitor
(dis)charge current
...
(5
...

$$t_p = \int_{v_1}^{v_2} \frac{C_L(v)}{i(v)}\,dv$$

(5
...
An exact computation of this equation is intractable, as both CL(v) and
i(v) are nonlinear functions of v
...
6 to derive a reasonable approximation of the propagation
delay adequate for manual analysis
...
The preceding section derived precisely this value

...
An expression for the average on-resistance of the MOS transistor was already derived in Example 3
...

$$R_{eq} = \frac{1}{V_{DD}/2}\int_{V_{DD}/2}^{V_{DD}} \frac{V}{I_{DSAT}(1 + \lambda V)}\,dV \approx \frac{3}{4}\,\frac{V_{DD}}{I_{DSAT}}\left(1 - \frac{7}{9}\lambda V_{DD}\right)$$

(5
...
5
...
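To get a feeling for the quality of the closed-form approximation of Req, the averaging integral can be evaluated numerically and compared against it. The sketch below uses composite Simpson's rule; the function names, the step count, and the parameter values in the example (λ = 0.06 V⁻¹, IDSAT normalized to 1) are assumptions made for illustration only.

```python
def req_numeric(vdd, idsat, lam, steps=1000):
    """Average of V / (IDSAT * (1 + lam*V)) over [VDD/2, VDD].

    Uses composite Simpson's rule; steps must be even.
    """
    a, b = vdd / 2, vdd
    h = (b - a) / steps

    def r(v):
        return v / (idsat * (1 + lam * v))

    total = r(a) + r(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * r(a + i * h)
    return (total * h / 3) / (vdd / 2)


def req_approx(vdd, idsat, lam):
    """Closed-form approximation: 3/4 * VDD/IDSAT * (1 - 7/9 * lam * VDD)."""
    return 0.75 * vdd / idsat * (1 - 7 / 9 * lam * vdd)


exact = req_numeric(2.5, 1.0, 0.06)
approx = req_approx(2.5, 1.0, 0.06)
print(f"relative error = {abs(exact - approx) / exact:.1%}")
```

For small values of λ·VDD the two results stay within a few percent of each other, which is ample for manual analysis.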
Hence,
t pHL = ln(2)R eqn C L = 0
...
18)

Similarly, we can obtain the propagation delay for the low-to-high transition,
t pLH = 0
...
19)

with Reqp the equivalent on-resistance of the PMOS transistor over the interval of interest
...
This has been shown to be approximately the case in
the example of the previous section
...
69C L  -------------------------- 


2
2

(5
...
This condition can be achieved by making the on-resistance of the
NMOS and PMOS approximately equal
...
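The first-order delay model is simple enough to capture in two small helper functions. The sketch below treats each transition as the discharge (or charge) of CL through an equivalent resistance; the function names are our own. The numbers in the usage example (13 kΩ for a unit NMOS device, W/L = 1.5, CL = 6.1 fF) are taken from the worked example that follows.

```python
import math


def transition_delay(r_eq, c_load):
    """First-order 50% delay of an RC network: ln(2) * Req * CL (seconds)."""
    return math.log(2) * r_eq * c_load


def propagation_delay(r_eqn, r_eqp, c_load):
    """Average of the high-to-low and low-to-high delays."""
    return 0.5 * (transition_delay(r_eqn, c_load) + transition_delay(r_eqp, c_load))


# High-to-low transition: unit NMOS resistance of 13 kOhm scaled by W/L = 1.5.
t_phl = transition_delay(13e3 / 1.5, 6.1e-15)
print(f"tpHL = {t_phl * 1e12:.1f} psec")
```

Making Reqn and Reqp equal gives identical rise and fall delays, which is the condition stated above.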

Example 5
...
25 µm CMOS Inverter
To derive the propagation delays of the CMOS inverter of Figure 5
...

(5
...
(5
...
The load capacitance CL was already computed in Example 5
...
25 µm CMOS process were
derived in Table 3
...
For a supply voltage of 2
...
From the layout, we determine
the (W/L) ratios of the transistors to be 1
...
5 for the PMOS
...

This leads to the following values for the delays:

(Waveform plot with Vin, tpHL, and tpLH annotated)

0
...
16 Simulated transient
response of the inverter of Figure
5
...


(Horizontal axis: t (sec), × 10^−10)

tpHL = 0.69 × (13 kΩ / 1.5) × 6.1 fF = 36 psec
tpLH = 29 psec
and
tp = (36 + 29)/2 = 32
...
15
...
16, and determines the propagation delays to be
39
...
7 for the HL and LH transitions, respectively
...
Notice especially the
overshoots on the simulated output signals
...
These overshoots clearly have a negative impact on the performance of the gate, and explain why the
simulated delays are larger than the estimations
...
This is not necessarily the case
...
The purpose of
the manual analysis is to get a basic insight in the behavior of the circuit and to determine
the dominant parameters
...
Consider the example above a stroke of good luck
...

The obvious question a designer asks herself at this point is how she can manipulate
and/or optimize the delay of a gate
...
Combining Eq
...
18) and Eq
...
17), and assuming for the time being that the channel-length modulation factor λ is ignorable, yields the following expression for tpHL (a
similar analysis holds for tpLH)
$$t_{pHL} = 0.69\,\frac{3}{4}\,\frac{C_L V_{DD}}{I_{DSATn}} = 0.52\,\frac{C_L V_{DD}}{(W/L)_n\,k'_n\,V_{DSATn}\,(V_{DD} - V_{Tn} - V_{DSATn}/2)}$$

(5
...
Under these conditions, the delay becomes virtually independent of the supply
voltage (Eq
...
22))
...

CL
t pHL ≈ 0
...
22)

This analysis is confirmed in Figure 5
...
It comes as no surprise that this curve is virtually identical in shape to the one of Figure 3
...
While the delay is relatively insensitive to supply variations for higher values of VDD, a sharp increase can be observed starting around
5
...
5

(Plot of tp (normalized))

Figure 5
...
5
V)
...
(5
...
Observe that this
equation is only valid when the devices are
velocity-saturated
...


(V)

DD

≈2VT
...

Design Techniques
From the above, we deduce that the propagation delay of a gate can be minimized in the following ways:


• Reduce CL
...
Careful layout helps to reduce the diffusion and interconnect capacitances
...

• Increase the W/L ratio of the transistors
...
Proceed however with caution
when applying this approach
...
In fact, once the intrinsic capacitance (i
...
the diffusion capacitance) starts to dominate the extrinsic load formed by wiring and fanout, increasing the
gate size no longer helps in reducing the delay, and only makes the gate larger in
area
...
In addition, wide transistors have a larger gate
capacitance, which increases the fan-out factor of the driving gate and adversely affects
its speed
...
As illustrated in Figure 5
...
This flexibility allows the designer to trade off energy dissipation for performance, as we will see in a later section
...
Also, reliability concerns (oxide breakdown, hot-electron effects)
enforce firm upper-bounds on the supply voltage in deep sub-micron processes
...
4

Propagation Delay as a Function of (dis)charge Current

So far, we have expressed the propagation delay as a function of the equivalent resistance of
the transistors
...
Derive an expression of the propagation delay using this alternative approach
...
4
...
Most importantly, they lead to a general approach
towards transistor sizing that will prove to be extremely useful
...
This typically requires a ratio of 3 to 3
...
The motivation behind this approach is to create an inverter
with a symmetrical VTC, and to equate the high-to-low and low-to-high propagation
delays
...
If symmetry and reduced noise margins are not of prime concern, it is actually possible to speed up the inverter by reducing the width of the PMOS device!

...
When two contradictory effects are present, there must exist
a transistor ratio that optimizes the propagation delay of the inverter
...
Consider
two identical, cascaded CMOS inverters
...
23)

where Cdp1 and Cdn1 are the equivalent drain diffusion capacitances of PMOS and NMOS
transistors of the first inverter, while Cgp2 and Cgn2 are the gate capacitances of the second
gate
...

When the PMOS devices are made β times larger than the NMOS ones (β = (W/L)p /
(W/L)n), all transistor capacitances will scale in approximately the same way, or Cdp1 ≈ β
Cdn1, and Cgp2 ≈ β Cgn2
...
(5
...
24)

An expression for the propagation delay can be derived, based on Eq
...
20)
...
$$t_p = \frac{0.69}{2}\left((1 + \beta)(C_{dn1} + C_{gn2}) + C_W\right)\left(R_{eqn} + \frac{R_{eqp}}{\beta}\right)$$
r
= 0
...
25)

r (= Reqp/Reqn) represents the resistance ratio of identically-sized PMOS and NMOS transistors
...
26)

This means that when the wiring capacitance is negligible (Cdn1+Cgn2 >> CW), βopt
equals √r, in contrast to the factor r normally used in the noncascaded case
...
The surprising result of this
analysis is that smaller device sizes (and hence smaller design area) yield a faster design at
the expense of symmetry and noise margin
...
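The existence of an optimum ratio is easily checked numerically. The sketch below evaluates the delay of the cascaded inverter, up to a constant factor, as ((1 + β)(Cdn1 + Cgn2) + CW)(1 + r/β), and compares a brute-force search over β with the closed form obtained by setting the derivative with respect to β to zero, βopt = √(r(1 + CW/(Cdn1 + Cgn2))). The function names and the value r = 2.4 are assumptions for illustration.

```python
import math


def relative_delay(beta, r, c_wire, c_dn1, c_gn2):
    """Delay of the cascaded inverter, up to the constant 0.345 * Reqn."""
    return ((1 + beta) * (c_dn1 + c_gn2) + c_wire) * (1 + r / beta)


def beta_opt(r, c_wire, c_dn1, c_gn2):
    """Closed-form optimum PMOS/NMOS ratio from d(tp)/d(beta) = 0."""
    return math.sqrt(r * (1 + c_wire / (c_dn1 + c_gn2)))


def beta_search(r, c_wire, c_dn1, c_gn2, step=0.001):
    """Brute-force minimum of relative_delay over beta in [0.5, 4.5)."""
    candidates = [0.5 + step * i for i in range(4000)]
    return min(candidates, key=lambda b: relative_delay(b, r, c_wire, c_dn1, c_gn2))


# Assumed resistance ratio r = 2.4 and negligible wiring capacitance.
print(round(beta_opt(2.4, 0.0, 1.0, 1.0), 2), round(beta_search(2.4, 0.0, 1.0, 1.0), 2))
```

Adding wiring capacitance pushes the optimum toward larger β, since the fixed load makes the extra PMOS capacitance relatively less costly.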
6 Sizing of CMOS Inverter Loaded by an Identical Gate
Consider again our standard design example
...
3), we find that a ratio β of 2
...
Eq
...
26) now predicts that the device ratio for an optimal performance
should equal 1
...
These results are verified in Figure 5
...
The graph clearly illustrates how a changing
β trades off between tpLH and tpHL
...
9, which is somewhat higher than predicted
...
4
...
5

3
...
5

2

2
...
5

4

4
...
18 Propagation delay of CMOS inverter as a
function of the PMOS/NMOS transistor ratio β
...
The load capacitance of the
inverter can be divided into an intrinsic and an extrinsic component, or CL = Cint + Cext
...
Cext is the extrinsic load capacitance, attributable
to fanout and wiring capacitance
...
69R eq ( C int + C ext )
= 0
...
27)

tp0 = 0
...

The next question is how transistor sizing impacts the performance of the gate
...
(5
...
The intrinsic capacitance Cint consists of the
diffusion and Miller capacitances, both of which are proportional to the width of the transistors
...
The resistance of the gate relates to the reference gate as Req =
Rref/S
...
(5
...
69 ( R ref ⁄ S ) ( SC iref ) ( 1 + C ext ⁄ ( SC iref ) )
C ext
C ext
= 0
...
28)

...
When no load is present, an
increase in the drive of the gate is totally offset by the increased capacitance
...

Yet, any sizing factor S that is sufficiently larger than (Cext/Cint) produces similar
results at a substantial gain in silicon area
...
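The diminishing returns of sizing can be made concrete with a short calculation. The sketch below evaluates tp = tp0 (1 + Cext/(S Ciref)) for a range of sizing factors; the function name and the values used in the example (Cext/Ciref = 1.05, tp0 normalized to 1) are assumptions for illustration.

```python
def sized_delay(s, tp0, c_ext, c_iref):
    """Delay of an inverter sized S times the reference gate.

    tp0    : intrinsic (unloaded) delay of the reference gate
    c_ext  : extrinsic load capacitance (fan-out plus wiring)
    c_iref : intrinsic output capacitance of the reference gate
    """
    return tp0 * (1 + c_ext / (s * c_iref))


# Assumed example: Cext/Ciref = 1.05, delay normalized to tp0 = 1.
for s in (1, 2, 5, 10, 100):
    print(s, round(sized_delay(s, 1.0, 1.05, 1.0), 3))
```

The delay falls quickly for small S but can never drop below tp0, so most of the achievable gain is already collected at moderate sizing factors.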
7 Device Sizing for Performance
Let us explore the performance improvement that can be obtained by device sizing in the
design of Example 5
...
We find from Table 5
...
05 (Cint = 3
...
15
fF)
...
05
...

This is confirmed by simulation results, which predict a maximum obtainable performance
(Plot; horizontal axis: S)

Figure 5
...
15)
...
9 (tp0 = 19
...
From the graph of Figure 5
...


Sizing A Chain of Inverters
While sizing up an inverter reduces its delay, it also increases its input capacitance
...
Therefore, a more relevant problem is determining the optimum sizing of a gate when embedded in a real environment
...
To determine the input loading effect, the
relationship between the input gate capacitance Cg and the intrinsic output capacitance of
the inverter has to be established
...
Hence, the following relationship holds, independent of gate sizing


C int = γC g

(5
...
Rewriting Eq
...
28),
$$t_p = t_{p0}\left(1 + \frac{C_{ext}}{\gamma C_g}\right) = t_{p0}\,(1 + f/\gamma)$$

(5
...
This ratio is called the effective fanout f
...
Figure 5
...
The goal is to minimize the delay
through the inverter chain, with the input capacitance of the first inverter Cg1—typically a
minimally-sized device— and the load capacitance CL fixed
...
20 Chain of N inverters with fixed
input and output capacitance
...
31)

we can derive the total delay of the chain
...
32)

This equation has N-1 unknowns, being Cg,2, Cg,3, …, Cg,N
...
The
result is a set of constraints, Cg,j+1/Cg,j = Cg,j/Cg,j-1
...


(5
...
With Cg,1
and CL given, we can derive the sizing factor,
$f = \sqrt[N]{C_L / C_{g,1}} = \sqrt[N]{F}$

(5
...


(5
...


chapter5
...
Observe how the
relationship between tp and F is a very strong function of the number of stages
...
Introducing a second
stage turns it into square root, and so on
...

Choosing the Right Number of Stages in an Inverter Chain
Evaluation of Eq
...
35) reveals the trade-offs in choosing the number of stages for a
given F (= f^N)
...
If the number of stages is
too small, the effective fanout of each stage becomes large, and the second component is
dominant
...

$\gamma + \sqrt[N]{F} - \frac{\sqrt[N]{F}\,\ln F}{N} = 0$

or equivalently

$f = e^{(1 + \gamma/f)}$

(5
...
Under these simplified conditions, it is found that the optimal number of stages equals N = ln(F), and the effective
fanout of each stage is set to f = 2
...
This optimal buffer design scales consecutive
stages in an exponential fashion, and is hence called an exponential horn [Mead79]
...
(5
...
The results are plotted
in Figure 5
...
For the typical case of γ ≈ 1, the optimum scaling factor turns out to be close
to 3
...
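The condition f = e^(1 + γ/f) has no closed-form solution when γ is nonzero, but it is easily solved numerically. The sketch below uses a plain fixed-point iteration; the function names, tolerance, and iteration limit are assumptions made for illustration.

```python
import math


def optimal_fanout(gamma, tol=1e-12, max_iter=200):
    """Solve f = exp(1 + gamma / f) by fixed-point iteration, starting from f = e."""
    f = math.e
    for _ in range(max_iter):
        f_next = math.exp(1.0 + gamma / f)
        if abs(f_next - f) < tol:
            return f_next
        f = f_next
    return f


def stage_count(total_fanout, gamma):
    """Real-valued number of stages N = ln(F) / ln(f_opt); round it in practice."""
    return math.log(total_fanout) / math.log(optimal_fanout(gamma))


if __name__ == "__main__":
    for g in (0.0, 0.5, 1.0, 2.0):
        print(f"gamma = {g:.1f}: f_opt = {optimal_fanout(g):.3f}")
```

For γ = 0 the iteration returns e immediately, which matches the simplified case with N = ln(F) stages.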
Figure 5
...
Choosing values of the fanout that are higher
than the optimum does not impact the delay that much, and reduces the required number
of buffer stages and the implementation area
...
The use of too many stages (f < fopt), on the other hand, has a substantial negative impact on the delay, and should be avoided
...
8 The Impact of Introducing Buffer Stages

Table 5
...
Observe the impressive
speed-up obtained with cascaded inverters when driving very large capacitive loads
...
22
...


[Plot residue: horizontal axis f; tick values omitted.]

(b) Normalized propagation delay (tp/tp,opt)
as a function of the effective fanout f for γ = 1
...

Figure 5
...


Table 5
...

F

Unbuffered

Two Stage

Inverter Chain

10

11

8
...
3

100

101

22

16
...
8

10,000

10,001

202

33
...
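The table entries can be reproduced from the chain-delay expression. The sketch below normalizes the delay to tp0 and assumes γ = 1 and equal effective fanout per stage, so that tp/tp0 = N (1 + F^(1/N)/γ); the function names and the search limit on N are illustrative assumptions.

```python
def chain_delay(total_fanout, stages, gamma=1.0):
    """Normalized delay tp/tp0 of an N-stage chain with equal fanout per stage."""
    f = total_fanout ** (1.0 / stages)
    return stages * (1.0 + f / gamma)


def best_stage_count(total_fanout, gamma=1.0, max_stages=20):
    """Return (N, delay) for the integer stage count that minimizes the delay."""
    return min(
        ((n, chain_delay(total_fanout, n, gamma)) for n in range(1, max_stages + 1)),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    for big_f in (10, 100, 1000, 10000):
        n_opt, d_opt = best_stage_count(big_f)
        print(
            f"F = {big_f:6d}: unbuffered = {chain_delay(big_f, 1):8.1f}, "
            f"two stage = {chain_delay(big_f, 2):6.1f}, "
            f"best chain = {d_opt:5.1f} (N = {n_opt})"
        )
```

Because this search is restricted to integer stage counts, the chain figures it prints can differ slightly from values computed with a non-integer N.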
5 Sizing an Inverter Network

Determine the sizes of the inverters in the circuit of Figure 5
...
You may assume that CL = 64 Cg,1
...
22 Inverter network, in which each
gate has a fanout of 4 gates, distributing a single
input to 16 output signals in a tree-like fashion
...
fm Page 206 Friday, January 18, 2002 9:01 AM

206

THE CMOS INVERTER

Chapter 5

Hints: Determine first the ratios between the devices that minimize the delay
...
52Cg,2 = 6
...
Straightforward sizing of the inverter chain, without taking the fanout into account,
would have led to a sizing factor of 4 instead of 2
...


The rise/fall time of the input signal
All the above expressions were derived under the assumption that the input signal to the
inverter abruptly changed from 0 to VDD or vice-versa
...
In reality, the input signal changes gradually
and, temporarily, PMOS and NMOS transistors conduct simultaneously
...
Figure 5
...
It can be observed that tp increases (approximately) linearly with increasing input slope, once ts > tp(ts=0)
...
[Plot: axis tick values (scale ×10^-11) omitted.]
...
23 tp as a function of the
input signal slope (10-90% rise or
fall time) for minimum-size
inverter with fan-out of a single
gate
...
From a design perspective, it is more valuable to relate the impact of the
finite slope on the performance directly to its cause, which is the limited driving capability
of the preceding gate
...
The
strength of this approach is that it realizes that a gate is never designed in isolation, and
that its performance is both affected by the fanout, and the driving strength of the gate(s)
feeding into its inputs
...
37)

Eq
...
37) states that the propagation delay of inverter i equals the sum of the delay of the
same gate for a step input ($t^i_{step}$) (i.e., zero input slope) augmented with a fraction of the
step-input delay of the preceding gate (i-1)
...
25
...

Example 5
...
22
...
(5
...
(5
...


$t_{p,2} = t_{p0}\left(1 + \frac{4C_{g,3}}{\gamma C_{g,2}}\right) + \eta\, t_{p0}\left(1 + \frac{4C_{g,2}}{\gamma C_{g,1}}\right)$
An analysis of the overall propagation delay in the style of Problem 5
...
47 (assuming η = 0
...


Design Challenge
It is advantageous to keep the signal rise times smaller than or equal to the gate propagation
delays
...
Keeping the rise and fall times of the signals small and of
approximately equal values is one of the major challenges in high-performance design, and is
often called ‘slope engineering’
...
6 Impact of input slope

Determine if reducing the supply voltage increases or decreases the influence of the input
signal slope on the propagation delay
...


Delay in the Presence of (Long) Interconnect Wires
The interconnect wire has played a minimal role in our analysis so far
...
Earlier delay expressions can be adjusted to accommodate these extra contributions by employing the wire modeling techniques introduced in

chapter5
...
The analysis detailed in Example 4
...
Consider the circuit of Figure 5
...
The driver is represented by a single resistance Rdr,
which is the average between Reqn and Reqp
...

(rw,cw,L)

Vout
Cint

Vout

Cfan

Figure 5
...


The propagation delay of the circuit can be obtained by applying the Elmore delay
expression
...
69R dr C int + ( 0
...
38R w )C w + 0
...
69R dr ( C int + C fan ) + 0
...
38r w c w L

2

(5
...
38 factor accounts for the fact that the wire represents a distributed delay
...
The delay expression contains a component that is linear in the wire length, as well as a quadratic one
...
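The linear and quadratic contributions can be separated in a short calculation. The sketch below assumes the usual coefficients of 0.69 for lumped RC terms and 0.38 for the distributed wire term, and groups the delay as 0.69 Rdr (Cint + Cfan) + 0.69 (Rdr cw + rw Cfan) L + 0.38 rw cw L^2. The function name and every numerical value are hypothetical and chosen only for illustration.

```python
def wire_delay(r_dr, c_int, c_fan, r_w, c_w, length):
    """Elmore-style delay of a driver, a distributed wire, and a fanout load.

    r_w and c_w are per-unit-length values; length uses the same length unit.
    Returns (constant, linear, quadratic) contributions in seconds.
    """
    constant = 0.69 * r_dr * (c_int + c_fan)
    linear = 0.69 * (r_dr * c_w + r_w * c_fan) * length
    quadratic = 0.38 * r_w * c_w * length ** 2
    return constant, linear, quadratic


if __name__ == "__main__":
    # Hypothetical parameters (assumptions, not taken from the example).
    parts = wire_delay(
        r_dr=10e3,    # driver resistance in ohms
        c_int=3e-15,  # intrinsic driver capacitance in farads
        c_fan=3e-15,  # fanout capacitance in farads
        r_w=0.2,      # wire resistance in ohms per micrometer
        c_w=0.1e-15,  # wire capacitance in farads per micrometer
        length=100.0, # wire length in micrometers
    )
    for name, value in zip(("constant", "linear", "quadratic"), parts):
        print(f"{name:9s}: {value * 1e12:.4f} ps")
    print(f"total    : {sum(parts) * 1e12:.4f} ps")
```

With a highly resistive driver the linear term dominates for short wires; the quadratic term only matters once the wire resistance becomes comparable to the driver resistance.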

Example 5
...
24, and assume the device parameters of Example 5
...
5(13/1
...
5) = 7
...
The wire is implemented in metal1
and has a width of 0
...
This yields the following parameters: cw =
92 aF/µm, and rw = 0
...
4)
...
(5
...
Solving the following quadratic equation yields a single (meaningful) solution
...
6 × 10

– 18 2

L + 0
...
29 × 10

– 12

or
L = 65 µm
Observe that the extra delay is solely due to the linear factor in the equation, and more specifically due to the extra capacitance introduced by the wire
...
This is
due to the high resistance of the (minimum-size) driver transistors
...
Analyze, for instance, the same problem with the
driver transistors 100 times wider, as is typical for high-speed, large fan-out drivers
...
Power, Energy, and Energy-Delay
So far, we have seen that the static CMOS inverter with its almost ideal VTC—symmetrical shape, full logic swing, and high noise margins—offers a superior robustness, which
simplifies the design process considerably and opens the door for design automation
...
It is this combination of robustness and low
static power that has made static CMOS the technology of choice of most contemporary
digital designs
...

5
...
1

Dynamic Power Consumption

Dynamic Dissipation due to Charging and Discharging Capacitances
Each time the capacitor CL gets charged through the PMOS transistor, its voltage rises
from 0 to VDD, and a certain amount of energy is drawn from the power supply
...
During the high-to-low transition, this capacitor is discharged, and the stored energy
is dissipated in the NMOS transistor
...
Let us first consider the low-to-high transition
...
Therefore, the equivalent circuit
of Figure 5
...
The values of the energy
EVDD, taken from the supply during the transition, as
CL
well as the energy EC, stored on the capacitor at the
end of the transition, can be derived by integrating
Figure 5
...

during the low-to-high transition
...
26
...
39)

0

3
Observe that this model is a simplification of the actual circuit
...
The latter experience a charge-discharge cycle that is out of phase with the capacitances to GND,
i
...
they get charged when Vout goes low and discharged when Vout rises
...


chapter5
...
40)

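$E_C = \int_0^\infty i_{VDD}(t)\,v_{out}\,dt = \int_0^\infty C_L \frac{dv_{out}}{dt}\,v_{out}\,dt = C_L \int_0^{V_{DD}} v_{out}\,dv_{out} = \frac{C_L V_{DD}^2}{2}$

[Figure: output voltage vout and supply current iVDD versus time t during the charge and discharge phases.]
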
Figure 5
...


These results can also be derived by observing that during the low-to-high transition, CL is loaded with a charge CLVDD
...
The energy stored on the capacitor equals $C_L V_{DD}^2/2$
...
The
other half has been dissipated by the PMOS transistor
...
Once again, there is no dependence on the size of the device
...
In order to compute the power consumption, we have to take
into account how often the device is switched
...
41)

f0→1 represents the frequency of energy-consuming transitions, that is, 0 → 1 transitions
for static CMOS
...
At
the same time, the total capacitance on the chip (CL) increases as more and more gates are
placed on a single die
...
25 µm CMOS chip with a clock rate of
500 MHz and an average load capacitance of 15 fF/gate, assuming a fanout of 4
...
5 V supply then equals approximately 50 µW
...
In reality, not all gates in the complete IC switch at the full rate of
500 MHz
...

Example 5
...
4 is now easily computed
...
2, the value of the load capacitance was determined to equal 6 fF
...
5 V, the amount of energy needed to charge and discharge that capacitance equals

$E_{dyn} = C_L V_{DD}^2 = 37$
...
For a tp of 32
...
5), we find that the dynamic power dissipation of the
circuit is
$P_{dyn} = E_{dyn} / (2 t_p)$ = 580 µW
Of course, an inverter in an actual circuit is rarely switched at this maximum rate, and
even if done so, the output does not swing from rail-to-rail
...
For a rate of 4 GHz (T = 250 psec), the dissipation reduces to 150 µW
...
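The arithmetic of this example is easy to check in code. The sketch below evaluates E = CL·VDD^2 and P = E·f for the 6 fF load and the 4 GHz rate quoted above; the 2.5 V supply is an assumption (it is consistent with the quoted figures but is not fully legible in the text), and the function names are illustrative.

```python
def switching_energy(c_load, vdd):
    """Energy drawn from the supply per charge/discharge cycle: CL * VDD^2."""
    return c_load * vdd ** 2


def dynamic_power(c_load, vdd, rate, activity=1.0):
    """Dynamic power: activity * CL * VDD^2 * rate."""
    return activity * switching_energy(c_load, vdd) * rate


if __name__ == "__main__":
    C_LOAD = 6e-15  # load capacitance from the example, in farads
    VDD = 2.5       # assumed supply voltage, in volts
    print(f"E_dyn = {switching_energy(C_LOAD, VDD) * 1e15:.1f} fJ")
    print(f"P_dyn at 4 GHz = {dynamic_power(C_LOAD, VDD, 4e9) * 1e6:.0f} uW")
```

The optional activity argument scales the result by the fraction of cycles in which an energy-consuming transition actually occurs.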


Computing the dissipation of a complex circuit is complicated by the f0→1 factor,
also called the switching activity
...

One concern is that the switching activity of a network is a function of the nature and the
statistics of the input signals: If the input signals remain unchanged, no switching happens, and the dynamic power consumption is zero! On the other hand, rapidly changing
signals provoke plenty of switching and hence dissipation
...
We can
accommodate this by another rewrite of the equation, or
$P_{dyn} = C_L V_{DD}^2 f_{0 \to 1} = C_L V_{DD}^2 P_{0 \to 1} f = C_{EFF} V_{DD}^2 f$

(5
...
CEFF = P0→1CL is called the effective capacitance
and represents the average capacitance switched every clock cycle
...
1) reduces the average consumption to 5 W
...
12 Switching activity

Consider the waveforms on the
right where the upper waveform
represents the idealized clock signal, and the bottom one shows the
signal at the output of the gate
...
25 (or 25%)
...
27 Clock and signal waveforms

Low Energy/Power Design Techniques
With the increasing complexity of the digital integrated circuits, it is anticipated that the power
problem will only worsen in future technologies
...
voltages are becoming more and more attractive
...
For instance, reducing VDD from 2
...
25 V for our example drops the power dissipation from 5 W to 1
...
This assumes that the same clock rate can be sustained
...
17
demonstrates that this assumption is not that unrealistic as long as the supply voltage is substantially higher than the threshold voltage
...

When a lower bound on the supply voltage is set by external constraints (as often happens in real-world designs), or when the performance degradation due to lowering the supply
voltage is intolerable, the only means of reducing the dissipation is by lowering the effective
capacitance
...

A reduction in the switching activity can only be accomplished at the logic and architectural abstraction levels, and will be discussed in more detail in later Chapters
...
As most of the capacitance in a combinational logic circuit is due to transistor capacitances (gate and diffusion), it makes sense to keep those contributions to a
minimum when designing for low power
...
This definitely affects the performance of the circuit, but
the effect can be offset by using logic or architectural speed-up techniques
...
This is contrary to common design practices used in cell libraries, where transistors are generally made large to accommodate a range
of loading and performance requirements
...
Assume we have to minimize the energy dissipation of a circuit with a specified lower-bound on the performance
...
Yet, the latter causes the capacitance to
increase
...

Example 5
...
To take the input loading effects into account, we
assume that the inverter itself is driven by a
Figure 5
...
28)
...

goal is to minimize the energy dissipation of
the complete circuit, while maintaining a
lower-bound on performance
...
The propagation delay of the optimized circuit should
not be larger than that of a reference circuit, chosen to have as parameters f = 1 and Vdd = Vref
...
4
...
43)

with F = (Cext/Cg1) the overall effective fanout of the circuit, and tp0 the intrinsic delay of the
inverter
...
(5
...


$t_{p0} \sim \frac{V_{DD}}{V_{DD} - V_{TE}}$

(5
...
45)

The performance constraint now states that the propagation delay of the scaled circuit should
be equal to (or smaller than) the delay of the reference circuit (f = 1, Vdd = Vref)
...
Hence,

$\frac{t_p}{t_{pref}} = \frac{t_{p0}\left(2 + f + \frac{F}{f}\right)}{t_{p0ref}\,(3 + F)} = \left(\frac{V_{DD}}{V_{ref}}\right)\left(\frac{V_{ref} - V_{TE}}{V_{DD} - V_{TE}}\right)\left(\frac{2 + f + \frac{F}{f}}{3 + F}\right) = 1$



(5
...
(5
...
29a for different values of F
...
Increasing the
size of the inverter from the minimum initially increases the performance, and hence allows
for a lowering of the supply voltage
...
Further increases in the device sizes only increase the self-loading factor, deteriorate the performance, and require an increase in supply voltage
...

[Plot panels: (a) vdd (V) versus f and (b) normalized energy versus f, each drawn for several values of F; tick values omitted.]

Figure 5
...
(a) Required supply voltage as a function of the sizing factor f
for different values of the overall effective fanout F; (b) Energy of scaled circuit (normalized with respect to the reference
case) as a function of f
...
5V, VTE = 0
...


chapter5
...


$\frac{E}{E_{ref}} = \left(\frac{V_{DD}}{V_{ref}}\right)^2 \left(\frac{2 + 2f + F}{4 + F}\right)$

(5
...
A graphical approach is just as effective
...
29b, from which a number of conclusions can be drawn:
• Device sizing, combined with supply voltage reduction, is a very effective approach in
reducing the energy consumption of a logic network
...
But the gain is also sizable for smaller values of F
...

• Oversizing the transistors beyond the optimal value comes at a hefty price in energy
...

• The optimal sizing factor for energy is smaller than the one for performance, especially for
large values of F
...
53, while fopt(performance) = 4
...
Increasing the device sizes only leads to a minimal supply reduction once
VDD starts approaching VTE, hence leading to very minimal energy gains
...
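The conclusions above can be reproduced numerically by solving the performance constraint for VDD and substituting the result into the energy ratio. Solving the constraint gives VDD = VTE / (1 - aK), with a = (Vref - VTE)/Vref and K = (2 + f + F/f)/(3 + F). The sketch below assumes Vref = 2.5 V and VTE = 0.5 V and scans f on a grid; these values, the function names, and the grid are illustrative assumptions.

```python
def required_vdd(f, big_f, v_ref=2.5, v_te=0.5):
    """Supply voltage that keeps the delay equal to the reference (f = 1) case."""
    k = (2.0 + f + big_f / f) / (3.0 + big_f)
    a = (v_ref - v_te) / v_ref
    denom = 1.0 - a * k
    if denom <= 0.0:
        raise ValueError("delay constraint cannot be met at this sizing factor")
    return v_te / denom


def normalized_energy(f, big_f, v_ref=2.5, v_te=0.5):
    """E / Eref = (VDD / Vref)^2 * (2 + 2f + F) / (4 + F) at the required VDD."""
    vdd = required_vdd(f, big_f, v_ref, v_te)
    return (vdd / v_ref) ** 2 * (2.0 + 2.0 * f + big_f) / (4.0 + big_f)


def best_sizing(big_f, f_max=10.0, step=0.01):
    """Grid search for the sizing factor f that minimizes the normalized energy."""
    best = None
    steps = int(round((f_max - 1.0) / step))
    for i in range(steps + 1):
        f = 1.0 + i * step
        try:
            e = normalized_energy(f, big_f)
        except ValueError:
            continue  # skip sizing factors that violate the delay constraint
        if best is None or e < best[1]:
            best = (f, e)
    return best


if __name__ == "__main__":
    for big_f in (5, 10, 20):
        f_opt, e_opt = best_sizing(big_f)
        print(f"F = {big_f:2d}: f_opt = {f_opt:.2f}, E/Eref = {e_opt:.3f}")
```

For large F the energy-optimal f found this way lies below the delay-optimal value sqrt(F), in line with the third conclusion.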
The finite slope of the input signal causes a direct current path between VDD
and GND for a short period of time during switching, while the NMOS and the PMOS
transistors are conducting simultaneously
...
30
...
48)

as well as the average power consumption
$P_{dp} = t_{sc} V_{DD} I_{peak} f = C_{sc} V_{DD}^2 f$

(5
...
tsc represents the time both devices are conducting
...
(5
...


[Figure: inverter with input Vin, output Vout, load CL, and short-circuit current Isc; waveforms of vin (crossing VT and VDD – VT) and ishort (peaking at Ipeak) versus t.]


Figure 5
...


V DD – 2V T
V DD – 2V T t r ( f
t sc = ------------------------- t s ≈ ------------------------- × -------)
0
...
50)

Ipeak is determined by the saturation current of the devices and is hence directly proportional to the sizes of the transistors
...
This relationship is best illustrated by the following simple analysis: Consider a static CMOS inverter with a 0 → 1 transition at the input
...
31a)
...
31 Impact of load capacitance on short-circuit current
...
As the source-drain
voltage of the PMOS device is approximately 0 during that period, the device shuts off
without ever delivering any current
...

Consider now the reverse case, where the output capacitance is very small, and the output
fall time is substantially smaller than the input rise time (Figure 5
...
The drain-source
voltage of the PMOS device equals VDD for most of the transition period, guaranteeing the
maximal short-circuit current (equal to the saturation current of the PMOS)
...
represents the worst-case condition
...
32, which plots the short-circuit current through the NMOS transistor during a
low-to-high transition as a function of the load capacitance
...
[Plot: short-circuit current Isc (A, scale ×10^-4) for several load capacitances, including CL = 20 fF and CL = 100 fF; tick values omitted.]

Figure 5
...
On the other hand,
making the output rise/fall time too large slows down the circuit and can cause short-circuit currents in the fan-out gates
...

Design Techniques
A more practical rule, which optimizes the power consumption in a global way, can be formulated ([Veendrick84]):
The power dissipation due to short-circuit currents is minimized by matching the rise/fall
times of the input and output signals
...

Making the input and output rise times of a gate identical is not the optimum solution for
that particular gate on its own, but keeps the overall short-circuit current within bounds
...
33, which plots the short-circuit energy dissipation of an inverter (normalized with respect to the zero-input rise time dissipation) as a function of the ratio r between
input and output rise/fall times
...
For very large
capacitance values, all power dissipation is devoted to charging and discharging the load
capacitance
...

Observe also that the impact of short-circuit current is reduced when we lower the
supply voltage, as is apparent from Eq
...
50)
...
With threshold voltages scaling at a slower rate than the supply voltage, short-circuit power dissipation is becoming less important in deep-submicron technologies
...
125 µm/0
...
375 µm/0
...
3 V

[Plot: normalized power Pnorm versus the slope ratio tsin/tsout for several supply voltages; tick values omitted.]

Figure 5
...
The power is
normalized with respect to zero input rise-time
dissipation
...



At a supply voltage of 2
...
5 V, an input/output slope ratio of 2 is
needed to cause a 10% degradation in dissipation
...
(5
...
The value of this short-circuit capacitance is a function of VDD, the transistor
sizes, and the input-output slope ratio
...
5
...
(5
...
51)

Ideally, the static current of the CMOS inverter is equal to zero, as the PMOS and
NMOS devices are never on simultaneously in steady-state operation
...
34
...
For the device sizes under consideration, the leakage current per unit drain area typically ranges between 10-100
pA/µm2 at room temperature
...
5
µm2 and operated at a supply voltage of 2
...
125 mW, which is clearly not much of an issue
...
Their value increases with increasing junction temperature, and this occurs
in an exponential fashion
...
[Figure: inverter with Vout = VDD, showing the drain leakage current and the subthreshold current.]

Figure 5
...


ues
...
As the temperature is a strong function of the dissipated heat and its removal mechanisms, this can only be accomplished by limiting the power dissipation of the circuit
and/or by using chip packages that support efficient heat removal
...

As discussed in Chapter 3, an MOS transistor can experience a drain-source current, even
when VGS is smaller than the threshold voltage (Figure 5
...
The closer the threshold
voltage is to zero volts, the larger the leakage current at VGS = 0 V and the larger the static
power consumption
...
Standard processes feature VT values that are never smaller than
0
...
6V and that in some cases are even substantially higher (~ 0
...

This approach is being challenged by the reduction in supply voltages that typically
goes with deep-submicron technology scaling as became apparent in Figure 3
...
We concluded earlier (Figure 5
...
One approach to address this performance issue is to scale the device
thresholds down as well
...
17 to the left, which means that
the performance penalty for lowering the supply voltage is reduced
...
35
...
The continued scaling
of the supply voltage predicted for the next generations of CMOS technologies however
forces the threshold voltages ever downwards, and makes subthreshold conduction a dominant source of power dissipation
...
An example of the latter is
the SOI (Silicon-on-Insulator) technology whose MOS transistors have slope-factors that
are close to the ideal 60 mV/decade
...
14 Impact of threshold reduction on performance and static power dissipation
Consider a minimum size NMOS transistor in the 0
...
In Chapter 3,
we derived that the slope factor S for this device equals 90 mV/decade
...
5V equals 10^-11 A (Figure 3
...
Reducing the threshold by 200 mV to 0
...
5 V, this translates into
a static power dissipation of 10^6 × 170 × 10^-11 × 1
...
6 mW
...
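The numbers in this example follow from the exponential dependence of subthreshold current on the threshold voltage: lowering VT by ΔVT multiplies the leakage by 10^(ΔVT/S). The sketch below reproduces that step for S = 90 mV/decade and the 10^6 gates used above; the 1.5 V supply is an assumption (the digit is partly illegible in the text), and the function names are illustrative.

```python
def leakage_multiplier(delta_vt, slope):
    """Factor by which subthreshold leakage grows when VT drops by delta_vt.

    slope is the subthreshold slope factor S in volts per decade.
    """
    return 10.0 ** (delta_vt / slope)


def static_power(gate_count, leak_per_gate, vdd):
    """Static power of gate_count gates, each leaking leak_per_gate amperes."""
    return gate_count * leak_per_gate * vdd


if __name__ == "__main__":
    mult = leakage_multiplier(0.2, 0.09)  # 200 mV reduction, 90 mV/decade
    print(f"leakage multiplier for 200 mV: {mult:.0f}x")
    # 1.5 V supply assumed; 170e-11 A is the per-gate leakage quoted in the text.
    print(f"static power: {static_power(1e6, 170e-11, 1.5) * 1e3:.2f} mW")
```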
2

VT = 0
...
35 Decreasing the threshold
increases the subthreshold current at VGS =
0
...
5 W! At that supply voltage,
the threshold reductions correspond to a performance improvement of 25% and 40%, respectively
...
The idea that the leakage current in a static CMOS circuit has to be zero is a preconception
...
As long as the noise margins are within range, this is not a compelling issue
...
This is offset by the drop in supply voltage, that is enabled by the reduced thresholds
at no cost in performance, and results in a quadratic reduction in dynamic power
...
25 µm CMOS process, the following circuit configurations obtain the same performance: 3 V supply–0
...
45 V supply–0
...
The dynamic power consumption of the latter is, however, 45 times smaller [Liu93]! Choosing the correct values of
supply and threshold voltages once again requires a trade-off
...
In the presence of a sizable static power dissipation, it is essential that non-active modules are powered down, lest static power dissipation
would become dominant
...

5
...
3

Putting It All Together

The total power consumption of the CMOS inverter is now expressed as the sum of its
three components:
$P_{tot} = P_{dyn} + P_{dp} + P_{stat} = (C_L V_{DD}^2 + V_{DD} I_{peak} t_s)\, f_{0 \to 1} + V_{DD} I_{leak}$

(5
...
The
direct-path consumption can be kept within bounds by careful design, and should hence
not be an issue
...


chapter5
...

$PDP = P_{av}\, t_p$

(5
...

Assuming that the gate is switched at its maximum possible rate of fmax = 1/(2tp), and
ignoring the contributions of the static and direct-path currents to the power consumption,
we find
$PDP = C_L V_{DD}^2 f_{max} t_p = \frac{C_L V_{DD}^2}{2}$

(5
...
Remember that earlier we had defined Eav as the average
energy per switching cycle (or per energy-consuming event)
...

Energy-Delay Product
The validity of the PDP as a quality metric for a process technology or gate topology is
questionable
...
Yet for a given structure, this number can be made arbitrarily low by
reducing the supply voltage
...
This comes at a
major expense in performance, as discussed earlier
...
The energy-delay product (EDP) does exactly
that
...
55)

It is worth analyzing the voltage dependence of the EDP
...
An optimum operation point should hence exist
...
(5
...

$t_p \approx \frac{\alpha C_L V_{DD}}{V_{DD} - V_{TE}}$

(5
...
Combining Eq
...
55) and Eq
...
56), 4
4

This equation is only accurate as long as the devices remain in velocity saturation, which is probably
not the case for the lower supply voltages
...


$EDP = \frac{\alpha C_L^2 V_{DD}^3}{2\,(V_{DD} - V_{TE})}$

(5
...
(5
...

$V_{DDopt} = \frac{3}{2} V_{TE}$

(5
...
For sub-micron technologies with
thresholds in the range of 0
...
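The optimum supply voltage can be confirmed numerically by scanning the voltage-dependent part of the EDP, VDD^3 / (VDD - VTE), which should reach its minimum at 1.5 VTE. The sketch below does this with a simple grid search; the value VTE = 0.8 V, the grid, and the function names are assumptions made for illustration.

```python
def edp_shape(vdd, v_te):
    """Voltage-dependent part of the EDP: VDD^3 / (VDD - VTE), up to a constant."""
    return vdd ** 3 / (vdd - v_te)


def optimal_vdd(v_te, span=3.0, step=0.001):
    """Grid search for the supply voltage above VTE that minimizes the EDP."""
    best_v, best_edp = None, float("inf")
    for i in range(1, int(round(span / step)) + 1):
        v = v_te + i * step
        edp = edp_shape(v, v_te)
        if edp < best_edp:
            best_v, best_edp = v, edp
    return best_v


if __name__ == "__main__":
    V_TE = 0.8  # assumed effective threshold in volts
    print(f"numerical optimum: {optimal_vdd(V_TE):.3f} V")
    print(f"analytical 1.5 * VTE: {1.5 * V_TE:.3f} V")
```

The minimum is fairly shallow, so operating somewhat above the optimum costs little in EDP while improving raw delay.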

Example 5
...
25 µm CMOS inverter

From the technology parameters for our generic CMOS process presented in Chapter 3, the
value of VTE can be derived
...
43 V, VDsatn = 0
...
74 V
...
4 V, VDsatp = -1 V, VTEp = -0
...

VTE ≈ (VTEn+|VTEp|)/2 = 0
...
8 V = 1
...
The simulated graphs of Figure 5
...
The optimum supply voltage is predicted to equal 1
...
The charts clearly illustrate the trade-off between delay and
energy
...
5

1

1
...
5

Figure 5
...
25 µm CMOS technology
...
For instance, some designs require a
minimum performance, which requires a higher voltage at the expense of energy
...
obtaining the overall system performance through the use of architectural techniques such
as pipelining or concurrency
...
5
...

$P_{av} = \frac{1}{T}\int_0^T p(t)\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt$


(5
...

Some implementations of SPICE provide built-in functions to measure the average value
of a circuit signal
...
MEASURE TRAN I(VDD) AVG command
computes the area under a computed transient response (I(VDD)) and divides it by the
period of interest
...
(5
...
Other implementations of SPICE are, unfortunately, not as extensive
...
A small circuit can
easily be conceived that acts as an integrator and whose output signal is nothing but the
average power
...
37
...
The resistance R is only provided for DC-convergence reasons and should be chosen as high as possible to minimize leakage
...
The operation
of the circuit is summarized in Eq
...
60) under the assumption that the initial voltage on
the capacitor C is zero
...
60)

$P_{av} = \frac{k}{C}\int_0^T i_{DD}\,dt$

Equating Eq
...
59) and Eq
...
60) yields the necessary conditions for the equivalent
circuit parameters: k/C = VDD/T
...
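When a simulator only exports the raw supply-current waveform, the same average can be computed in post-processing. The sketch below applies the trapezoidal rule to sampled (time, current) pairs to evaluate Pav = (VDD/T) ∫ iDD dt; the function name and the sample data are hypothetical.

```python
def average_power(times, currents, vdd):
    """Average supply power over the sampled interval, by trapezoidal integration.

    times and currents are equal-length sequences of sample instants (s) and
    supply-current samples (A); vdd is the constant supply voltage (V).
    """
    if len(times) != len(currents) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    charge = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        charge += 0.5 * (currents[k] + currents[k - 1]) * dt
    period = times[-1] - times[0]
    return vdd * charge / period


if __name__ == "__main__":
    # Hypothetical triangular current pulse within a 250 ps period.
    t = [0.0, 25e-12, 50e-12, 250e-12]
    i = [0.0, 1e-3, 0.0, 0.0]
    print(f"P_av = {average_power(t, i, 2.5) * 1e6:.1f} uW")
```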

Example 5
...
4 is analyzed using the above
technique for a toggle period of 250 psec (T = 250 psec, k = 1, VDD = 2
...
The resulting power consumption is plotted in Figure 5
...
3 µW
...
MEAS AVG command yields a value of

[Figure: power-measurement circuit. The circuit under test draws iDD from VDD; a controlled current source k iDD drives the parallel combination of C and R, and the resulting voltage represents Pav.]

Figure 5
...


160
...
These numbers are equivalent to an energy of 39 fJ (which is close to the 37
...
11)
...
This is due to the
injection of current into the supply, when the output briefly overshoots VDD as a result of the
capacitive coupling between input and output (as is apparent from the transient response of
Figure 5
...

[Plot: waveform for the input transition Vin: 0→1 versus t (sec, scale ×10^-10); tick values omitted.]

Figure 5
...


Perspective: Technology Scaling and its Impact on the Inverter
Metrics
In section 3
...
For the sake of clarity, we
repeat here some of the most important entries in the resulting scaling table (Table 3
...

Table 5
...

Parameter         Relation    Full Scaling    General Scaling    Fixed-Voltage Scaling
Area/Device       WL          1/S^2           1/S^2              1/S^2
Intrinsic Delay   RonCgate    1/S             1/S                1/S

chapter5
...
4 Scaling scenarios for short-channel devices (S and U represent the technology and voltage
scaling parameters, respectively)
...

From Figure 5
...
This
rate is on course with the prediction of Table
5
...
15 as
we had already observed in Figure 3
...
The
delay of a 2-input NAND gate with a fanout of
four has gone from tens of nanoseconds in the
1960s to a tenth of a nanosecond in the year Figure 5
...

seconds by 2010
...

Hence, statistics on dissipation-per-gate or design are only marginally available
...
40, which plots the power density measured over a large
number of designs produced between 1980 and 1995
...
This is in correspondence with the fixed-voltage scaling scenario presented in
Table 5
...
For more recent years, we expect a scenario more in line with the full-scaling
model—which predicts a constant power density—due to the accelerated supply-voltage
scaling and the increased attention to power-reducing design techniques
...

The presented scaling model has one fatal flaw however: the performance and
power predictions produce purely “intrinsic” numbers that take only device parameters
into account
...

Similarly, charging and discharging the wire capacitances may dominate the energy budget
...
The impact of the wire capacitance and its
scaling behavior is summarized in Table 5
...
We adopt the fixed-resistance model introduced in Chapter 4
...


∝ S2

Figure 5
...
S is
normalized to 1 for a 4 µm process
...
5 Scaling scenarios for wire capacitance
...
εc represents the impact of fringing
and inter-wire capacitances
...
This impact is limited to an increase with εc for short
wires (S = SL), but it becomes increasingly pronounced for medium-range and long
wires (SL < S)
...
41
...
The doomsday scenario that interconnect may cause CMOS performance to saturate in the very near future hence may be exaggerated
...
g
...


chapter5
...
41 Evolution of wire delay / gate delay ratio
with respect to technology (from [Fisher98])
...
7

Summary
This chapter presented a rigorous and in-depth analysis of the static CMOS inverter
...
The PMOS is normally made wider than the NMOS due to its inferior current-driving capabilities
...
The logic swing is equal
to the supply voltage and is not a function of the transistor sizes
...
The steady-state response is not affected by fanout
...
To a first order, it can be approximated as
R eqn + R eqp
t p = 0
...
Transistor sizing may help to improve performance as
long as the delay is dominated by the extrinsic (or load) capacitance of fanout and
wiring
...
It is given by $P_{0 \to 1} C_L V_{DD}^2 f$
...
The dissipation due to the direct-path
currents occurring during switching can be limited by careful tailoring of the signal

slopes
...

• Scaling the technology is an effective means of reducing the area, propagation delay
and power consumption of a gate
...

• The interconnect component is gradually taking a larger fraction of the delay and
performance budget
...
8

To Probe Further
The operation of the CMOS inverter has been the topic of numerous publications and textbooks
...
An extensive list of references was presented in Chapter
1
...


REFERENCES
[Dally98] W
...
Poulton, Digital Systems Engineering, Cambridge University Press, 1998
...
D
...
Nesbitt, ``The Test of Time: Clock-Cycle Estimation and Test Challenges for Future Microprocessors,'' IEEE Circuits and Devices Magazine, 14(2), pp
...

[Hedenstierna87] N
...
Jeppson, “CMOS Circuit Speed and Buffer Optimization,” IEEE Transactions on CAD, Vol CAD-6, No 2, pp
...

[Liu93] D
...
28, no
...
10-17, Jan
...
10-17
...
Mead and L
...

[Sakurai97] T
...
Kawaguchi, T
...
on Low-Power Electronics and Design,
pp
...
1997
...
Sakurai, T
...

[Sedra87] Sedra and Smith, MicroElectronic Circuits, Holt, Rinehart and Winston, 1987
...
Swanson and J
...
SC-7, No
...
146-152, April
1972
...
Veendrick, “Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on
the Design of Buffer Circuits,” IEEE Journal of Solid-State Circuits, Vol
...
4,
pp
...


chapter5
...
9

THE CMOS INVERTER

Chapter 5

Exercises and Design Problems
For all problems, use the device parameters provided in Chapter 3 (as well as the inside back cover),
unless otherwise mentioned
...
2 µm CMOS introduced in Chapter 2, design a static CMOS
inverter that meets the following requirements:
1
...
e
...

2
...
1 nsec)
...
Notice that this
capacitance is substantially larger than the internal capacitances of the gate
...
To reduce the parasitics, use
minimal lengths (L = 1
...
Verify and optimize the design
using SPICE after proposing a first design using manual computations
...
If you have a layout editor (such
as MAGIC) available, perform the physical design, extract the real circuit
parameters, and compare the simulated results with the ones obtained earlier
...
1
6
...
3
...
3
...
2
...
2
...
4
...
2
...
3

Complementary CMOS
Pass-Transistor Logic

6
...
2

Designing Logic for Reduced Supply
Voltages

6
...
3
...
5

Summary

6
...
2

Speed and Power Dissipation of
Dynamic Logic

6
...
1

DESIGNING COMBINATIONAL LOGIC GATES IN CMOS

Chapter 6

Introduction
The design considerations for a simple inverter circuit were presented in the previous
chapter
...
The focus is on combinational logic (or non-regenerative) circuits; that is, circuits that have the property that at any point in time, the output
of the circuit is related to its current input signals by some Boolean expression (assuming
that the transients through the logic gates have settled)
...

This is in contrast to another class of circuits, known as sequential or regenerative,
for which the output is not only a function of the current input data, but also of previous
values of the input signals (Figure 6
...
This is accomplished by connecting one or more
outputs intentionally back to some inputs
...
A sequential circuit includes a combinational logic portion and a module that holds the state
...
Sequential circuits are the topic of the next chapter
...
1 High level classification of logic circuits
...
As with the
inverter, the common design metrics by which a gate is evaluated are area, speed, energy
and power
...
For
instance, the switching speed of digital circuits is the primary metric in a high-performance processor, while it is energy dissipation in a battery operated circuit
...

We will see that certain logic styles can significantly improve performance, but are more
sensitive to noise
...


6
...
The static CMOS style
is really an extension of the static CMOS inverter to multiple inputs
...
e, low sensitivity to noise), good
performance, and low power consumption with no static power dissipation
...
2

Static CMOS Design

231

properties are carried over to large fan-in logic gates implemented using a similar circuit
topology
...
Also,
the outputs of the gates assume at all times the value of the Boolean function implemented
by the circuit (ignoring, once again, the transient effects during switching periods)
...
The latter approach has the advantage
that the resulting gate is simpler and faster
...

In this section, we sequentially address the design of various static circuit flavors
including complementary CMOS, ratioed logic (pseudo-NMOS and DCVSL), and pass-transistor logic
...

6
...
1

Complementary CMOS

Concept
A static CMOS gate is a combination of two networks, called the pull-up network (PUN)
and the pull-down network (PDN) (Figure 6
...
The figure shows a generic N input logic
gate where all inputs are distributed to both the pull-up and pull-down networks
...
Similarly, the function of the PDN
is to connect the output to VSS when the output of the logic gate is meant to be 0
...
In this way, once the transients have settled, a path always exists between VDD and the output F, realizing a high output (“one”),
or, alternatively, between VSS and F for a low output (“zero”)
...

VDD
In1
In2

PUN

InN

pull-up: make a connection from VDD to F when
F(In1,In2,
...
Inn)

In1
In2
PDN
InN

pull-down: make a connection from F to VSS when
F(In1,In2,
...
2 Complementary logic gate as a combination of a PUN (pull-up network) and a
PDN (pull-down network)
...
An NMOS
switch is on when the controlling signal is high and is off when the controlling signal
is low
...

• The PDN is constructed using NMOS devices, while PMOS transistors are used in
the PUN
...
To illustrate this, consider the examples shown in Figure 6
...
In Figure 6
...
Two possible discharge scenarios are shown
...
NMOS transistors are hence the preferred devices in the PDN
...
3b, with the output initially at GND
...
This explains why PMOS transistors are preferentially used in a PUN
...
3 Simple examples
illustrate why an NMOS should be
used as a pull-down, and a PMOS
should be used as a pull-up device
...
4)
...
With all the
inputs high, the series combination conducts and the value at one end of the chain is
transferred to the other end
...
A conducting path exists between the output and input terminal if at least one of the inputs is high
...
A series connection of PMOS conducts if
both inputs are low, representing a NOR function (A
...

• Using De Morgan’s theorems ($\overline{A + B} = \overline{A} \cdot \overline{B}$ and $\overline{A \cdot B} = \overline{A} + \overline{B}$), it can be shown that
the pull-up and pull-down networks of a complementary CMOS structure are dual
networks
...
[Figure panels: (a) series combination, conducts if A · B; (b) parallel combination, conducts if A + B]

Figure 6
...


network, and vice versa
...
g
...
The
other network (i
...
, PUN) is obtained using the duality principle by walking the hierarchy, replacing series sub-nets with parallel sub-nets, and parallel sub-nets with
series sub-nets
...

• The complementary gate is naturally inverting, implementing only functions such as
NAND, NOR, and XNOR
...

• The number of transistors required to implement an N-input logic gate is 2N
...
1 Two-input NAND Gate
Figure 6
...
The PDN network consists of two
NMOS devices in series that conduct when both A and B are high
...
This means that F is 1 if A = 0 or B = 0,
which is equivalent to $F = \overline{A \cdot B}$
...
1
...

VDD
Table 6
...
5 Two-input NAND gate in complementary static CMOS style
...
2 Synthesis of complex CMOS Gate
Using complementary CMOS logic, consider the synthesis of a complex CMOS gate whose
function is $F = \overline{D + A \cdot (B + C)}$
...
6a by using the fact that NMOS devices in series
implement the AND function and parallel devices implement the OR function
...
The PDN network is broken into
smaller networks (i
...
, subset of the PDN) called sub-nets that simplify the derivation of the
PUN
...
6b, the sub-nets (SN) for the pull-down network are identified. At the top
level, SN1 and SN2 are in parallel, so in the dual network they will be in series
...
On the other hand, we
need to recursively apply the duality rules to SN2
...
Finally, inside SN3, the devices are in parallel so they appear in series in the PUN
...
6c
...
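The recursive duality rule applied in this example can be checked mechanically. The following is a minimal sketch, not the book's procedure: the tuple encoding of a network and the names `dual`, `conducts`, and `check_complementary` are assumptions introduced here for illustration. It derives the PUN from the PDN of the gate above and confirms that, for every input combination, exactly one of the two networks conducts.

```python
from itertools import product

# A network is an input name or a tuple ("ser" | "par", child, child, ...).
# PDN of F = NOT(D + A.(B + C)): D in parallel with (A in series with (B par C)).
PDN = ("par", "D", ("ser", "A", ("par", "B", "C")))

def dual(net):
    """Swap series and parallel connections at every level of the hierarchy."""
    if isinstance(net, str):
        return net
    kind, *children = net
    swapped = "par" if kind == "ser" else "ser"
    return (swapped, *[dual(child) for child in children])

def conducts(net, inputs, on_when_high):
    """True if the switch network conducts for the given input values."""
    if isinstance(net, str):
        return inputs[net] if on_when_high else not inputs[net]
    kind, *children = net
    states = [conducts(child, inputs, on_when_high) for child in children]
    return all(states) if kind == "ser" else any(states)

PUN = dual(PDN)

def check_complementary():
    """Exactly one of PUN and PDN must conduct for each input pattern."""
    for values in product([False, True], repeat=4):
        inputs = dict(zip("ABCD", values))
        pdn_on = conducts(PDN, inputs, on_when_high=True)    # NMOS switches
        pun_on = conducts(PUN, inputs, on_when_high=False)   # PMOS switches
        if pdn_on == pun_on:
            return False
    return True
```

Calling `check_complementary()` exercises all 16 input patterns; the encoding treats each transistor as an ideal switch and ignores all electrical effects.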

[Figure panels: (a) pull-down network; (b) deriving the pull-up network hierarchically by identifying sub-nets SN1 to SN4; (c) complete gate]
Figure 6
...

Static Properties of Complementary CMOS Gates
Complementary CMOS gates inherit all the nice properties of the basic CMOS inverter
...
The circuits also have no
static power dissipation, since the circuits are designed such that the pull-down and pull-up networks are mutually exclusive
...

Consider the static two-input NAND gate shown in Figure 6
...
Three possible input
combinations switch the output of the gate from high-to-low: (a) A = B = 0 → 1, (b) A= 1,
B = 0 → 1, and (c) B= 1, A = 0 → 1
...
The large variation between case (a) and the others (b & c) is explained
by the fact that in the former case both transistors in the pull-up network are on simultaneously for A=B=0, representing a strong pull-up
...
The VTC is shifted to the left as a result of the weaker PUN
...
For the NMOS devices to turn on, both gate-to-source voltages must be above VTn, with VGS2 = VA - VDS1 and VGS1 = VB
...
[Plot: horizontal axis Vin, V]
Figure 6
...
NMOS devices are
0
...
25µm while the PMOS devices are sized at 0
...
25µm
...
The
threshold voltages of the two devices are given by:
$V_{Tn1} = V_{Tn0}$
$V_{Tn2} = V_{Tn0} + \gamma \left( \sqrt{|2\phi_f| + V_{int}} - \sqrt{|2\phi_f|} \right)$

(6
...
2)

For case (b), M3 is turned off, and the gate voltage of M2 is set to VDD
...
Since the drive on M2 is large,
this resistance is small and has only a small effect on the voltage transfer characteristics
...
The overall impact
is quite small as seen from the plot
...
For the above example, a glitch on only one of the two inputs has a
larger chance of creating a false transition at the output than when the glitch would occur on
both inputs simultaneously
...
A
common practice when characterizing gates such as NAND and NOR is to connect all the
inputs together
...
The data
dependencies should be carefully modeled
...

For the purpose of delay analysis, each transistor is modeled as a resistor in series with an
ideal switch
...
The logic is transformed into an equivalent RC network that includes the effect of
internal node capacitances
...
8 shows the two-input NAND gate and its equivalent
RC switch level model
...
While complicating the analysis, the capacitance of the internal nodes can have quite an impact in
some networks such as large fan-in gates
...

[Figure panels: (a) two-input NAND with transistors M1 to M4; (b) RC equivalent model with RP, RN, Cint, and CL]
Figure 6
...

A simple analysis of the model shows that—similar to the noise margins—the
propagation delay depends upon the input patterns
...
Three possible input scenarios can be identified for charging the output to
VDD
...
The delay in this case is
0
...
This is not the worst-case low-to-high transition, which occurs when only one device turns on, and is given by 0
...
For the pull-down path, the output is discharged only if both A and B are switched
high, and the delay is given by 0
...
In other words, adding
devices in series slows down the circuit, and devices must be made wider to avoid a performance penalty
...

For example, for a NAND gate to have the same pull-down delay (tphl) as a minimum-sized inverter, the NMOS devices in the NAND stack must be made twice as wide
so that the equivalent resistance of the NAND pull-down is the same as that of the inverter
...

This first-order analysis assumes that the extra capacitance introduced by widening
the transistors can be ignored
...

Example 6
...
8a
...
5µm/0
...
75µm/0
...
This sizing should result in approximately
equal worst-case rise and fall times (since the effective resistance of the pull-down is
designed to be equal to the pull-up resistance)
...
Figure 6
...
As
expected, the case where both inputs go low (A = B = 1→0) results in a smaller
delay, compared to the case where only one input is driven low
...
The reason for this involves
the internal node capacitance of the pull-down stack (i
...
, the source of M2)
...
On the other hand, for the case where A=1 and B transitions from 1→0, the pull-up PMOS device has to charge up the sum of the output and the
internal node capacitances, which slows down the transition
...
0

Input Data
Pattern

Voltage, V

A = 1, B = 1→0

1
...
0

Delay
(psec)

A = B= 0→ 1

A = B = 1→0

50

A=B=1→ 0

-1
...
0

76

A= 1→ 0, B = 1

57

Figure 6
...


The table in Figure 6
...
The first-order transistor sizing indeed provides approximately equal rise and fall delays
...
For example, when both inputs transition from 0→1, it is important to establish the
state of the internal node
...
The worst case can be ensured by pulsing the A input from 1→0→1, while input B
only makes the 0→1 transition
...

The important point to take away from this example is that estimation of delay can be
fairly complex, and requires a careful consideration of internal node capacitances and data
patterns
...
A brute-force
approach that applies all possible input patterns may not always work, as it is important
to consider the state of internal nodes
...
10
...
The worst-case
pull-down transition happens when only one of the NMOS devices turns on (i
...
, if either
A or B is high)
...
5µm/0
...
5µm/0
...
Since the pull-down path in the worst case is a
single device, the NMOS devices (M1 and M2) can have the same device widths as the
NMOS device in the inverter
...
Since the resistances add, the devices must be made two times larger compared
to the PMOS in the inverter (i
...
, M3 and M4 must have a size of 3µm/0
...
Since
PMOS devices have a lower mobility relative to NMOS devices, stacking devices in series
must be avoided as much as possible
...

VDD

VDD

RP

M4

B

B

M1

F

M2

B

RN
A

Problem 6
...
10 Sizing of a NOR gate to
produce the same delay as an inverter with
size of NMOS: 0
...
25µm and PMOS:
1
...
25µm
...
6c such that it has
approximately the same tplh and tphl as an inverter with the following sizes: NMOS:
0
...
25µm and PMOS: 1
...
25µm
...
This is often a reasonable assumption for a first-order analysis
...
Consider a 4-input NAND gate as shown in Figure 6
...
The
internal capacitances consist of the junction capacitance of the transistors, as well as the
gate-to-source and gate-to-drain capacitances
...
The delay analysis for such a circuit involves solving
distributed RC networks, a problem we already encountered when analyzing the delay of
interconnect networks
...
The output is discharged when all inputs are driven high
...

[Figure: four-input NAND gate (transistors M1 to M8, inputs A, B, C, D, output F) and its RC model with resistances R1 to R8, internal capacitances C1 to C3, and load CL]

Figure 6
...


Section 6
...
69 ( R 1 ⋅ C 1 + ( R 1 + R 2 ) ⋅ C 2 + ( R 1 + R 2 + R 3 ) ⋅ C 3 + ( R 1 + R 2 + R 3 + R 4 ) ⋅ C L )

(6
...
Assuming that all NMOS
devices have an equal size, Eq
...
3) simplifies to
tpHL = 0
...
4)
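The sum-of-products structure of the delay expression above lends itself to a short numerical sketch. The code below is illustrative only: the function names and the sample values are assumptions, and the 0.69 prefactor is taken from the fragment of the equation that survives in this text. It evaluates the general expression for a series pull-down chain and the equal-resistance simplification, which weights the k-th capacitance by k.

```python
def elmore_tphl(resistances, capacitances):
    """Return 0.69 * sum_k (R1 + ... + Rk) * Ck for a discharging series chain.

    resistances[k] is the on-resistance of the k-th transistor counted from
    ground; capacitances[k] is the capacitance at its drain node, with the
    last entry being the output load CL.
    """
    total = 0.0
    path_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        path_resistance += r           # resistance between this node and ground
        total += path_resistance * c
    return 0.69 * total

def elmore_tphl_equal(r, capacitances):
    """Simplified form when every transistor has the same resistance r."""
    return 0.69 * r * sum((k + 1) * c for k, c in enumerate(capacitances))

# Hypothetical values: 10 kOhm per device, 1 fF internal nodes, 4 fF load.
caps = [1e-15, 1e-15, 1e-15, 4e-15]
t_general = elmore_tphl([1e4] * 4, caps)
t_simplified = elmore_tphl_equal(1e4, caps)
```

With equal device sizes both functions return the same value, which shows why the transistor closest to ground contributes to every term of the sum.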

Example 6
...
Assume that all NMOS devices have a
W/L of 0
...
25µm, and all PMOS devices have a device size of 0
...
25µm
...
12
...

Using techniques similar to those employed for the CMOS inverter in Chapter 5, the
capacitance values can be computed from the layout
...
Using our standard design rules, the area and perimeter for various devices can be
easily computed as shown in Table 6
...
While the output makes a transition
from VDD to 0, the internal nodes only transition from VDD-VTn to GND
...

VDD

Out

GND
A

B

C

D

Figure 6
...


Table 6
...

Transistor

W (µm)

AS (µm2)

AD (µm2)

PS (µm)

PD(µm)

1

0
...
3125

0
...
75

0
...
5

0
...
0625

0
...
25

3

0
...
0625

0
...
25

0
...
5

0
...
3125

0
...
75

5

0
...
296875

0
...
875

0
...
375

0
...
171875

0
...
875

7

0
...
171875

0
...
875

0
...
375

0
...
171875

1
...
875

It is assumed that the output connects to a single, minimum-size inverter
...
The various contributions are summarized in
Table 6
...
For the NMOS and PMOS junctions, we use Keq = 0
...
61, and Keq
= 0
...
86, respectively
...

Table 6
...
The table shows
the intrinsic delay of the gate without extra loading
...

Capacitor

Contributions (H→L)

Value (fF) (H→L)

C1

Cd1 + Cs2 + 2 * Cgd1 + 2 * Cgs2

(0
...
0625 * 2+ 0
...
25 * 0
...
57 * 0
...
61 * 0
...
28) +
2 * (0
...
5) + 2 * (0
...
5) = 0
...
57 * 0
...
61 * 0
...
28) +
(0
...
0625 * 2+ 0
...
25* 0
...
31 * 0
...
31 * 0
...
85fF

C3

Cd3 + Cs4 + 2 * Cgd3 + 2 * Cgs4

(0
...
0625 * 2+ 0
...
25 * 0
...
57 * 0
...
61 * 0
...
28) +
2 * (0
...
5) + 2 * (0
...
5) = 0
...
57 * 0
...
61 * 1
...
28) +
2 * Cgd5+2 * Cgd6+ 2 * Cgd7+ 2 * Cgd8 2 * (0
...
5)+ 4 * (0
...
171875* 1
...
86
= Cd4 + 4 * Cd5 + 4 * 2 * Cgd6
* 0
...
22)+ 4 * 2 * (0
...
375) = 3
...
(6
...
69  --------------  ( 0
...
85fF + 3 ⋅ 0
...
47 fF ) = 85 p s
 2 
The simulated delay for this particular transition was found to be 86 psec! The hand analysis
gives a fairly accurate estimate given all assumptions and linearizations made
...
This is not entirely the case, as during the transition some other contributions come into
play depending upon the operating region
...
provide a totally accurate delay prediction, but rather to give intuition into what factors influence the delay and to aid in initial transistor sizing
...
The simulated worst-case low-to-high delay time
for this gate was 106ps
...
e
...
First, the number of transistors required to
implement an N fan-in gate is 2N
...
The
second problem is that propagation delay of a complementary CMOS gate deteriorates
rapidly as a function of the fan-in
...
For an N-input NAND gate, the output capacitance increases
linearly with the fan-in since the number of PMOS devices connected to the output node
increases linearly with the fan-in
...

For the same N-input NAND gate, the effective resistance of the PDN path increases linearly with the fan-in
...

The fan-out has a large impact on the delay of complementary CMOS logic as well
...

The above observations are summarized by the following formula, which approximates the influence of fan-in and fan-out on the propagation delay of the complementary
CMOS gate
$t_p = a_1 FI + a_2 FI^2 + a_3 FO$

(6
...

At first glance, it would appear that the increase in resistance for larger fan-in can be
solved by making the devices in the transistor chain wider
...
For
the N-input NAND gate, the low-to-high delay only increases linearly since the pull-up
resistance remains unchanged and only the capacitance increases linearly
...
13 shows the propagation delay for both transitions as a function of fan-in
assuming a fixed fan-out (NMOS: 0
...
5µm)
...
The
simultaneous increase in the pull-down resistance and the load capacitance results in an
approximately quadratic relationship for tpHL
...
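The linear and quadratic trends described here can be reproduced with a first-order model. This is a rough sketch under stated assumptions, not the simulation behind the figure: all quantities are normalized, every NMOS device has the same resistance, every internal node has the same capacitance, the output capacitance grows in proportion to the fan-in, and the low-to-high estimate ignores the internal nodes. The function names are hypothetical.

```python
def nand_tphl(fan_in, r_n=1.0, c_int=1.0, c_out_per_input=1.0):
    """Elmore-style high-to-low delay of an N-input NAND (normalized units)."""
    c_out = fan_in * c_out_per_input                 # output load grows with fan-in
    internal = sum(k * r_n * c_int for k in range(1, fan_in))
    return 0.69 * (internal + fan_in * r_n * c_out)  # full stack discharges c_out

def nand_tplh(fan_in, r_p=1.0, c_out_per_input=1.0):
    """Worst-case low-to-high delay: one PMOS charges the output capacitance."""
    return 0.69 * r_p * fan_in * c_out_per_input

ratio_hl = nand_tphl(8) / nand_tphl(4)   # more than doubles: roughly quadratic
ratio_lh = nand_tplh(8) / nand_tplh(4)   # exactly doubles: linear
```

Doubling the fan-in from 4 to 8 roughly quadruples the modeled high-to-low delay while only doubling the low-to-high delay.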

Design Techniques for Large Fan-in
Several approaches may be used to reduce delays in large fan-in circuits
...
13 Propagation delay of
CMOS NAND gate as a function of
fan-in
...


1
...
This lowers the resistance of devices in series and lowers the time constant
...
This technique should, therefore, be used with caution
...
A more comprehensive approach towards sizing transistors in complex CMOS
gates is discussed in the next section
...
Progressive Transistor Sizing

An alternative to uniform sizing (in which each transistor is scaled up uniformly) is to use progressive transistor sizing (Figure 6
...
Referring back to Eq
...
3), we see
that the resistance of M1 (R1) appears N times in the delay equation, the resistance of M2 (R2)
appears N-1 times, etc
...
Consequently, a progressive scaling of the transistors is beneficial: M1 > M2
> M3 > MN
...
For an excellent treatment on the optimal sizing of transistors in a complex network, we refer the interested reader to [Shoji88, pp
...
[Figure: progressive sizing of a transistor chain M1 to MN with inputs In1 to InN, internal capacitances C1, C2, C3, load CL, and output Out; M1 > M2 > M3 > MN]
Figure 6
...

Section 6
...
15 Influence of transistor ordering on delay
...

The reader should be aware of
one important pitfall of this approach
...
Very often, design-rule considerations force the designer to push the transistors apart, which causes the internal capacitance
to grow
...
Input Re-Ordering

Some signals in complex combinational logic blocks might be more critical than others
...
An input signal to a gate is called critical if it is the last signal of
all inputs to assume a stable value
...

Putting the critical-path transistors closer to the output of the gate can result in a speedup
...
15
...
Suppose
further that In2 and In3 are high and that In1 undergoes a 0→1 transition
...
In case (a), no path to GND exists until M1 is turned on, which is
unfortunately the last event to happen
...
In the second case,
C1 and C2 are already discharged when In 1 changes
...

4
...
16
...
Partitioning the NOR-gate into two three-input gates results in a significant speed-up, which offsets by far the extra delay incurred by
turning the inverter into a two-input NAND gate
...
16 Logic restructuring
can reduce the gate fan-in
...
The sizing of devices should happen in its proper context
...
In Chapter 5 we found out
that an optimal fanout for a chain of inverters driving a load CL is $(C_L/C_{in})^{1/N}$, where N is
the number of stages in the chain, and Cin the input capacitance of the first gate in the
chain
...
Can this result be extended to determine
the size of any combinational path for minimal delay? By extending our previous
approach to address complex logic networks, we will find out that this is indeed possible
[Sutherland99]
...
6)

$t_p = t_{p0} \left( p + g f / \gamma \right)$

(6
...
In this context, f is often called the
electrical effort
...
The more involved structure of the multiple-input gate, combined with its series devices, increases its intrinsic delay
...
Table 6
...

Table 6
...

Gate type            p
Inverter             1
n-input NAND         n
n-input NOR          n
n-way multiplexer    2n
XOR, NXOR            n·2^(n-1)

1
The approach introduced in this section is commonly called logical effort, and was first introduced in
[Sutherland99], which presents an extensive treatment of the topic
...


Section 6
...
In other
words, the logical effort of a logic gate tells how much worse it is at producing output current than an inverter, given that each of its inputs may contain only the same input capacitance as the inverter
...
Logical effort is a useful
parameter, because it depends only on circuit topology
...
4
...
4 Logic efforts of common logic gates, assuming a PMOS/NMOS ratio of 2
...
5 Logical effort of complex gates
Consider the gates shown in Figure 6
...
Assuming a PMOS/NMOS ratio of 2, the input
capacitance of a minimum-sized symmetrical inverter equals 3 times the gate capacitance of a
minimum-sized NMOS (called Cunit)
...
This increases the input capacitance of the 2-input NAND to 4 Cunit, or 4/3 the capacitance
of the inverter
...
Equivalently, for the same input capacitance, the NAND and NOR gate have 4/3 and 5/3 less driving
strength than the inverter
...
’ Hence, gNAND = 4/3, and gNOR = 5/3
...
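The two values quoted above follow from counting input capacitance per input after sizing each gate for inverter-equivalent drive. The sketch below makes that bookkeeping explicit; the function names are hypothetical, and it assumes an integer PMOS/NMOS ratio, series devices widened by the stack height, and parallel devices kept at inverter width.

```python
from fractions import Fraction

def inverter_cin(ratio=2):
    """Input capacitance of the reference inverter in units of Cunit."""
    return 1 + ratio   # NMOS width 1 plus PMOS width `ratio`

def nand_logical_effort(n, ratio=2):
    """Series NMOS widened n times; parallel PMOS keep the inverter width."""
    return Fraction(n + ratio, inverter_cin(ratio))

def nor_logical_effort(n, ratio=2):
    """Parallel NMOS keep the inverter width; series PMOS widened n times."""
    return Fraction(1 + n * ratio, inverter_cin(ratio))
```

For two inputs and a ratio of 2 these return 4/3 and 5/3; under the same assumptions a 3-input NAND also comes out at 5/3.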
17 Logical effort of 2-input NAND
and NOR gates
...
(6
...
Figure 6
...
The slope of
ve
3
Effort
In
Delay
the line is the logical effort of the gate;
2
its intercept is the intrinsic delay
...
Observe also that fanout and
Figure 6
...

a similar way
...

The total delay of a path through a combinational logic block can now be expressed
as
$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0} \sum_{j=1}^{N} \left( p_j + \frac{f_j g_j}{\gamma} \right)$

(6
...
By finding N – 1 partial derivatives and setting them to zero,
we find that each stage should bear the same ‘effort’:
$f_1 g_1 = f_2 g_2 = \dots = f_N g_N$

(6
...

$F = f_1 f_2 \dots f_N = C_L / C_{g1}$
$G = g_1 g_2 \dots g_N$

(6
...
From here on, the
analysis proceeds along the same lines as the inverter chain
...
11)

and the minimum delay through the path is

$D = t_{p0} \left( \sum_{j=1}^{N} p_j + \frac{N \left( \sqrt[N]{H} \right)}{\gamma} \right)$

(6
...
Note that the overall intrinsic delay is a function of the types of logic gates in the path, and
is not affected by the sizing
...
6 Sizing combinational logic for minimum delay
Consider the logic network of Figure 6
...
The output of the network is loaded with a capacitance which is 5 times
larger than the input capacitance of the first gate, which is a minimum-sized inverter
...
Using the entries in Table 6
...
93
...
93; f2 = 1
...
16; f3 = 1
...
93
...
From this, we can derive the sizes of the gates (with respect to
their minimum-sized versions): a = f1/g2 = 1
...
34; c = f1f2f3/g4=2
...

These calculations do not have to be very precise
...
5 still result in circuits within 5% of minimum
delay
...
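The sizing procedure of this example can be sketched in a few lines. Everything specific below is an assumption made for illustration: the figure is not reproduced in this text, so the path is taken to be an inverter, a 3-input NAND, a 2-input NOR, and an inverter (logical efforts 1, 5/3, 5/3, 1 for a PMOS/NMOS ratio of 2), with no branching and a path electrical effort of 5. The function name `size_path` is hypothetical.

```python
def size_path(logical_efforts, path_electrical_effort):
    """Equal-effort sizing of a gate chain without branching.

    Returns the per-stage effort, the per-stage fanouts f_j, and the gate
    sizes relative to their minimum-sized versions (first gate fixed at 1).
    """
    n = len(logical_efforts)
    path_logical_effort = 1.0
    for g in logical_efforts:
        path_logical_effort *= g
    # Every stage bears the same effort: the N-th root of the path effort.
    stage_effort = (path_logical_effort * path_electrical_effort) ** (1.0 / n)
    fanouts = [stage_effort / g for g in logical_efforts]
    sizes = [1.0]
    cumulative_fanout = 1.0
    for j in range(1, n):
        cumulative_fanout *= fanouts[j - 1]
        sizes.append(logical_efforts[0] * cumulative_fanout / logical_efforts[j])
    return stage_effort, fanouts, sizes

h, f, s = size_path([1.0, 5 / 3, 5 / 3, 1.0], 5.0)
```

Because the delay curve is flat near its minimum, rounding the resulting sizes to convenient values costs little, which is the point made in the text about precision.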


1

a

b

c
5
Figure 6
...


Power Consumption in CMOS Logic Gates
The sources of power consumption in a complementary CMOS inverter were discussed in
detail in Chapter 5
...
The
power dissipation is a strong function of transistor sizing (which affects physical capacitance), input and output rise/fall times (which affect the short-circuit power), device
thresholds and temperature (which affect leakage power), and switching activity
...
Making a gate more complex
mostly affects the switching activity α0→1, which has two components: a static component
that is only a function of the topology of the logic network, and a dynamic one that results
from the timing behavior of the circuit—the latter factor is also called glitching
...
For static CMOS gates with statistically independent inputs, the static
transition probability is the probability p0 that the output will be in the zero state in one
cycle, multiplied by the probability p1 that the output will be in the one state in the next
cycle:
$\alpha_{0 \to 1} = p_0 \cdot p_1 = p_0 \cdot (1 - p_0)$

(6
...
14)

where N0 is the number of zero entries and N1 is the number of one entries in the output
column of the truth table of the function
...
5
...

Table 6
...


A    B    Out
0    0    1
0    1    0
1    0    0
1    1    0

From Table 6
...
(6
...
15)
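The truth-table counting argument can be automated. The sketch below is illustrative and assumes uniformly distributed, independent inputs, so that p0 = N0 / 2^N and p1 = N1 / 2^N; the helper name `static_activity` is not from the text.

```python
from fractions import Fraction
from itertools import product

def static_activity(gate, n_inputs):
    """Static 0->1 transition probability from the truth table of `gate`."""
    outputs = [gate(*bits) for bits in product((0, 1), repeat=n_inputs)]
    n1 = sum(outputs)               # number of one entries in the output column
    n0 = len(outputs) - n1          # number of zero entries
    return Fraction(n0 * n1, len(outputs) ** 2)

def nor2(a, b):
    return int(not (a or b))

def xor2(a, b):
    return a ^ b

nor_activity = static_activity(nor2, 2)
xor_activity = static_activity(xor2, 2)
```

For the 2-input NOR table above (three zeros, one one) this gives 3/16; an XOR, with two zeros and two ones, gives 1/4.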

Problem 6
...

Signal Statistics—The switching activity of a logic gate is a strong function of the input
signal statistics
...
For
example, consider once again a 2-input static NOR gate, and let pa and pb be the
probabilities that the inputs A and B are one
...
The probability that the output node equals one is given by
$p_1 = (1 - p_a)(1 - p_b)$

(6
...
17)

Section 6
...
20 Transition activity of
a two-input NOR gate as a
function of the input probabilities
(pA,pB)

Figure 6
...
Observe how
this graph degrades into the simple inverter case when one of the input probabilities is set
to 0
...

Problem 6
...

The results to be obtained are given in Table 6
...

Table 6
...

Gate    α0→1
AND     (1 – pApB)pApB
OR      (1 – pA)(1 – pB)[1 – (1 – pA)(1 – pB)]
XOR     [1 – (pA + pB – 2pApB)](pA + pB – 2pApB)
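The closed-form entries in this table can be cross-checked by brute-force enumeration. The following is a small sketch assuming independent inputs with arbitrary one-probabilities; the helper names and the sample probabilities 0.3 and 0.6 are hypothetical.

```python
from itertools import product

def p_one(gate, probs):
    """Probability that the gate output is 1 for independent inputs."""
    total = 0.0
    for bits in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for bit, p in zip(bits, probs):
            weight *= p if bit else 1.0 - p   # probability of this input pattern
        total += weight * gate(*bits)
    return total

def activity(gate, probs):
    """0->1 transition probability: p0 * p1 with p0 = 1 - p1."""
    p1 = p_one(gate, probs)
    return (1.0 - p1) * p1

def and2(a, b):
    return a & b

def or2(a, b):
    return a | b

def xor2(a, b):
    return a ^ b

pa, pb = 0.3, 0.6
and_act = activity(and2, [pa, pb])
or_act = activity(or2, [pa, pb])
xor_act = activity(xor2, [pa, pb])
```

Each result agrees with the corresponding table expression evaluated at the same probabilities.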

Inter-signal Correlations—The evaluation of the switching activity is further
complicated by the fact that signals exhibit correlation in space and time
...
This is best illustrated with a
simple example
...
21a, and assume that the
primary inputs, A and B, are uncorrelated and uniformly distributed
...
The probability that the node Z
undergoes a power consuming transition is then determined using the AND-gate expression of Table 6
...

$p_{0 \to 1} = (1 - p_a p_b)\, p_a p_b = (1 - 1/2 \cdot 1/2) \cdot 1/2 \cdot 1/2 = 3/16$

(6
...
21 Example illustrating the effect of signal correlations
...

This approach, however, has two major limitations: (1) it does not deal with circuits with
feedback as found in sequential circuits; (2) it assumes that the signal probabilities at the
input of each gate are independent
...
For instance, the inputs to the AND
gate in Figure 6
...
The
approach to compute probabilities, presented previously, fails under these circumstances
...
This value for transition probability is clearly false, as logic transformations show that the network can be reduced to $Z = C \cdot B = A \cdot \overline{A} = 0$, and no transition
will ever take place
...
This can be accomplished with the aid of conditional probabilities
...

pZ = p(Z=1) = p(B=1, C=1)

(6
...
If
B and C are independent, p(B=1,C=1) can be decomposed into p(B=1) • p(C=1), and this
yields the expression for the AND-gate, derived earlier: pZ = p(B=1) • p(C=1) = pB pC
...
21b), a conditional probability has to be employed, such as
pZ = p(C=1|B=1) • p(B=1)

(6
...
(6
...
The
extra condition is necessary as C is dependent upon B
...

Deriving those expressions in a structured way for large networks with reconvergent
fanout is complex, especially when the networks contain feedback loops
...
To be meaningful, the analysis program has to process a typical
sequence of input signals, as the power dissipation is a strong function of statistics of those
signals
...
[Plot: simulated waveforms Out6, Out7, and Out8 versus time, psec]
...
22 Glitching in a chain of NAND gates
...

In reality, the finite propagation delay from one
logic block to the next can cause spurious transitions, called glitches, critical races, or
dynamic hazards, to occur: a node can exhibit multiple transitions in a single clock cycle
before settling to the correct logic level
...
22, which displays
the simulated response of a chain of NAND gates for all inputs going simultaneously from
0 to 1
...
For this particular transition, all the odd bits must transition to 0 while the even bits remain at the value of 1
...
When the correct input ripples through the network, the output goes high
...
Although the glitches in this example are
only partial (i
...
, not from rail to rail), they contribute significantly to the power dissipation
...

Design Techniques to Reduce Switching Activity
The dynamic power of a logic gate can be reduced by minimizing the physical capacitance and
the switching activity
...

The switching activity, on the other hand, can be minimized at all levels of the design abstraction, and is the focus of this section
...


1
...
Consider for
instance two alternate implementations of F = A • B • C • D, as shown in Figure 6
...
[Figure panels: chain structure and tree structure, each with inputs A, B, C, D, intermediate nodes O1 and O2, and output F]
Figure 6
...

Ignore glitching and assume that all primary inputs (A,B,C,D) are uncorrelated and uniformly distributed (i
...
, p1 (a,b,c,d)= 0
...
Using the expressions from Table 6
...
7
...
However, as mentioned before, it is also important to consider the timing behavior to
accurately make power trade-offs
...

Table 6
...


                     O1      O2      F
p1 (chain)           1/4     1/8     1/16
p0 = 1-p1 (chain)    3/4     7/8     15/16
p0->1 (chain)        3/16    7/64    15/256
p1 (tree)            1/4     1/4     1/16
p0 = 1-p1 (tree)     3/4     3/4     15/16
p0->1 (tree)         3/16    3/16    15/256
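The entries of this table can be regenerated with exact fractions. The sketch below assumes, as the text does, that glitching is ignored and that all primary inputs are independent with a one-probability of 1/2; the node wiring (chain: O1 = A·B, O2 = O1·C, F = O2·D; tree: O1 = A·B, O2 = C·D, F = O1·O2) and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

def and_p1(*input_p1):
    """P(output = 1) of an AND gate with independent inputs."""
    result = Fraction(1)
    for p in input_p1:
        result *= p
    return result

def activity(p1):
    """0->1 transition probability p0 * p1."""
    return (1 - p1) * p1

half = Fraction(1, 2)

chain_p1 = {"O1": and_p1(half, half)}
chain_p1["O2"] = and_p1(chain_p1["O1"], half)
chain_p1["F"] = and_p1(chain_p1["O2"], half)

tree_p1 = {"O1": and_p1(half, half), "O2": and_p1(half, half)}
tree_p1["F"] = and_p1(tree_p1["O1"], tree_p1["O2"])

chain_activity = {node: activity(p) for node, p in chain_p1.items()}
tree_activity = {node: activity(p) for node, p in tree_p1.items()}
chain_total = sum(chain_activity.values())
tree_total = sum(tree_activity.values())
```

Summing the three node activities gives 91/256 for the chain and 111/256 for the tree, so under these assumptions the chain switches less overall.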

2
...
24
...
Since both circuits implement identical logic functionality, it is clear that
the activity at the output node Z is equal in both cases
...
In the first circuit, this activity equals (1 − 0
...
2) (0
...
2) = 0
...
In
the second case, the probability that a 0 → 1 transition occurs equals (1 – 0
...
1) (0
...
1)
= 0
...
This is substantially lower
...
e
...
5)
...

3
...
Unfortunately, the
minimum area solution does not always result in the lowest switching activity
...
25
...
A

B

B

p(A = 1) = 0
...
2
p(C = 1) = 0
...
24 Reordering of inputs affects the circuit activity
...

If the data being transmitted were random, it would make no difference which architecture is
used
...
Suppose, for
instance, that A is always (or mostly) 1 and B is (mostly) 0
...
However,
in the time-multiplexed solution, the bus toggles between 0 and 1
...

[Figure panels: (a) parallel data transmission; (b) serial data transmission]

Figure 6
...


4
...
If all input signals of a gate change simultaneously, no glitching occurs
...
Such a mismatch in signal timing is typically the result of different path lengths with respect to the primary inputs of the network
...
26
...
The first network (a) suffers from glitching as a result of the wide disparity between
the arrival times of the input signals for a gate
...
Redesigning the network so that all arrival
times are identical can dramatically reduce the number of superfluous transitions (network b)
...
26 Glitching is influenced by matching of signal path lengths
...


technology, but requires 2N transistors to implement an N-input logic gate
...
This has opened the door for alternative logic families that either are simpler or
faster
...
2
...
The
purpose of the PUN in complementary CMOS is to provide a conditional path between
VDD and the output when the PDN is turned off
...
27a)
...
Figure 6
...

[Figure panels: (a) generic ratioed gate with a load between VDD and F and a PDN driven by In1, In2, In3; (b) pseudo-NMOS gate with a PMOS load]

Figure 6
...


The clear advantage of pseudo-NMOS is the reduced number of transistors (N+1
versus 2N for complementary CMOS)
...
On the other hand, the nominal low output voltage is

Section 6
...
This results in reduced noise margins and more importantly static power dissipation
...
Since
the voltage swing on the output and the overall functionality of the gate depends upon the
ratio between the NMOS and PMOS sizes, the circuit is called ratioed
...

Computing the dc-transfer characteristic of the pseudo-NMOS proceeds along paths
similar to those used for its complementary CMOS counterpart
...
At
this operation point, it is reasonable to assume that the NMOS device resides in linear
mode (since the output should ideally be close to 0V), while the PMOS load is saturated
...
21)

Assuming that VOL is small relative to the gate drive (VDD-VT) and that VTn is equal
to VTp in magnitude, VOL can be approximated as:
$V_{OL} \approx \frac{k_p (-V_{DD} - V_{Tp}) \cdot V_{DSAT}}{k_n (V_{DD} - V_{Tn})} \approx \frac{\mu_p \cdot W_p}{\mu_n \cdot W_n} \cdot V_{DSAT}$

(6
...
Unfortunately, this has a negative impact on
the propagation delay for charging up the output node since the current provided by the
PMOS device is limited
...
The static power consumption in the low-output mode is easily derived
$$P_{low} = V_{DD} I_{low} \approx V_{DD} \cdot k_p \left( (-V_{DD} - V_{Tp}) \cdot V_{DSATp} - \frac{V_{DSATp}^2}{2} \right)$$

(6
...
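As a rough numerical check of the two approximations above, the following Python sketch evaluates the first-order low output voltage and the low-state static power of a pseudo-NMOS gate. The function names and all device values are illustrative assumptions, not data from the text. The sketch takes k_p as a positive magnitude while V_Tp and V_DSATp are negative, and wraps the current in abs() so the power comes out positive under that sign convention.

```python
def pseudo_nmos_vol(kp, kn, vdd, vtn, vtp, vdsat_p):
    """First-order V_OL: NMOS assumed linear, PMOS load assumed saturated."""
    return kp * (-vdd - vtp) * vdsat_p / (kn * (vdd - vtn))


def pseudo_nmos_static_power(kp, vdd, vtp, vdsat_p):
    """Static power drawn while the output is low (PMOS load saturated)."""
    i_low = kp * ((-vdd - vtp) * vdsat_p - vdsat_p ** 2 / 2)
    return vdd * abs(i_low)


# Hypothetical operating point, chosen only to exercise the formulas.
VDD, VTN, VTP, VDSATP = 2.5, 0.4, -0.4, -1.0
KP = 30e-6 * 4    # assumed k'p magnitude times an assumed PMOS W/L of 4
KN = 115e-6 * 2   # assumed k'n times an assumed NMOS W/L of 2

vol = pseudo_nmos_vol(KP, KN, VDD, VTN, VTP, VDSATP)
p_low = pseudo_nmos_static_power(KP, VDD, VTP, VDSATP)
print(f"V_OL  ~ {vol:.3f} V")
print(f"P_low ~ {p_low * 1e6:.0f} uW")
```

Shrinking KP in this sketch lowers both V_OL and the static power, which is the sizing trade-off the text describes: a weaker load helps the low level but leaves less current for charging the output.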
7 Pseudo-NMOS Inverter
Consider a simple pseudo-NMOS inverter (where the PDN network in Figure 6
...
5µm/0
...
The effect of sizing the
PMOS device is studied in this example to demonstrate the impact on various parameters
...
5 to 0
...
Devices
with a W/L < 1 are constructed by making the length longer than the width
...
28
...
8 summarizes the nominal output voltage (VOL), static power dissipation, and
the low-to-high propagation delay
...
25V from VOL (which is not 0V for this inverter)
...
25V
...
A larger pull-up device improves performance, but increases
static power dissipation and lowers noise margins (i
...
, increases VOL)
...
[Plot: Vout (V) versus Vin (V), one curve per PMOS size W/Lp.]

Figure 6
...



Table 6
...


Size

VOL

Static Power
Dissipation

tplh

4

0
...
273V

298µW

56ps

1


...
5

0
...
25

0
...
For a
PMOS W/L of 4, VOL is given by (30/115) (4) (0
...
66V
...
However, pseudo-NMOS still finds use in large fan-in circuits
...
29 shows the schematics of pseudo-NMOS NOR and NAND gates
...

[Figure: four-input pseudo-NMOS gate schematics.]
(a) NOR
Figure 6
...


(b) NAND

Section 6
...
4

257

NAND Versus NOR in Pseudo-NMOS

Given the choice between NOR or NAND logic, which one would you prefer for implementation in pseudo-NMOS?

How to Build Even Better Loads
It is possible to create a ratioed logic style that completely eliminates static currents and
provides rail-to-rail swing
...
A differential gate requires that each input is provided in complementary
format, and produces complementary outputs in turn
...
An example of such a logic family,
called Differential Cascode Voltage Switch Logic (or DCVSL), is presented conceptually
in Figure 6
...

The pull-down networks PDN1 and PDN2 use NMOS devices and are mutually
exclusive (that is, when PDN1 conducts, PDN2 is off, and when PDN1 is off, PDN2 conducts), such that the required logic function and its inverse are simultaneously implemented
...
Turning on PDN1 causes
Out to be pulled down, although there is still a fight between M1 and PDN1
...
PDN1 must be strong enough
to bring Out below VDD-|VTp|, the point at which M2 turns on and starts charging Out to
VDD —eventually turning off M1
...

Figure 6
...
Notice that it is possible to share
transistors among the two pull-down networks, which reduces the implementation overhead
[Figure panels: (a) Basic principle, with load devices M1 and M2 above PDN1 and PDN2; (b) XOR-XNOR gate with inputs A and B.]

Figure 6
...


The resulting circuit exhibits a rail-to-rail swing, and the static power dissipation is
eliminated: in steady state, none of the stacked pull-down networks and load devices are
simultaneously conducting
...
In addition to the problem of increased design complexity, this circuit style still
has a power-dissipation problem that is due to cross-over currents
...

Example 6
...
Notice
that as Out is pulled down to VDD-|VTp|, Out starts to charge up to VDD quickly
...
A static CMOS AND
gate (NAND followed by an inverter) has a delay of 200ps
...
[Plot: transient response, voltage (V) versus time (ns), for inputs A and B and the complementary outputs; the schematic uses transistors M1 to M4.]

Figure 6
...
M1 and M2
1µm/0
...
5µm/0
...
5µm/0
...


Design Consideration: Single-ended versus Differential
The DCVSL gate provides differential (or complementary) outputs
...
This is a distinct advantage,
as it eliminates the need for an extra inverter to produce the complementary signal
...
Finally, the approach prevents some of the time-differential problems introduced by additional inverters
...
When the complementary signal is generated
using an inverter, the inverted signal is delayed with respect to the original (Figure 6
...
This
causes timing problems, especially in very high-speed designs
...
32b)
...

Additionally, the dynamic power dissipation is high
...
2

Static CMOS Design

259
[Figure panel: signals Vin, Vout1, Vout2]
(a) Single-ended

6
...
3

(b) Differential

Figure 6
...


Pass-Transistor Logic

Pass-Transistor Basics
A popular and widely-used alternative to complementary CMOS is pass-transistor logic,
which attempts to reduce the number of transistors required to implement logic by allowing the primary inputs to drive gate terminals as well as source/drain terminals
[Radhakrishnan85]
...

Figure 6
...
In this gate, if the B input is high, the top transistor
is turned on and copies the input A to the output F
...
The switch driven by B seems to be
redundant at first glance
...
33 Pass-transistor implementation of an AND gate
ensure that the gate is static, that is, that a low-imped-
...

The promise of this approach is that fewer transistors are required to implement a given
function
...
33 requires 4 transistors (including the inverter required to invert B), while a complementary CMOS implementation would require 6 transistors
...

Unfortunately, as discussed earlier, an NMOS device is effective at passing a 0 but
is poor at pulling a node to VDD
...
In fact, the situation is worsened by the fact that the devices
experience body effect, as there exists a significant source-to-body voltage when pulling
high
...
Let the source of the NMOS pass transistor be labeled x
...
24)

260

DESIGNING COMBINATIONAL LOGIC GATES IN CMOS

Chapter 6

Example 6
...
5V, the transient response of Figure 6
...
Assume that node x was initially 0
...
[Plot: voltage (V) versus time (ns) for In, x, and Out.]
...
34 Transient response of charging up a node using an N device
...


node x is in a high impedance state (not driven to one of the rails using a low resistance path)
...
Notice that the output charges up quickly initially, but has a slow tail
...

Hand calculation using Eq
...
24), results in an output voltage of 1
...


WARNING:
The above example demonstrates that pass-transistor gates cannot be cascaded by connecting the output of a pass gate to the gate input of another pass transistor
...
35a, where the output of M1 (node x) drives the gate of another
MOS device
...
If node C has a rail to rail swing, node Y
only charges up to the voltage on node x - VTn2, which works out to VDD-VTn1-VTn2
...
35b on the other hand has the output of M1 (x) driving the junction of M2, and there is
only one threshold drop
...
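The difference between the two cascading styles in the warning above can be put into numbers with a short Python sketch. It assumes constant thresholds (body effect ignored) and full-rail gate inputs; the function names and the 2.5 V / 0.43 V values are illustrative assumptions, not figures from the text.

```python
def swing_gate_driven(vdd, vtn1, vtn2):
    """Node x (already limited to VDD - VTn1) drives the GATE of M2.

    M2 can then only pass its own gate voltage minus another threshold.
    """
    v_x = vdd - vtn1
    return v_x - vtn2


def swing_source_driven(vdd, vtn1):
    """Node x drives the source/drain of M2, whose gate swings rail to rail.

    Only one threshold drop is incurred along the chain.
    """
    return vdd - vtn1


VDD, VTN = 2.5, 0.43  # assumed supply and NMOS threshold

bad = swing_gate_driven(VDD, VTN, VTN)
good = swing_source_driven(VDD, VTN)
print(f"gate-driven cascade:   Y reaches about {bad:.2f} V")
print(f"source-driven cascade: Y reaches about {good:.2f} V")
```

With body effect included the thresholds would be larger than the zero-bias value, so both numbers are optimistic; the gap between the two styles is the point of the sketch.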

[Figure panels: (a) the output of M1 (node x) drives the gate of M2, swing on Y = VDD - VTn1 - VTn2; (b) the output of M1 (node x) drives the source/drain of M2, swing on Y = VDD - VTn1.]
Figure 6
...


Section 6
...
10 VTC of the pass transistor AND gate
The voltage transfer curve of a pass-transistor gate shows little resemblance to complementary CMOS
...
36
...
For the case when B = VDD, the top pass
transistor is turned on, while the bottom one is turned off
...
e
...
Next consider the case when A=VDD, and B makes a transition from 0 → 1
...
Once the bottom pass transistor turns off, the output follows the input B
minus a threshold drop
...

Observe that a pure pass-transistor gate is not regenerative
...
This can be remedied by the occasional insertion of a CMOS inverter
...


[Plot: Vout (V) versus Vin (V) for the pass-transistor AND gate F = AB, with three curves: B = VDD, A = 0→VDD; A = VDD, B = 0→VDD; A = B = 0→VDD.]

Figure 6
...
33
...
For the pass transistor circuit in Figure 6
...
The output node charges from 0V
to VDD-VTn (assuming that node x was initially at 0V) and the energy drawn from the
power supply for charging the output of a pass transistor is given by:
$$E_{0 \to 1} = \int_0^T P(t)\,dt = V_{DD} \int_0^T i_{supply}(t)\,dt = V_{DD} \int_0^{V_{DD} - V_{Tn}} C_L\,dV_{out} = C_L \cdot V_{DD} \cdot (V_{DD} - V_{Tn})$$

(6
...
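To see how much the reduced swing changes the energy drawn from the supply, the sketch below compares the expression above with the familiar full-swing value C_L·V_DD². The load capacitance, supply, and threshold are assumed example values, and the threshold is treated as a constant.

```python
def pass_transistor_supply_energy(cl, vdd, vtn):
    """Energy drawn from VDD when the output charges from 0 to VDD - VTn."""
    return cl * vdd * (vdd - vtn)


def full_swing_supply_energy(cl, vdd):
    """Energy drawn from VDD for a rail-to-rail 0 -> VDD transition."""
    return cl * vdd ** 2


CL, VDD, VTN = 10e-15, 2.5, 0.5  # assumed: 10 fF load, 2.5 V supply, 0.5 V threshold

e_pass = pass_transistor_supply_energy(CL, VDD, VTN)
e_full = full_swing_supply_energy(CL, VDD)
print(f"pass transistor: {e_pass * 1e15:.1f} fJ")
print(f"full swing:      {e_full * 1e15:.1f} fJ")
print(f"ratio:           {e_pass / e_full:.2f}")
```

The ratio is simply (V_DD - V_Tn)/V_DD, so the saving grows as the threshold becomes a larger fraction of the supply.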


[Figure: (a) Basic concept: a pass-transistor network and an inverse pass-transistor network, both driven by A and B in complementary form, producing F and its complement; (b) Example pass-transistor networks: AND/NAND, OR/NOR, XOR/NXOR.]

Figure 6
...


Differential Pass Transistor Logic
For high performance design, a differential pass-transistor logic family, called CPL or
DPL, is commonly used
...
A number of CPL gates
(AND/NAND, OR/NOR, and XOR/NXOR) are shown in Figure 6
...
These gates possess a number of interesting properties:
• Since the circuits are differential, complementary data inputs and outputs are always
available
...
Furthermore,
the availability of both polarities of every signal eliminates the need for extra inverters, as is often the case in static CMOS or pseudo-NMOS
...
This is
advantageous for the noise resilience
...
In effect, all gates use exactly the same topology
...
This makes the design of a library of gates very simple
...


Section 6
...
11 Four-input NAND in CPL
Consider the implementation of a four-input AND/NAND gate using CPL
...
38)
...
This is substantially higher than previously discussed gates
...
One should, however, be aware of the fact that the structure
simultaneously implements the AND and the NAND functions, which might reduce the transistor count of the overall circuit
...
38 Layout and schematics of four-input NAND-gate using CPL (the final inverter stage is
omitted)
...


In summary, CPL is a conceptually simple and modular logic style
...
The availability of a simple
XOR, as well as the ease of implementing some specific gate structures, makes it attractive
for structures such as adders and multipliers
...
When considering CPL,
the designer should not ignore the implicit routing overhead of the complementary signals,
which is apparent in the layout of Figure 6
...

Robust and Efficient Pass-Transistor Design
Unfortunately, differential pass-transistor logic, like single-ended pass-transistor logic,
suffers from static power dissipation and reduced noise margins, since the high input to
the signal-restoring inverter only charges up to VDD-VTn
...

Solution 1: Level Restoration
...
39)
...
Assume that node X is at 0V
(Out is at VDD and Mr is turned off) with B = VDD and A = 0
...
This is, however, enough to switch the


output of the inverter low, turning on the feedback device Mr and pulling node X all the
way to VDD
...
Furthermore, no
static current path can exist through the level restorer and the pass-transistor, since the
restorer is only active when A is high
...

[Figure: level restorer. Devices Mr, Mn, M1, M2; nodes A, B, X, Out.]
Figure 6
...


While this solution is appealing in terms of eliminating static power dissipation, it
adds complexity since the circuit is now ratioed
...
The pass transistor network attempts to pull down node X
while the level restorer pulls X up to VDD
...
Some careful
transistor sizing is necessary to make the circuit function correctly
...
When Rr is
made too small, it is impossible to bring the voltage at node X below the switching threshold of the inverter
...
This sizing problem can be reformulated as follows: the resistance
of Mn and Mr must be such that the voltage at node X drops below the threshold of the
inverter, VM = f(R1, R2)
...
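The sizing condition can be sketched with a simple resistive-divider model. This is a first-order illustration only: it replaces Mr and the pull-down path through Mn with fixed linear resistors (here called r_restorer and r_pulldown, both hypothetical names), which real devices are not, and the resistance and threshold values are assumptions.

```python
def node_x_voltage(vdd, r_restorer, r_pulldown):
    """Voltage at X while the restorer fights the pull-down path (divider model)."""
    return vdd * r_pulldown / (r_restorer + r_pulldown)


def can_switch_inverter(vdd, r_restorer, r_pulldown, vm):
    """True if X can be pulled below the inverter switching threshold VM."""
    return node_x_voltage(vdd, r_restorer, r_pulldown) < vm


VDD, VM = 2.5, 1.25      # assumed supply and inverter switching threshold
R_PULLDOWN = 10e3        # assumed equivalent resistance of the pull-down path

weak = can_switch_inverter(VDD, 30e3, R_PULLDOWN, VM)   # high-resistance (small) restorer
strong = can_switch_inverter(VDD, 5e3, R_PULLDOWN, VM)  # low-resistance (large) restorer
print("weak restorer switches:  ", weak)
print("strong restorer switches:", strong)
```

In this model the gate works only when the restorer resistance exceeds the pull-down resistance scaled by (V_DD - V_M)/V_M, which is one way to read the requirement that Rr must not be made too small.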

[Figure: level-restorer circuit with A = 0; devices Mr, Mn, M1, M2; nodes B, X, Out.]

Figure 6
...


Example 6
...
One way to simplify the circuit for manual analysis is to open the feedback loop
and to ground the gate of the restoring transistor when determining the switching point (this is
a reasonable assumption, as the feedback only becomes effective once the inverter starts to

Section 6
...
Hence, Mr and Mn form a “pseudo-NMOS-like” configuration, with Mr the load transistor and Mn acting as a pull-down device to GND
...
5µm/0
...
5µm/0
...

Therefore, node X must be pulled below VDD/2 in order to switch the inverter and shut off Mr
...
42, which shows the transient response as the size of the
level restorer is varied while keeping the size of Mn fixed (0
...
25µm)
...
5µm/0
...
The detailed derivation of the sizing requirement will be presented in the sequential design chapter
...
[Plot: voltage (V) versus time for several level-restorer sizes W/Lr.]
...
41 Transient response of the
circuit in Figure 6
...
A level restorer
that is too large can result in incorrect
evaluation
...
Adding the restoring device increases the capacitance at the internal node X, slowing down the gate
...

On the other hand, the level restorer reduces the fall time, since the PMOS transistor, once
turned on, speeds the pull-up action
...
5

Device Sizing in Pass Transistors

For the circuit shown in Figure 6
...
5µm/0
...
Determine
the maximum W/L size for the level restorer transistor for correct functionality
...
42
...
Inputs are fed to both the gate and source/drain terminals as in the case of
conventional pass transistor networks
...
42 shows a simple XOR/XNOR gate of
three variables A, B and C
...


[Figure: (a) general concept: complementary-output NMOS pass-transistor network with complementary inputs to gate and source/drain terminals, and load devices M1 and M2 driving Out and its complement; (b) XOR/XNOR gate of inputs A, B, and C.]

Figure 6
...


Solution 2: Multiple-Threshold Transistors
...

Using zero threshold devices for the NMOS pass-transistors eliminates most of the threshold drop, and passes a signal close to VDD
...
All devices other than the pass transistors (i
...
, the inverters) are implemented using
standard high-threshold devices
...

[Figure: pass-transistor network built with zero (or low)-threshold transistors.]
Figure 6
...


The use of zero-threshold transistors can be dangerous due to the subthreshold currents that can flow through the pass-transistors, even if VGS is slightly below VT
...
43, which points out a potential sneak dc-current path
...


Section 6
...
44 CMOS transmission gate
...
The most widely-used solution to deal with the
voltage-drop problem is the use of transmission gates
...
The ideal approach is to use an
NMOS to pull-down and a PMOS to pull-up
...
44a)
...
The
transmission gate acts as a bidirectional switch controlled by the gate signal C
...
In short,
A = B  if  C = 1

(6
...
Figure 6
...

Consider the case of charging node B to VDD for the transmission gate circuit in Figure 6
...
Node A is driven to VDD and the transmission gate is enabled (C = 1 and its complement C̄ = 0)
...
However, since the PMOS device is present and turned on
(VGSp = -VDD), charging continues all the way up to VDD
...
45b shows the opposite
case, that is, discharging node B to 0
...
The
PMOS transistor by itself can only pull node B down to |VTp|, at which point it turns off
...
Though the transmission gate requires two transistors and more control
signals, it enables rail-to-rail swing
...
45 Transmission gates
enable rail-to-rail switching

Transmission gates can be used to build some complex gates very efficiently
...
46 shows an example of a simple inverting two-input multiplexer
...
27)

A complementary implementation of the gate requires eight transistors instead of six
...
46 Transmission gate multiplexer and its layout
...
47
...
To understand the operation of
this circuit, we have to analyze the B = 0 and B = 1 cases separately
...
In
the opposite case, M1 and M2 are disabled, and the transmission gate is operational, or F =
AB
...
Notice that, regardless of the
values of A and B, node F always has a connection to either VDD or GND and is hence a
low-impedance node
...
Other examples where transmission-gate logic is effectively used are fast adder circuits and registers
...
47 Transmission gate XOR
...

Performance of Pass-Transistor and Transmission Gate Logic
The pass-transistor and the transmission gate are, unfortunately, not ideal switches, and
have a series resistance associated with them
...
48, which involves charging a node from 0 V to VDD
...
The effective resistance of the switch is modeled as a parallel
connection of the resistances Rn and Rp of the NMOS and PMOS devices, defined as (VDD
– Vout)/In and (VDD – Vout)/Ip, respectively
...
During the low-to-high transition, the pass-transistors traverse through a number of operation modes
...
28)

The resistance goes up for increasing values of Vout, and approaches infinity when Vout
reaches VDD-VTn, that is, when the device shuts off
...
When Vout is small, the PMOS is saturated, but it enters the linear
mode of operation for Vout approaching VDD, giving the following approximated resistance:
$$R_p = \frac{V_{DD} - V_{out}}{I_p} = \frac{V_{DD} - V_{out}}{k_p \left( (-V_{DD} - V_{Tp})(V_{out} - V_{DD}) - \frac{(V_{out} - V_{DD})^2}{2} \right)} \approx \frac{1}{k_p\,(V_{DD} - |V_{Tp}|)}$$

(6
...
48
...
The same is
true in other design instances (for instance, when discharging CL)
...

Problem 6
...
48)
...
Figure 6
...
Such a configuration often
occurs in circuits such as adders or deep multiplexors
...
To analyze the propagation delay of this

[Plot: simulated equivalent resistance curves, including Rp and Rn || Rp.]
...
48 Simulated equivalent resistance of transmission gate for low-to-high transition
(for (W/L)n = (W/L)p = 0
...
25µm)
...
This produces the network of Figure 6
...

The delay of a network of n transmission gates in sequence can be estimated using the
Elmore approximation (see Chapter 4):
n

t p ( V n ) = 0
...
69CR eq ------------------2

(6
...
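The Elmore estimate for a uniform chain can be sketched in a few lines of Python. The code sums the k·R·C contribution of each node explicitly and cross-checks it against the closed form n(n+1)/2, using the usual 0.69 factor for the 50% point. The 8 kΩ resistance and the 16-gate length echo the example in this section; the 4 fF node capacitance is an assumed value, not the one used in the text.

```python
def chain_delay(n, r_eq, c):
    """0.69 x Elmore time constant at the last node of an n-stage RC ladder."""
    elmore = sum(k * r_eq * c for k in range(1, n + 1))
    return 0.69 * elmore


def chain_delay_closed_form(n, r_eq, c):
    """Same quantity via the closed-form sum n(n + 1)/2."""
    return 0.69 * c * r_eq * n * (n + 1) / 2


N, R_EQ, C = 16, 8e3, 4e-15  # C is an assumed node capacitance

t_chain = chain_delay(N, R_EQ, C)
print(f"{N} gates:     {t_chain * 1e9:.2f} ns")
print(f"{2 * N} gates: {chain_delay(2 * N, R_EQ, C) * 1e9:.2f} ns")
```

Doubling the chain length roughly quadruples the delay, which is why long unbuffered chains of switches are avoided.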

[Figure: (a) A chain of transmission gates with nodes V1 to Vn and capacitance C per node; (b) Equivalent RC network with resistance Req per gate.]
Figure 6
...



Section 6
...
13 Delay through 16 transmission gates
Consider 16 cascaded minimum-sized transmission gates, each with an average resistance of
8 kΩ
...
Since the gate inputs are assumed to be fixed, there is no Miller multiplication
...
6 fF for the low-to-high transition
...
69 ⋅ CR -------------------- = 0
...
6fF ) ( 8KΩ )  --------------------------  ≈ 2
...
31)

The transient response for this particular example is shown in Figure 6
...
The simulated delay is 2
...
It is remarkable that a simple RC model predicts the delay so accurately
...

[Plot: simulated transient response of the chain, voltage versus time (ns), including node Out16.]

Figure 6
...


The most common approach for dealing with the long delay is to break the chain
by inserting buffers every m switches (Figure 6
...
Assuming a propagation delay tbuf for
each buffer, the overall propagation delay of the transmission-gate/buffer network is then
computed as follows,
n
m(m + 1)
n
t p = 0
...
69 CR eq --------------------- +  --- – 1 t buf
m

2

(6
...
The optimal number of switches
mopt between buffers can be found by setting the derivative
t pbuf
m opt = 1
...
33)
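The trade-off between segment length and buffer count can be explored numerically. The sketch below evaluates the buffered-chain delay, following the structure of the expression above (n/m segments of m switches each, plus n/m - 1 buffers), and picks the best integer m among the divisors of n. It also prints the continuous optimum obtained by setting the derivative with respect to m to zero, m = sqrt(2·t_buf / (0.69·C·R_eq)). All component values, including the 100 ps buffer delay, are assumptions chosen for illustration.

```python
import math


def buffered_delay(n, m, r_eq, c, t_buf):
    """Delay of n switches split into segments of m, with a buffer between segments."""
    segments = n / m
    return 0.69 * segments * c * r_eq * m * (m + 1) / 2 + (segments - 1) * t_buf


def best_segment_length(n, r_eq, c, t_buf):
    """Best integer m among the divisors of n (keeps all segments equal)."""
    divisors = [m for m in range(1, n + 1) if n % m == 0]
    return min(divisors, key=lambda m: buffered_delay(n, m, r_eq, c, t_buf))


N, R_EQ, C, T_BUF = 16, 8e3, 4e-15, 100e-12  # assumed example values

m_best = best_segment_length(N, R_EQ, C, T_BUF)
m_cont = math.sqrt(2 * T_BUF / (0.69 * C * R_EQ))
print(f"continuous optimum m ~ {m_cont:.2f}")
print(f"best divisor of {N}: m = {m_best}, "
      f"delay = {buffered_delay(N, m_best, R_EQ, C, T_BUF) * 1e12:.0f} ps")
```

A slower buffer pushes the optimum toward longer segments, consistent with the observation that the number of switches per segment grows with t_buf.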


Obviously, the number of switches per segment grows with increasing values of tbuf
...

[Figure: RC chain of Req and C sections driven by In, with buffers inserted every m switches.]

Figure 6
...


Example 6
...
The buffers shown in Figure 6
...
In some cases, it might be necessary to add an extra inverter to produce the correct polarity
...
5µm/0
...
5µm /0
...
(6
...
The simulated delay when placing an inverter every two transmission gates equals 154 psec, for every three transmission
gates is 154 psec, and for four transmission gates is 164 psec
...


CAUTION: Although many of the circuit styles discussed in the previous sections sound
very exciting, and might be superior to static CMOS in many respects, none of them has
the robustness and ease of design of complementary CMOS
...
For designs that have no extreme area, complexity, or speed constraints, complementary CMOS is the recommended design style
...
3

Dynamic CMOS Design
It was noted earlier that static CMOS logic with a fan-in of N requires 2N devices
...
The
pseudo-NMOS logic style requires only N + 1 transistors to implement an N input logic
gate, but unfortunately it has static power dissipation
...
With the addition of a clock input, it uses a sequence of precharge
and conditional evaluation phases
...
3
...
52a
...
The opera-

Section 6
...

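[Figure: basic dynamic gate with precharge device Mp, evaluation device Me, and output capacitance CL. (a) n-type network with inputs In1 to In3 driving the PDN; (b) Example with inputs A, B, and C.]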

Figure 6
...


Precharge
When CLK = 0, the output node Out is precharged to VDD by the PMOS transistor Mp
...
The evaluation FET eliminates any static power that would be consumed during the
precharge period (that is, static current would flow between the supplies if both the pull-down and the precharge device were turned on simultaneously)
...
The output is conditionally discharged based on the input values and the pull-down
topology
...
If the PDN is turned off, the
precharged value remains stored on the output capacitance CL, which is a combination of
junction capacitances, the wiring capacitance, and the input capacitance of the fan-out
gates
...
Consequently, once Out is discharged, it cannot be charged again until
the next precharge operation
...
Notice that the output can be in the high-impedance state during
the evaluation period if the pull-down network is turned off
...
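The precharge and evaluate behavior described above can be captured in a small behavioral model. This is a logic-level sketch only: it ignores leakage, charge sharing, and timing, and the class name and the example pull-down function are hypothetical choices made for illustration.

```python
class DynamicGate:
    """Logic-level model of an n-type dynamic gate (no analog effects)."""

    def __init__(self, pdn):
        self.pdn = pdn   # callable returning True when the pull-down network conducts
        self.out = 1     # assume the gate starts out precharged

    def precharge(self):
        """CLK = 0: the output is charged to VDD regardless of the inputs."""
        self.out = 1
        return self.out

    def evaluate(self, *inputs):
        """CLK = 1: the output is discharged if the PDN conducts, else it holds."""
        if self.pdn(*inputs):
            self.out = 0
        return self.out


# Hypothetical pull-down network: conducts when (A and B) or C.
gate = DynamicGate(lambda a, b, c: bool((a and b) or c))

gate.precharge()
first = gate.evaluate(0, 0, 0)    # PDN off: output keeps its precharged value
second = gate.evaluate(1, 1, 0)   # PDN on: output is discharged
third = gate.evaluate(0, 0, 0)    # inputs fall again in the same phase: output stays low
after = gate.precharge()          # only the next precharge restores the high level
print(first, second, third, after)
```

The third call shows the property stated in the text: once the output has been discharged during evaluation, later input changes cannot bring it back up.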

As an example, consider the circuit shown in Figure 6
...
During the precharge
phase (CLK=0), the output is precharged to VDD regardless of the input values since the
evaluation device is turned off
...
Otherwise, the output remains at
the precharged state of VDD
...
34)

A number of important properties can be derived for the dynamic logic gate:
• The logic function is implemented by the NMOS pull-down network
...

• The number of transistors (for complex gates) is substantially lower than in the
static case: N + 2 versus 2N
...
The sizing of the PMOS precharge device is not important for realizing proper functionality of the gate
...
There is, however, a trade-off with power dissipation since a
larger precharge device directly increases clock-power dissipation
...
Ideally, no static current path ever exists between
VDD and GND
...

• The logic gates have faster switching speeds
...

The first (obvious) reason is due to the reduced load capacitance attributed to the
lower number of transistors per gate and the single-transistor load per fan-in
...

The low and high output levels VOL and VOH are easily identified as GND and VDD
and are not dependent upon the transistor sizes
...
Noise margins and switching thresholds have been
defined as static quantities that are not a function of time
...
Pure static analysis, therefore,
does not apply
...
Therefore, it is reasonable to set the switching threshold (VM) as well
as VIH and VIL of the gate equal to VTn
...

Design Consideration
It is also possible to implement dynamic logic using a complementary approach, where the output node is connected by a pre-discharge NMOS transistor to GND, and the evaluation PUN
is implemented in PMOS
...
During evaluation, the output is conditionally charged to VDD
...


Section 6
...
3
...
Fewer devices to implement a given logic function implies that the overall load
capacitance is much smaller
...
After the precharge phase, the output is high
...
As a result, tpLH = 0! The high-to-low transition, on
the other hand, requires the discharging of the output capacitance through the pull-down
network
...
The presence of the evaluation transistor slows the gate somewhat, as
it presents an extra series resistance
...

The above analysis is somewhat unfair, because it ignores the influence of the precharge time on the switching speed of the gate
...
During this time, the
logic in the gate cannot be utilized
...
For
instance, the precharge of the arithmetic unit in a microprocessor can coincide with the
instruction decode
...

Example 6
...
53 shows the design of a four-input NAND example designed using the dynamic-circuit style
...
As we had discussed above, we will assume
that the switching threshold of the gate equals the threshold of the NMOS pull-down transistor
...
9
...
It is assumed that all inputs
are set high as the clock transitions high
...
The resulting transient response is plotted in Figure 6
...
9
...
Making the PMOS too large should be
avoided, however, as it both slows down the gate, and increases the capacitive load on the
clock line
...

Table 6
...


Transistors

VOH

VOL

VM

NMH

NML

tpHL

tpLH

tpre

6

2
...
5-VTN

VTN

110 psec

0 nsec

83psec

As mentioned earlier, the static parameters are time-dependent
...
Figure 6
...
[Figure: dynamic four-input NAND and its transient response; voltage versus time (ns) for the inputs and Out.]

Figure 6
...



input transitions—to 0
...
5V and 0
...
Above, we have defined the
switching threshold of the dynamic gate as the device threshold
...
The noise voltage needed to corrupt the signal has to be larger if the
evaluation time is short
...


When evaluating the power dissipation of a dynamic gate, it would appear that
dynamic logic presents a significant advantage
...
First, the
physical capacitance is lower since dynamic logic uses fewer transistors to implement a
given function
...
Second, dynamic logic gates by construction can at most have one transition per clock cycle
...
Finally, dynamic gates
do not exhibit short circuit power since the pull-up path is not turned on when the gate is
evaluating
...
[Plot: voltage (V) versus time for CLK, the input glitch VG, and Vout at different glitch amplitudes.]
...
54 Effect of an input glitch on the
output
...
A larger glitch is acceptable
if the evaluation phase is shorter
...


Section 6
...
Earlier, the transition probability for a static
gate was shown to be p0 p1 = p0 (1-p0)
...
For an n-tree dynamic gate, the output makes a 0 → 1 transition during the precharge
phase only if the output was discharged during the preceding evaluate phase
...
35)

where p0 is the probability that the output is zero
...
For uniformly distributed inputs, the transition probability for an N-input gate is:
$$\alpha_{0 \to 1} = \frac{N_0}{2^N}$$

(6
...

Example 6
...
An n-tree dynamic implementation is shown in Figure 6
...
For equi-probable inputs, there is then a 75% probability that the output node of the
dynamic gate will discharge immediately after the precharge phase, implying that the activity
2
for such a gate equals 0
...
e PNOR= 0
...
The corresponding activity is a lot
smaller, 3/16, for a static implementation
...
Though these examples illustrate that the switching activity of dynamic
logic is generally higher, it should be noted that dynamic logic has lower physical capacitance
...
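The activity figures quoted in this example can be reproduced by enumerating the truth table. The sketch below assumes independent, equi-probable inputs and uses a two-input NOR as the test function; the helper names are illustrative. For a dynamic gate the activity equals the probability that the output is 0, while for a static gate it is p0·(1 - p0).

```python
from fractions import Fraction
from itertools import product


def prob_output_zero(fn, n_inputs):
    """Fraction of input combinations for which the output is 0."""
    rows = list(product((0, 1), repeat=n_inputs))
    zeros = sum(1 for bits in rows if fn(*bits) == 0)
    return Fraction(zeros, len(rows))


def dynamic_activity(fn, n_inputs):
    """n-tree dynamic gate: a 0 -> 1 transition follows every discharge."""
    return prob_output_zero(fn, n_inputs)


def static_activity(fn, n_inputs):
    """Static gate: probability of a 0 in one cycle and a 1 in the next."""
    p0 = prob_output_zero(fn, n_inputs)
    return p0 * (1 - p0)


def nor2(a, b):
    return int(not (a or b))


dyn = dynamic_activity(nor2, 2)
sta = static_activity(nor2, 2)
print("dynamic NOR2 activity:", dyn)
print("static NOR2 activity: ", sta)
```

The two results, 3/4 and 3/16, match the numbers in the text and show why the higher activity of dynamic logic has to be weighed against its lower physical capacitance.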

[Figure: two implementations of the gate with inputs A and B, clock CLK, and load CL.]

Figure 6
...



Problem 6
...
Assume that the inputs are independent and pA=1 = 0
...
3, pC=1 =

0
...
4
...
3
...

However, there are several important considerations that must be taken into account if one
wants dynamic circuits to function properly
...
Some of these
issues are highlighted in this section
...
If the pull-down network is off, the output should ideally remain at the precharged state of VDD during the evaluation phase
...
Figure
6
...

[Figure: (a) Leakage sources, numbered (1) to (4), in a dynamic gate with Mp, M1 (A = 0), Me, and CL; (b) Effect on waveforms: CLK and Vout over the precharge and evaluate phases.]

Figure 6
...


Sources 1 and 2 are the reverse-biased diode and sub-threshold leakage of the
NMOS pull-down device M1, respectively
...
Charge
leakage causes a degradation in the high level (Figure 6
...
Dynamic circuits therefore
require a minimal clock rate, which is typically on the order of a few kHz
...
Note that the PMOS precharge device also contributes some leakage current

Section 6
...
To
some extent, the leakage current of the PMOS counteracts the leakage of the pull-down
path
...

Example 6
...
5µm/0
...
Assume that the input is
low during the evaluation period
...
However, as seen from Figure 6
...
Once the output drops
below the switching threshold of the fan-out logic gate, the output is interpreted as a low voltage
...
This is due to the leakage current
provided by the PMOS pull-up
...
[Plot: output voltage (V) versus time (ms).]

Figure 6
...
The
output settles to an intermediate voltage
determined by a resistive divider of the pull-down and pull-up devices
...

Leakage is caused by the high impedance state of the output node during the evaluate mode, when the pull down path is turned off
...
This is often done
by adding a bleeder transistor as shown in Figure 6
...
The only function of the
bleeder—a pseudo-NMOS-like pull-up device—is to compensate for the charge lost due
to the pull-down leakage paths
...
This allows the (strong) pull-down devices to
[Figure panels (a) and (b): dynamic gate with precharge device Mp, pull-down devices Ma and Mb, evaluation device Me, and bleeder transistor Mbl.]

Figure 6
...



lower the Out node substantially below the switching threshold of the inverter
...
58b)
...
Consider the
circuit of Figure 6
...
During the precharge phase, the output node is precharged to VDD
...
Assume further that input B remains at 0 during evaluation, while input A makes
a 0 → 1 transition, turning transistor Ma on
...
This causes a drop in the output voltage, which cannot be
recovered due to the dynamic nature of the circuit
...
59 Charge sharing in dynamic networks
...
Under the above assumptions, the following initial conditions are valid: Vout(t = 0) = VDD and VX(t = 0) = 0
...
∆Vout < VTn — In this case, the final value of VX equals VDD – VTn(VX)
...
37)

$$= -\frac{C_a}{C_L} \left[ V_{DD} - V_{Tn}(V_X) \right]$$

2
...
38)

Which of the above scenarios is valid is determined by the capacitance ratio
...

(6
...
3


$$\frac{C_a}{C_L} = \frac{V_{Tn}}{V_{DD} - V_{Tn}}$$

(6
...
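The two charge-sharing scenarios and the boundary between them can be combined in one small function. The sketch treats V_Tn as a constant (the body-effect dependence on V_X is ignored) and assumes the internal node starts at 0 V with the output precharged to V_DD. In the first case the internal node stops at V_DD - V_Tn; in the second the two nodes equalize by charge conservation. The function name and the 0.4 V threshold are assumptions.

```python
def delta_vout(ca, cl, vdd, vtn):
    """Change in Vout after charge sharing between CL (at VDD) and Ca (at 0 V)."""
    if ca / cl < vtn / (vdd - vtn):
        # Small internal capacitance: X charges only to VDD - VTn, then Ma cuts off.
        return -(ca / cl) * (vdd - vtn)
    # Large internal capacitance: Vout and VX settle at a common voltage.
    return -vdd * ca / (ca + cl)


VDD, VTN = 2.5, 0.4  # assumed supply and (constant) NMOS threshold

small = delta_vout(5e-15, 50e-15, VDD, VTN)    # Ca/CL below the boundary ratio
large = delta_vout(30e-15, 50e-15, VDD, VTN)   # Ca/CL above the boundary ratio
print(f"Ca =  5 fF, CL = 50 fF: dVout = {small:.3f} V")
print(f"Ca = 30 fF, CL = 50 fF: dVout = {large:.3f} V")
```

At the boundary ratio both branches give a drop of exactly V_Tn, so the function is continuous across the two cases.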
The output of the dynamic
gate might be connected to a static inverter, in which case the low level of Vout would
cause static power consumption
...

Example 6
...
60,
which implements a 3-input EXOR function y = A ⊕ B ⊕ C
...
For simplicity, ignore the
load inverter, and assume that all inputs are low during the precharge operation and that all
isolated internal nodes (Va, Vb, Vc, and Vd) are initially at 0V
...
The worst-case change in output is obtained by exposing the
maximum amount of internal capacitance to the output node during the evaluation period
...
The voltage change can then be obtained by equating the
initial charge with the final charge as done with Eq
...
38), yielding a worst-case
change of 30/(30+50) * 2
...
94V
...
5- 0
...
56V
...
61
...
This solution obviously comes at the cost of increased area and capacitance
...
A wire routed over a dynamic node may couple capacitively and destroy the state
of the floating node
...
Consider the circuit shown in Figure 6
...
A transition in the input In of
[Figure labels: CLK; output node y with Cy = 50 fF; input A; internal node a with Ca = 15 fF]
VDD = 2
...
60 Example illustrating the
charge sharing effect in dynamic logic
...
61 Dealing with charge-sharing by precharging
internal nodes
...


the static gate may cause the output of the gate (Out2) to go low
...
A simulation of this effect is
shown in Figure 6
...
This further causes the output of the static NAND gate not to drop all the way
down to 0V, and a small amount of static power is dissipated
...
When
designing and laying out dynamic circuits, special care is needed to minimize capacitive
coupling
...
62 Example demonstrating
the effect of backgate coupling
...
The coupling capacitance consists of the gate-to-drain capacitance of the precharge
device, and includes both the overlap and the channel capacitances
...
Subsequently, the fast
rising and falling edges of the clock couple onto the signal node, as is quite apparent in the
simulation of Figure 6
...

The danger of clock feedthrough is that it may cause the (normally reverse-biased)
junction diodes of the precharge transistor to become forward-biased
...
[Plot: voltage (V) versus time, showing CLK and the output excursion due to clock feedthrough.]
...
63 Clock feedthrough effect
...
CMOS latchup might be another result
of this injection
...

All the above considerations demonstrate that the design of dynamic circuits is
rather tricky and requires extreme care
...

6
...
4

Cascading Dynamic Gates

Besides the signal integrity issues, there is one major catch that complicates the design of
dynamic circuits: straightforward cascading of dynamic gates to create more complex
structures does not work
...
64a
...
e
...
Assume that the primary input In makes a
0 → 1 transition (Figure 6
...
On the rising edge of the clock, output Out1 starts to discharge
...
64 Cascade of dynamic n−type blocks
...
However, there is a finite propagation
delay for the input to discharge Out1 to GND
...
As long as Out1 exceeds the switching threshold of the second gate, which
approximately equals VTn, a conducting path exists between Out2 and GND, and precious
charge is lost at Out2
...
This leaves Out2 at an intermediate voltage
level
...
The charge loss leads to reduced
noise margins and potential malfunctioning
...
This may cause inadvertent discharge in the
beginning of the evaluation cycle
...
When doing so, all transistors in the pull-down network are turned off after
precharge, and no inadvertent discharging of the storage capacitors can occur during evaluation
...
Transistors are only
turned on when needed, and at most once per cycle
...
The two most important ones are discussed below
...
A Domino logic module [Krambeck82] consists of an n-type dynamic logic
block followed by a static inverter (Figure 6
...
During precharge, the output of the n-type dynamic gate is charged up to VDD, and the output of the inverter is set to 0
...
If one assumes that all the inputs of a Domino
gate are outputs of other Domino gates3, then it is ensured that all inputs are set to 0 at the
end of the precharge phase, and that the only transitions during evaluation are 0 → 1 transitions
...
The introduction of the static inverter has the
additional advantage that the fan-out of the gate is driven by a static inverter with a low-impedance output, which increases noise immunity
...

Consider now the operation of a chain of Domino gates
...
During evaluation, the output of the first Domino block either stays at 0 or
makes a 0 → 1 transition, affecting the second gate
...
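The rippling behavior of a Domino chain can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions: each stage is taken to be a single-input Domino buffer (n-type block plus static inverter), delays are ignored, and the function name is hypothetical.

```python
def domino_chain(primary_input, n_gates):
    """Behavioral model of a chain of single-input Domino stages.
    Returns the tuple of stage outputs after precharge and after each
    successive stage has evaluated."""
    outputs = [0] * n_gates            # precharge: every inverter output is 0
    history = [tuple(outputs)]
    for i in range(n_gates):           # evaluation ripples stage by stage
        gate_in = primary_input if i == 0 else outputs[i - 1]
        if gate_in == 1:               # pull-down conducts, dynamic node falls,
            outputs[i] = 1             # so the inverter output rises 0 -> 1
        history.append(tuple(outputs))
    return history

for step in domino_chain(primary_input=1, n_gates=3):
    print(step)
```

Every output either stays at 0 or rises once, which is the property that makes cascading safe in this style.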

Domino CMOS has the following properties:
• Since each dynamic gate has a static inverter, only non-inverting logic can be implemented
...

2
3

This ignores the impact of charge distribution and leakage effects, discussed earlier
...


Section 6
...
65 DOMINO CMOS logic
...
The inverter can be sized to match the fan-out, which is already much smaller
than in the complementary static CMOS case, as only a single gate capacitance has
to be accounted for per fan-out gate
...
However, eliminating the evaluation device extends the precharge cycle: the precharge now has to ripple through the logic network as well
...
66, where the evaluation devices have been eliminated
...
On the falling edge of the clock, the precharge operation is started
...
The input to the second gate is initially high, and it takes two gate delays before In2 is driven low
...
Similarly, the third gate has to wait till the second gate precharges before it can
start precharging, etc
...
Another important negative is the extra power dissipation when both pull-up
and pull-down devices are on
...

Dealing with the Non-inverting Property of Domino Logic
...
This requirement has
[Figure residue: chain of Domino gates with precharge devices Mp clocked by CLK; inputs In1, In2, In3, ..., Inn and outputs Out1, Out2, ..., Outn, annotated with 0->1 and 1->0 transitions.]

Figure 6
...
The
circuit also exhibits static power dissipation
...
There are several ways to deal with the
non-inverting logic requirement
...
67 shows one approach to the problem: reorganizing the logic using simple Boolean transforms such as De Morgan’s Law
...

[Figure residue: logic network with inputs A to H and outputs X and Y, built from Domino AND, Domino OR and Domino AND-OR gates. Panels: (a) before logic transformation; (b) after logic transformation.]

Figure 6
...
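The Boolean identities that such a restructuring relies on can be verified exhaustively. The short check below is only an illustration (the function name is hypothetical); it confirms both forms of De Morgan's Law for a given number of inputs, which is what allows an inversion to be pushed from a gate output toward its inputs.

```python
from itertools import product

def demorgan_holds(n_inputs):
    """Exhaustively check both De Morgan identities for n_inputs variables."""
    for bits in product((False, True), repeat=n_inputs):
        # NOT(a AND b AND ...) == (NOT a) OR (NOT b) OR ...
        if (not all(bits)) != any(not b for b in bits):
            return False
        # NOT(a OR b OR ...) == (NOT a) AND (NOT b) AND ...
        if (not any(bits)) != all(not b for b in bits):
            return False
    return True

print(demorgan_holds(3))
```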


A general but expensive approach to solving the problem is the use of differential
logic
...
Figure 6
...
Note that all inputs
come from other differential Domino gates, and are low during the precharge phase, while
making a conditional 0→1 transition during evaluation
...
This comes at the expense of an increased
power dissipation, since a transition is guaranteed every single clock cycle regardless of
the input values: either O or its complement O̅ must make a 0→1 transition
...
Notice that this circuit is not ratioed, even in the presence of the PMOS pull-up
devices! Due to its high performance, this differential approach is very popular, and is
used in several commercial microprocessors
...
Several optimizations can be performed on
Domino logic gates
...
68 Simple dual rail (differential)
Domino logic gate
...
the transistors in the static inverter
...
The critical path during evaluation goes through the pull-down path of the
dynamic gate, and the PMOS pull-up transistor of the static inverter
...
This can be accomplished by using a small
(minimum) sized NMOS and a large PMOS device
...
The only disadvantage of using a large beta ratio is a reduction in noise margin
...

Numerous variations of Domino logic have been proposed [Bernstein98]
...
The basic concept is illustrated in Figure 6
...
It exploits the fact that certain outputs are subsets of other outputs to generate a number of logical functions in a single gate
...
Since O2 equals B · O3, it can reuse the logic for O3
...
69 Multiple output Domino
[Figure residue: gate labels VDD, CLK, A, B, O2 = B(C+D) = B·O3, O3 = C+D.]
internal nodes have to be precharged to VDD to produce the correct results
...
However, the number of evaluation transistors is drastically reduced as
they are amortized over multiple outputs
...

Compound Domino (Figure 6
...
Instead of each dynamic
gate driving a static inverter, it is possible to combine the outputs of multiple dynamic
gates with the aid of a complex static CMOS gate, as shown in Figure 6
...
The outputs of
three dynamic structures, implementing O1 = A B C, O2 = D E F and O3 = G H, are combined using a single complex CMOS static gate that implements O = (o1+o2) o3
...

Compound Domino is a useful tool for constructing complex dynamic logic gates
...
For example, a large fan-in Domino AND can be implemented as parallel
dynamic NAND structures with lower fan-in that are combined using a static NOR gate
...
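The decomposition of a large fan-in AND into smaller dynamic NAND structures followed by a static NOR can be checked with a truth table. The sketch below assumes a six-input AND split into two three-input groups; the function names and the group size are illustrative choices, not taken from the text.

```python
from itertools import product

def nand(*xs):
    return int(not all(xs))

def nor(*xs):
    return int(not any(xs))

def and6_compound(a, b, c, d, e, f):
    """Six-input AND built as two 3-input dynamic NAND nodes
    combined by a static NOR gate."""
    return nor(nand(a, b, c), nand(d, e, f))

# Compare against a direct AND over all 64 input combinations.
matches = all(and6_compound(*v) == int(all(v))
              for v in product((0, 1), repeat=6))
print(matches)
```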
70 Compound Domino logic uses
complex static gates at the output of the
dynamic gates
...
Care must be taken to ensure that the dynamic nodes are not affected by the
coupling between the output of the static gates and the output of dynamic nodes
...
np-CMOS provides an alternate approach to cascading dynamic logic by using two
flavors (n-tree and p-tree) of dynamic logic
...
71)
([Goncalvez83, Friedman84, Lee86])
...
The output conditionally makes a 0 → 1 transition during evaluation depending on its inputs
...
71 The np-CMOS logic
circuit style
...
If the n-tree gates are controlled by CLK, and p-tree gates are
controlled using the inverted clock CLK̅, n-tree gates can directly drive p-tree gates, and vice-versa
...
During the precharge phase (CLK = 0), the output of the n-tree gate, Out1, is charged

Section 6
...
Since the n-tree
gate connects PMOS pull-up devices, the PUN of the p-tree is turned off at that time
...
This ensures that no accidental discharge of Out2
can occur
...
A disadvantage of the np-CMOS logic style is that
the p-tree blocks are slower than the n-tree modules, due to the lower current drive of the
PMOS transistors in the logic network
...


6
...
4
...
Each of the circuit styles has its advantages and disadvantages
...
No single style optimizes all these measures at the
same time
...

The static approach has the advantage of being robust in the presence of noise
...

This ease-of-design does not come for free: for complex gates with a large fan-in, complementary CMOS becomes expensive in terms of area and performance
...
Pseudo-NMOS is simple and fast at the expense
of a reduced noise margin and static power dissipation
...

Dynamic logic, on the other hand, makes it possible to implement fast and small
complex gates
...
Parasitic effects such as charge sharing make the
design process a precarious job
...

The current trend is towards an increased use of complementary static CMOS
...
These tools emphasize optimization at the logic rather than the circuit level and put
a premium on robustness
...

6
...
2

Designing Logic for Reduced Supply Voltages

In Chapter 3, we projected that the supply voltage for CMOS processes will continue to
drop over the coming decade, and may go as low as 0
...
To maintain performance under those conditions, it is essential that the device thresholds scale as well
...
72a shows a plot of the (VT, VDD) ratio required to maintain a given performance
level (assuming that other device characteristics remain identical)
...
Reducing the threshold voltage increases the
subthreshold leakage current exponentially as we derived in Eq
...
40) (repeated here for
the sake of clarity)
...
[Figure residue: (a) VDD/VT for fixed performance, plotted against VT (V) with curves labeled tpd=645pS and tpd=420pS; (b) Leakage as a function of VT, ID (A) on a logarithmic axis against VGS (V).]
72 Voltage Scaling (VDD/VT on delay and leakage)

$$I_{leakage} = I_S \, 10^{\frac{V_{GS} - V_{Th}}{S}} \left( 1 - 10^{-\frac{n V_{DS}}{S}} \right)$$

(6
...
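The exponential sensitivity of leakage to the threshold voltage can be made concrete by evaluating the expression above. In this sketch the numeric values (I_S, the slope S of 90 mV/decade, n = 1.5 and the two thresholds) are hypothetical and chosen only for illustration; the function name is likewise an assumption.

```python
def subthreshold_leakage(i_s, v_gs, v_th, v_ds, s, n):
    """Subthreshold current in the form used above:
    I_S * 10^((VGS - VTh)/S) * (1 - 10^(-n*VDS/S))."""
    return i_s * 10 ** ((v_gs - v_th) / s) * (1 - 10 ** (-n * v_ds / s))

# Hypothetical parameters: off device (VGS = 0) with the full supply across it.
S = 0.09  # V/decade, assumed
high_vt = subthreshold_leakage(1e-6, 0.0, 0.39, 2.5, S, 1.5)
low_vt = subthreshold_leakage(1e-6, 0.0, 0.30, 2.5, S, 1.5)
ratio = low_vt / high_vt
print(f"leakage ratio = {ratio:.1f}")
```

Lowering the threshold by exactly one slope factor S raises the leakage by one decade, independent of the other parameters.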
The subthreshold leakage of an inverter is the current
of the NMOS for Vin = 0V and Vout = VDD (or the PMOS current for Vin = VDD and Vout =
0)
...
72b
...
For example, the processor in a cellular phone remains in idle mode for a majority of the time
...

This is only possible if leakage is low, that is, if the devices have a high threshold voltage
...
To satisfy the contradicting
requirements of high-performance during active periods, and low leakage during standby,
several process modifications or leakage-control techniques have been introduced in
CMOS processes
...
18 µm CMOS support
devices with different thresholds—typically a device with low threshold for high performance circuits, and a transistor with high threshold for leakage control
...
To use this approach for the control of individual devices requires a dual-well process (see Figure 2
...

Clever circuit design can also help to reduce the leakage current, which is a function
of the circuit topology and the value of the inputs applied to the gate
...
In an inverter with In = 0, the sub-threshold

Section 6
...
In more
complex CMOS gates, the leakage current depends upon the input vector
...

Under these conditions, the intermediate node X settles to,
$$V_X \approx V_{th} \ln(1 + n)$$

(6
...
Clearly, the sub-threshold leakage under this condition is slightly smaller
than that of the inverter
...
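To get a feeling for the magnitude of this intermediate voltage, the expression can be evaluated directly. The sketch assumes that Vth here denotes the thermal voltage kT/q (as opposed to the threshold voltage), and uses a hypothetical slope factor n = 1.5 at 300 K; both the function name and the numbers are illustrative.

```python
import math

def stack_node_voltage(n, temperature_k=300.0):
    """Evaluate VX ~ Vth * ln(1 + n), taking Vth as the thermal voltage kT/q."""
    k_over_q = 8.617333262e-5           # Boltzmann constant over charge, V/K
    v_thermal = k_over_q * temperature_k
    return v_thermal * math.log(1 + n)

vx = stack_node_voltage(n=1.5)
print(f"VX = {vx * 1e3:.1f} mV")
```

Under these assumptions VX is only a few tens of millivolts, yet it is enough to give the upper device a negative gate-source voltage.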
Figure 6
...

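[Figure residue: two-input NAND gate with PMOS devices P1 and P2, stacked NMOS devices N1 and N2, inputs A and B, output G, and intermediate node VX.]

A   B   VX             ISUB
0   0   Vth ln (1+n)   INSUB (VGS = VBS = -VX)
0   1   0              INSUB (VGS = VBS = 0)
1   0   Vdd-VT         INSUB (VGS = VBS = 0)
1   1   0              2 IPSUB (VSG = VSB = 0)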

Figure 6
...


In short-channel MOS transistors, the sub-threshold leakage current depends not
only on the gate drive (VGS) and the body bias (VBS), but also depends on the drain voltage
(VDS)
...
Typical values for DIBL range
from 20 to 150 mV change in VT per volt change in VDS
...
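The effect of DIBL on leakage can be estimated by combining the threshold shift with the exponential subthreshold dependence. The following is a rough sketch: it assumes a linear VT shift per volt of VDS and a subthreshold slope S of 90 mV/decade, both hypothetical, and the function name is an illustrative choice.

```python
def dibl_leakage_factor(dibl_v_per_v, delta_vds, s):
    """Factor by which subthreshold leakage changes when VDS changes by
    delta_vds, assuming VT shifts linearly with VDS."""
    delta_vt = -dibl_v_per_v * delta_vds   # a higher VDS lowers VT
    return 10 ** (-delta_vt / s)           # one decade per S volts of VT shift

# Hypothetical device: 90 mV/V of DIBL, slope S = 90 mV/decade.
up = dibl_leakage_factor(0.09, 1.0, 0.09)     # VDS raised by 1 V
down = dibl_leakage_factor(0.09, -1.0, 0.09)  # VDS lowered by 1 V
print(up, down)
```

A reduced drain-source voltage on the top device of a stack therefore lowers its leakage, which is the mechanism the stack effect exploits.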
74 illustrates the impact on
the sub-threshold leakage as a result of
• a decrease in gate drive—point A to B
• an increase in body bias—point A to C
• an increase in drain voltage—point A to D
...
The intermediate voltage reduces the drain-source voltage of the
top-most device, and hence reduces its leakage
...
73, when both M1 and M2 are off
...
75, we see that VX settles to approximately 100 mV in steady state
...

In summary, the sub-threshold leakage in complex stacked circuits can be significantly lower than in individual devices
...
74 Dependence of subthreshold leakage current on terminal
voltages for a typical 0
...
75 Load line indicating the
steady state solution for the intermediate
node voltage in a transistor stack
...
Exploiting this effect requires a careful selection of the input
signals to every gate during standby or sleep mode
...
8 Computing VX
Eq
...
41) represents intermediate node voltage for a two-input NAND with less than 10%
error, when A = B = 0
...
(6
...
5
...
5)
...
5

Summary
In this chapter, we have extensively analyzed the behavior and performance of combinational CMOS digital circuits with regard to area, speed, and power
...

• Static complementary CMOS combines dual pull-down and pull-up networks, only
one of which is enabled at any time
...
Techniques to
deal with fan-in include transistor sizing, input reordering, and partitioning
...
Extra buffering is needed for large fanouts
...
This results in a substantial reduction in gate complexity at the expense
of static power consumption and an asymmetrical response
...
The most popular approaches in
this class are the pseudo-NMOS techniques and differential DCVSL, which requires
complementary signals
...
This
results in very simple implementations for some logic functions
...
NMOS-only pass-transistor logic produces even
simpler structures, but might suffer from static power consumption and reduced
noise margins
...

• The operation of dynamic logic is based on the storage of charge on a capacitive
node and the conditional discharging of that node as a function of the inputs
...
Dynamic logic trades off noise margin for performance
...
Cascading dynamic gates can cause problems, and should be addressed carefully
...
This activity is a function of the input statistics, the network
topology, and the logic style
...

• Threshold voltage scaling is required for low-voltage operation
...
6

To Probe Further
The topic of (C)MOS logic styles is treated extensively in the literature
...
Some of the most comprehensive treatments can be found
in [Weste93] and [Chandrakasan01]
...
The topic of power minimization is relatively new
...


294

DESIGNING COMBINATIONAL LOGIC GATES IN CMOS

Chapter 6

Innovations in the MOS logic area are typically published in the proceedings of the
ISSCC Conference and the VLSI circuits symposium, as well as the IEEE Journal of Solid
State Circuits (especially the November issue)
...
Bernstein et al
...

[Chandrakasan95] A
...
Brodersen, Low Power Digital CMOS Design, Kluwer
Academic Publishers, 1995
...
Chandrakasan, W
...
Fox, ed
...

[Friedman84] V
...
Liu, “Dynamic Logic CMOS Circuits,” IEEE Journal of Solid
State Circuits, vol
...
2, pp
...

[Goncalvez83] N
...
De Man, “NORA: A Racefree Dynamic CMOS Technique for
Pipelined Logic Structures,” IEEE Journal of Solid State Circuits, vol
...
3, pp
...

[Heller84] L
...
, “Cascade Voltage Switch Logic: A Differential CMOS Logic Family,”
Proc
...
16–17, February 1984
...
Krambeck et al
...
SC-17, no
...
614–619, June 1982
...
M
...
Szeto, “Zipper CMOS,” IEEE Circuits and Systems Magazine, pp
...

[Parameswar96] A
...
Hara, and T
...
SC-31, no
...
805-809, June 1996
...
Rabaey and M
...
, Low Power Design Methodologies, Kluwer, 1995
...
Radhakrishnan, S
...
Maki, “Formal Design Procedures for
Pass-Transistor Switching Circuits,” IEEE Journal of Solid State Circuits, vol
...
2,
pp
...

[Shoji88] M
...

[Shoji96] M
...

[Sutherland99] I
...
Sproull, and D
...

[Weste93] N
...
Eshragian, Principles of CMOS VLSI Design: A Systems Perspective,
Addison-Wesley, 1993
...
Yano et al
...
8 ns CMOS 16 × 16 b Multiplier Using Complementary Pass-Transistor Logic,” IEEE Journal of Solid State Circuits, vol
...
388–395, April
1990
...
Ye, S
...
De, “A new technique for standby leakage reduction in high-performance circuits,” Symposium on VLSI Circuits, pp 40-41, 1998
...

chapter7
...
1

Introduction

7
...
3

Classification of Memory Elements

7
...
5
...
5
...
5
...
4
...
6 Pulse Registers
7
...
2SR Flip-Flops

6
...
2

The C2MOS Latch

7
...
3Multiplexer Based Latches

7
...
2

7
...
4Master-Slave Based Edge Triggered
Register

NORA-CMOS— A Logic Style for
Pipelined Structures

7
...
3

True Single-Phase Clocked Register
(TSPCR)

7
...
5Non-ideal clock signals
7
...
6Low-Voltage Static Latches
7
...
7 Sense-Amplifier Based Registers
7
...

Section
7
...
1Latch- vs
...
8
...
9

Non-Bistable Sequential Circuits
7
...
1The Schmitt Trigger
7
...
2Monostable Sequential Circuits
7
...
3Astable Circuits

7
...
11 Summary
7
...
13 Exercises and Design Problems

271

chapter7
...
1

DESIGNING SEQUENTIAL LOGIC CIRCUITS

Chapter 7

Introduction
Combinational logic circuits that were described earlier have the property that the output
of a logic block is only a function of the current input values, assuming that enough time
has elapsed for the logic gates to settle
...
In
these circuits, the output not only depends upon the current values of the inputs, but also
upon preceding input values
...

Figure 7
...
The system depicted
here belongs to the class of synchronous sequential systems, in which all registers are
under control of a single global clock
...
The Next State is determined based on the Current State and
the current Inputs and is fed to the inputs of registers
...
The register then ignores changes in the input signals until the
next rising edge
...

)
Inputs

Outputs
COMBINATIONAL
LOGIC

Current State
Registers
Q

Next State

D
CLK

Figure 7
...


This chapter discusses the CMOS implementation of the most important sequential
building blocks
...

Before embarking on a detailed discussion of the various design options, a review of the
design metrics, and a classification of the sequential elements is necessary
...
2

Timing Metrics for Sequential Circuits
There are three important timing parameters associated with a register as illustrated in Figure 7
...
The set-up time (tsu) is the time that the data inputs (D input) must be valid before
the clock transition (that is, the 0 to 1 transition for a positive edge-triggered register)
...
Figure 7
...
[Figure residue: register symbol with D, Q and CLK, and a timing diagram marking tsu, thold, tc-q and the DATA STABLE intervals.]

Assuming that the set-up and hold times are met, the data at the D input is copied to the Q output
after a worst-case propagation delay (with reference to the clock edge) denoted by tc-q
...
Assume that the worst-case propagation
delay of the logic equals tplogic, while its minimum delay (also called the contamination
delay) is tcd
...
1)

The hold time of the register imposes an extra constraint for proper operation,
$$t_{cd,register} + t_{cd,logic} \geq t_{hold}$$

(7
...
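The two timing constraints can be captured in a small helper. This is a sketch only: the hold check follows the inequality above, while the clock-period bound T >= tc-q + tplogic + tsu is the commonly used form and is stated here as an assumption, since the first equation is not reproduced in this extract. Function names and the example delays (in ns) are hypothetical.

```python
def min_clock_period(t_cq, t_plogic, t_su):
    """Smallest clock period that still meets set-up:
    clock-to-Q delay + worst-case logic delay + set-up time (assumed form)."""
    return t_cq + t_plogic + t_su

def hold_ok(t_cd_register, t_cd_logic, t_hold):
    """Hold constraint: the fastest path must not change the register
    input before the hold time has elapsed."""
    return t_cd_register + t_cd_logic >= t_hold

# Hypothetical numbers, in nanoseconds.
period = min_clock_period(t_cq=0.2, t_plogic=1.5, t_su=0.1)
print(period, hold_ok(0.1, 0.05, 0.12), hold_ok(0.05, 0.0, 0.1))
```

Note that the hold check does not involve the clock period at all, so a hold violation cannot be repaired by slowing the clock.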

As seen from Eq
...
1), it is important to minimize the values of the timing parameters associated with the register, as these directly affect the rate at which a sequential circuit can be clocked
...
For example, the DEC Alpha EV6 microprocessor
[Gieseke97] has a maximum logic depth of 12 gates, and the register overhead stands for
approximately 15% of the clock period
...
(7
...


7
...
Memory
that is embedded into logic is foreground memory, and is most often organized as individual
registers of register banks
...
Background memory, discussed later in this book, achieves

chapter7
...
In this chapter, we focus on foreground memories
...
Static memories preserve the state as long as the
power is turned on
...
Static memories are most useful when the register won’t
be updated for extended periods of time. An example of such is configuration data, loaded

...
This condition also holds for most processors that use conditional clocking (i
...
, gated clocks) where the clock is turned off for unused modules
...
Memory based on positive feedback falls under
the class of elements called multivibrator circuits
...

Dynamic memories store state for a short period of time, on the order of milliseconds
...
As with dynamic logic discussed earlier, the capacitors
have to be refreshed periodically to annihilate charge leakage
...
They
are most useful in datapath circuits that require high performance levels and are periodically clocked
...

Latches vs
...
It is a
level-sensitive circuit that passes the D input to the Q output when the clock signal is high

...
When the clock is low, the input data sampled
on the falling edge of the clock is held stable at the output for the entire phase, and the
latch is in hold mode
...
A latch operating under the above conditions is a positive latch
...

shown in Figure 7
...
A wide variety of static and dynamic implementations exists for the
realization of latches
...
They are typically built using the latch primitives of Figure 7
...
A
most-often recurring configuration is the master-slave structure that cascades a positive
and negative latch
...
Examples of these
are shown later in this chapter
...

Section 7
...
3 Timing of positive and negative latches
...
4 Static Latches and Registers
7
...
1

The Bistability Principle

Vi1

Vo1 = Vi2

Vi2 = Vo1

Vo1

Static memories use positive feedback to create a bistable circuit — a circuit having two
stable states that represent 0 and 1
...
4a, which shows
two inverters connected in cascade along with a voltage-transfer characteristic typical of
such a circuit
...
The latter plot is rotated to accentuate that Vi2 = Vo1
...
4a
...
4 Two cascaded inverters (a)
and their VTCs (b)
...

ble operation points (A, B, and C), as demonstrated on the combined VTC
...


Suppose that the cross-coupled inverter pair is biased at point C
...
This is a consequence of the gain around the loop being larger than 1
...
5a
...
This deviation is amplified by the gain of the inverter
...
The bias point moves away from C until one of
the operation points A or B is reached
...

Every deviation (even the smallest one) causes the operation point to run away from its
original bias
...
Operation points with this property are termed metastable
...
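The runaway behavior around the metastable point can be reproduced with an idealized model of the inverter loop. The sketch below is an assumption-laden illustration: the inverter VTC is approximated by a tanh curve with a mid-point gain larger than 1, VDD is taken as 2.5 V, and the function names are hypothetical.

```python
import math

VDD = 2.5  # assumed supply voltage

def inverter(v, gain=4.0):
    """Idealized inverter VTC: smooth, centered at VDD/2, |slope| > 1 there."""
    return 0.5 * VDD * (1.0 - math.tanh(gain * (v - 0.5 * VDD)))

def settle(v, rounds=50):
    """Send a voltage around the two-inverter loop repeatedly."""
    for _ in range(rounds):
        v = inverter(inverter(v))
    return v

print(settle(1.25))   # exactly at the metastable point: stays put
print(settle(1.26))   # a small positive deviation runs to the high state
print(settle(1.24))   # a small negative deviation runs to the low state
```

The mid-point is a valid solution of the loop equations, but any deviation is amplified on every pass, so only the two outer operation points are stable.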
5

Vi1 = Vo2

(a)
Metastability
...
5b
...
Even a rather large deviation from the operation point is reduced in size and disappears
...
The circuit serves as a
memory, storing either a 1 or a 0 (corresponding to positions A and B)
...
Since the precondition for stability is that the loop gain G is smaller
than unity, we can achieve this by making A (or B) temporarily unstable by increasing G to
a value larger than 1
...
For
instance, assume that the system is in position A (Vi1 = 0, Vi2 = 1)
...
The positive feedback regenerates the effect of the trigger pulse, and the circuit
moves to the other state (B in this case)
...

Section 7
...

In summary, a bistable circuit has two stable states
...
A trigger pulse must be applied to change the state
of the circuit
...

7
...
2

SR Flip-Flops

The cross-coupled inverter pair shown in the previous section provides an approach to
store a binary variable in a stable way
...
The simplest incarnation accomplishing this is the well-known
SR (or set-reset) flip-flop, an implementation of which is shown in Figure 7
...

This circuit is similar to the cross-coupled inverter pair with NOR gates replacing the
inverters
...
These outputs are complementary (except for the SR = 11 state)
...
If a positive
(or 1) pulse is applied to the S input, the Q output is forced into the 1 state (with Q̅ going to
0)
...

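[Figure residue: (a) Schematic diagram of the SR flip-flop with inputs S and R and outputs Q and Q̅, its logic symbol, and the characteristic table below.]

S   R   Q   Q̅
0   0   Q   Q̅
1   0   1   0
0   1   0   1
1   1   0   0    Forbidden State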
Figure 7
...


These results are summarized in the characteristic table of the flip-flop, shown in
Figure 7
...
The characteristic table is the truth table of the gate and lists the output states
as functions of all possible input conditions
...
Since this does not correspond with our constraint that Q and Q̅ must be
complementary, this input mode is considered to be forbidden
...
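The characteristic table can be reproduced by iterating two cross-coupled NOR gates until their outputs stop changing. This is a behavioral sketch only: gate delays are modeled as one simultaneous update per iteration, and the function names are hypothetical.

```python
def nor(a, b):
    return int(not (a or b))

def sr_nor(s, r, q, q_bar):
    """Settle a cross-coupled NOR pair from state (q, q_bar) with inputs S, R.
    Both gates are updated together until the outputs are stable."""
    for _ in range(4):
        q_new = nor(r, q_bar)        # Q     = NOR(R, Q_bar)
        q_bar_new = nor(s, q)        # Q_bar = NOR(S, Q)
        if (q_new, q_bar_new) == (q, q_bar):
            break
        q, q_bar = q_new, q_bar_new
    return q, q_bar

print(sr_nor(1, 0, 0, 1))   # set
print(sr_nor(0, 1, 1, 0))   # reset
print(sr_nor(0, 0, 1, 0))   # hold
print(sr_nor(1, 1, 0, 1))   # forbidden input: both outputs driven low
```

With S = R = 1 both outputs settle to 0, so they are no longer complementary, which is why this input combination is excluded.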

Finally,
Figure 7
...


chapter7
...
1 SR Flip-Flop Using NAND Gates
An SR flip-flop can also be implemented using a
cross-coupled NAND structure as shown in Figure
7
...
Derive the truth table for such an implementation
...
7 NAND-based SR flip-flop
...
Most systems operate in a synchronous fashion with transition events referenced to a
clock
...
8
...
Observe that the number of transistors is identical to the implementation of Figure
7
...
The drawback of saving some
transistors over a fully-complementary CMOS implementation is that transistor sizing
becomes critical in ensuring proper functionality
...
The combination of transistors M4, M7, and M8 forms a ratioed
inverter
...
Once this is achieved, the positive feedback
causes the flip-flop to invert states
...

[Figure residue: clocked SR flip-flop schematic with transistors M1 through M8, inputs S, R and CLK, outputs Q and Q̅, supply VDD.]
Figure 7
...


The presented flip-flop does not consume any static power
...
No static paths between VDD
and GND can exist except during switching
...
1

Transistor Sizing of Clocked SR Latch

Assume that the cross-coupled inverter pair is designed such that the inverter threshold VM is
located at VDD/2
...
25 µm CMOS technology, the following transistor sizes were
selected: (W/L)M1 = (W/L)M3 = (0
...
25µm), and (W/L)M2 = (W/L)M4 = (1
...
25µm)
...


chapter7
...
4

Static Latches and Registers

279

To switch the latch from the Q = 0 to the Q = 1 state, it is essential that the low level of
the ratioed, pseudo-NMOS inverter (M5-M6)-M2 be below the switching threshold of the
inverter M3-M4 that equals VDD/2
...
The boundary conditions on the transistor
sizes can be derived by equating the currents in the inverter for VQ = VDD / 2, as given in Eq
...
3) (this ignores channel length modulation)
...
5V and VM = 1
...
We assume that M5 and M6 have identical
sizes and that W/L5-6 is the effective ratio of the series connected devices
...

$$k'_n \left(\frac{W}{L}\right)_{5-6} \left( (V_{DD} - V_{Tn}) V_{DSATn} - \frac{V_{DSATn}^2}{2} \right) = k'_p \left(\frac{W}{L}\right)_{2} \left( (-V_{DD} - V_{Tp}) V_{DSATp} - \frac{V_{DSATp}^2}{2} \right)$$

(7
...
25 µm process, Eq
...
3) results in the constraint that the
effective (W/L)M5-6 ≥ 2
...
This implies that the individual device ratio for M5 or M6 must be
larger than approximately 4
...
Figure 7
...
We notice that the individual device ratio of greater than 3
is sufficient to bring the Q voltage to the inverter switching threshold
...
Figure 7
...
The plot confirms that an individual W/L ratio of greater than 3 is required to overpower
the feedback and switch the state of the latch

...
[Figure residue: (a) Q (Volts) versus W/L of M5 and M6; (b) Q and S (Volts) versus time (nsec) for several device widths W.]

Figure 7
...
(a) DC output voltage vs
...
5µm/
...
(b) Transient response shows that M5
and M6 must each have a W/L larger than 3 to switch the SR flip-flop
...
Some simplifications are therefore necessary
...
8, where Q and Q̅ are set to 0 and 1, respectively
...
In the first phase of the transient, node Q̅ is being
pulled down by transistors M5 and M6
...
The transient response is hence determined by the pseudo-NMOS
inverter formed by (M5-M6) and M2
...
This accelerates the pulling down of node Q̅
...

derive that the propagation delay of node Q̅ is approximately equal to the delay of the
pseudo-NMOS inverter formed by (M5-M6) and M2
...

M
Example 7
...
8, as obtained from simulation, is plotted in
Figure 7
...
The devices are sized as described in Example 7
...
The flip-flop is initially in the reset state, and an
S-pulse is applied
Once the switching threshold of the inverter M3-M4 is reached, the Q output starts to rise
...
From the simulation results, we can derive that tpQ̅ and tpQ equal
120 psec and 230 psec, respectively
...
0

2
...
0
Figure 7
...

0
...
9

1
...
1

1
...
3

1
...
5

Problem 7
...
8, it is also possible to use complementary
logic to implement the clocked SR FF
...
This circuit is more complex, but switches faster and consumes less switching
power
...


7
...
3

Multiplexer Based Latches

There are many approaches for constructing latches
...
Multiplexer based latches can provide
similar functionality to the SR latch, but have the important added advantage that the sizing
of devices only affects performance and is not critical to the functionality
...
11 shows an implementation of static positive and negative latches based
on multiplexers
...

Section 7
...
When the clock signal is high,
the input 1 of the multiplexer, which connects to the output of the latch, is selected
...
Similarly, in the positive
latch, the D input is selected when the clock is high, and the output is held (using feedback)
when the clock is low
...
11 Negative and positive latches based on
multiplexers
...
12
...
During this phase, the feedback
loop is open since the top transmission gate is off
...
The number of transistors that the clock touches is important since it has an activity factor of 1
...

CLK

Q
CLK
D

Figure 7
...

CLK

It is possible to reduce the clock load to two transistors by implementing the multiplexers using NMOS-only pass transistors, as shown in Figure 7
...

approach is the reduced clock load of only two NMOS devices
...
While attractive for its simplicity, the use of NMOS-only pass
transistors results in the passing of a degraded high voltage of VDD-VTn to the input of the
first inverter
...
It also causes static power dissipation in the first inverter, as already pointed out in Chapter 6
...


chapter7
...
13 Multiplexer based NMOS latch using NMOS only pass transistors for multiplexers
...
4
...
14
...
A multiplexer based latch
is used in this particular implementation, though any latch can be used to realize the master and slave stages
...
During this period, the slave stage is in
the hold mode, keeping its previous value using feedback
...
During the
high phase of the clock, the slave stage samples the output of the master stage (QM), while
the master stage remains in a hold mode
...
The value of Q is the value of D
right before the rising edge of the clock, achieving the positive edge-triggered effect
...
e
...
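The master-slave principle can be illustrated with a behavioral model in which a negative latch (master) is followed by a positive latch (slave). This is a sketch that ignores all delays; the class and method names are hypothetical.

```python
class MasterSlaveRegister:
    """Positive edge-triggered register built from two level-sensitive latches."""

    def __init__(self):
        self.qm = 0   # output of the master (negative) latch
        self.q = 0    # output of the slave (positive) latch

    def step(self, clk, d):
        if clk == 0:
            self.qm = d        # master transparent, slave holds
        else:
            self.q = self.qm   # slave transparent, master holds
        return self.q

reg = MasterSlaveRegister()
trace = [reg.step(clk, d) for clk, d in
         [(0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]]
print(trace)
```

Changes on D while the clock is high never reach Q, because the master latch is in hold mode during that phase; Q only takes the value D had just before the rising edge.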

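[Figure residue: master-slave register built from two multiplexer-based latches; Master and Slave stages with inputs D and CLK, internal node QM and output Q.]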
Figure 7
...


A complete transistor level implementation of the master-slave positive edge-triggered register is shown in Figure 7
...
The multiplexer is implemented using transmission
gates as discussed in the previous section
...
During this period, T3 is off and T4 is on and
the cross-coupled inverters (I5, I6) hold the state of the slave latch
...
T1 is off and T2
is on, and the cross coupled inverters I3 and I4 hold the state of QM
...


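[Figure residue: transmission-gate register schematic with inverters I1 through I6, transmission gates T1 through T4, input D, internal node QM, output Q and clock CLK.]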
Figure 7
...


Problem 7
...
3 without loss of functionality
...
As discussed earlier, there are three important timing metrics in registers: the set-up time, the hold time and
the propagation delay
...
Assume that the
propagation delay of each inverter is tpd_inv and the propagation delay of the transmission
gate is tpd_tx
...

The set-up time is the time before the rising edge of the clock that the input data D
must become valid
...
For the transmission gate multiplexer-based register, the input D has to propagate through I1, T1, I3 and
I2 before the rising edge of the clock
...
Otherwise, it is possible for the
cross-coupled pair I2 and I3 to settle to an incorrect value
...

The propagation delay is the time for the value of QM to propagate to the output Q
...
Therefore the delay tc-q is simply the delay through T3 and
I6 (tc-q = tpd_tx + tpd_inv)
...
In this case, the transmission gate T1 turns off when the clock goes high, and
therefore any changes in the D input after the clock goes high are not seen by the input
...
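The first-order timing estimates for the transmission-gate register can be collected in one helper. This sketch follows the paths named in the text (D through I1, T1, I3 and I2 for set-up; QM through T3 and I6 for clock-to-Q); the zero hold time is an idealization inferred from T1 shutting off at the clock edge, and the function name and example delays are hypothetical.

```python
def mux_register_timing(t_pd_inv, t_pd_tx):
    """First-order timing of the transmission-gate master-slave register."""
    t_su = 3 * t_pd_inv + t_pd_tx   # D through I1, T1, I3 and I2
    t_cq = t_pd_tx + t_pd_inv       # QM through T3 and I6
    t_hold = 0.0                    # idealized: T1 is off right at the clock edge
    return t_su, t_cq, t_hold

# Hypothetical delays in nanoseconds.
t_su, t_cq, t_hold = mux_register_timing(t_pd_inv=0.05, t_pd_tx=0.03)
print(t_su, t_cq, t_hold)
```

These are hand estimates only; simulated values depend on device sizing and loading, so they should be compared with, not substituted for, a circuit simulation.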


chapter7
...
3 Timing analysis using SPICE
...
Figure 7
...
For the 210 psec case, the correct value of input D
is sampled (in this case, the Q output remains at the value of VDD)
...
Node QM
starts to go high while the output of I2 (the input to transmission gate T2) starts to fall
...
The set-up time for this register is therefore 210 psec
...
[Plot residue: simulated waveforms of CLK, D, Q, QM and I2-T2 in volts versus time (ns), in two panels, (a) and (b), for two values of Tsetup]
...
16 Set-up time simulation
...
The D input edge is once again
skewed relative to the clock signal until the circuit stops functioning
...
e
...
Finally, for the propagation delay, the inputs are transitioned at least a set-up time before the rising edge of the clock and
the delay is measured from the 50% point of the CLK edge to the 50% point of the Q output
...
17), tc-q(lh) was 160 psec and tc-q(hl) was 180 psec
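The 50%-to-50% measurement described above can be sketched on sampled waveforms as follows. This is an illustrative helper, not the book's measurement script: the function names are hypothetical and the sample points are made up rather than taken from the simulation.

```python
def crossing_time(times, volts, level, rising):
    """Time at which the waveform first crosses `level`, by linear interpolation."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        crossed_up = rising and v0 < level <= v1
        crossed_down = (not rising) and v0 > level >= v1
        if crossed_up or crossed_down:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def clk_to_q_delay(times, clk, q, vdd, q_rising):
    """Delay from the 50% point of the rising CLK edge to the 50% point of Q."""
    half = vdd / 2
    t_clk = crossing_time(times, clk, half, rising=True)
    t_q = crossing_time(times, q, half, rising=q_rising)
    return t_q - t_clk


# Made-up piecewise-linear samples (time in ps, voltage in V), not simulation data.
times = [0, 100, 200, 300, 400]
clk = [0.0, 2.5, 2.5, 2.5, 2.5]
q = [0.0, 0.0, 0.0, 2.5, 2.5]
print(clk_to_q_delay(times, clk, q, vdd=2.5, q_rising=True))
```

Calling it once with `q_rising=True` and once with `q_rising=False` on the appropriate waveforms gives the low-to-high and high-to-low values separately.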
...
[Plot residue: CLK waveform in volts versus time (ns)]
...
17 Simulation of propagation delay

As mentioned earlier, the drawback of the transmission gate register is the high capacitive
load presented to the clock signal
...
Ignoring the overhead required to
invert the clock signal (since the buffer inverter overhead can be amortized over multiple
register bits), each register has a clock load of 8 transistors
...
Figure 7
...

[Figure residue: reduced-clock-load register with inverters I1 to I4, transmission gates T1 and T2, and signals D, Q and CLK]

Figure 7
...


The penalty for the reduced clock load is increased design complexity
...
The sizing requirements for the transmission gates
can be derived using a similar analysis as performed for the SR flip-flop
...
If minimum-sized devices are to be used in the transmission gates, it is essential that the
transistors of inverter I2 be made even weaker
...
Using minimum or close-to-minimum-size devices in the transmission gates is desirable to reduce the power dissipation in the
latches and the clock distribution network
...
When the slave stage is on (Figure 7
...
As long as I4 is a weak device, this is fortunately not a major problem
...
19 Reverse conduction possible in the transmission gate
...
4
...
Even if this were possible, this would still not be a
good assumption
...
fm Page 286 Tuesday, April 18, 2000 8:52 PM

286

DESIGNING SEQUENTIAL LOGIC CIRCUITS

Chapter 7

the load capacitances can vary based on data stored in the connecting latches
...
20b
...
20a
...
20 Master-slave register based
on NMOS-only pass transistors
...
However, since CLK and the inverted clock are both high for a
short period of time (the overlap period), both sampling pass transistors conduct and
there is a direct path from the D input to the Q output
...
This is known as a race condition, in which the value of the output
Q is a function of whether the input D arrives at node X before or after the falling
edge of CLK
...

• The primary advantage of the multiplexer-based register is that the feedback loop is
open during the sampling period, and therefore sizing of devices is not critical to
functionality
...

Those problems can be avoided by using two non-overlapping clocks PHI1 and PHI2
instead (Figure 7
...

During the nonoverlap time, the FF is in the high-impedance state— the feedback loop is
open, the loop gain is zero, and the input is disconnected
...
Hence the name pseudostatic: the register
employs a combination of static and dynamic storage approaches depending upon the state
of the clock
...

Section 7
...
21 Pseudostatic two-phase D register
...
4 Generating non-overlapping clocks
Figure 7
...
Assuming that each gate has a unit gate delay, derive
the timing relationship between the input clock and the two output clocks
...
4
...
22 Circuitry for generating a
two-phase non-overlapping clock

Low-Voltage Static Latches

The scaling of supply voltages is critical for low power operation
...
For example, without
the scaling of device thresholds, NMOS-only pass transistors (e
...
, Figure 7
...
At very low power supply voltages, the input to the inverter cannot be raised above the switching threshold,
resulting in incorrect evaluation
...

Scaling to low supply voltages hence requires the use of reduced threshold devices
...
When the registers are constantly accessed, the leak-

chapter7
...
However, with the
use of conditional clocks, it is possible that registers are idle for extended periods and the
leakage energy expended by registers can be quite significant
...
One approach for this involves the use of Multiple Threshold devices as
shown in Figure 7
...
Only the negative latch is shown here
...
The low-threshold inverters are gated using high-threshold devices to eliminate leakage
...
When clock is
low, the D input is sampled and propagates to the output
...
The feedback transmission gate conducts and the cross-coupled feedback is enabled
...
During idle mode, the high threshold devices in series with the low
threshold inverter are turned off (the SLEEP signal is high), eliminating leakage
...
The feedback
low-threshold transmission gate is turned on and the cross-coupled high-threshold devices
maintain the state of the latch
...
23 One solution for the leakage problem in low-voltage operation using MTCMOS
...
5 Transistor minimization in the MTCMOS register

Unlike combinational logic, both flavors of high-threshold devices in series are required to
eliminate the leakage of low-threshold gates
...
Hint: Eliminate
the high VT NMOS or high VT PMOS of the low threshold inverter on the right of Figure
7
...


chapter7
...
5

Dynamic Latches and Registers

289

7
...
This
approach has the useful property that a stored value remains valid as long as the supply
voltage is applied to the circuit, hence the name static
...
When registers are used in computational structures
that are constantly clocked such as pipelined datapath, the requirement that the memory
should hold state for extended periods of time can be significantly relaxed
...
The principle is exactly identical to the one used in dynamic logic: charge
stored on a capacitor can be used to represent a logic signal
...
No capacitor is ideal, unfortunately,
and some charge leakage is always present
...
If one wants to preserve
signal integrity, a periodic refresh of its value is necessary
...
Reading the value of the stored signal from a capacitor without disrupting the charge
requires the availability of a device with a high input impedance
...
5
...
24
...
During this period, the slave
stage is in a hold mode, with node 2 in a high-impedance (floating) state
...
Node 2 now stores the
inverted version of node 1
...

NMOS-only pass transistors, resulting in an even simpler six-transistor implementation
...

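[Figure residue: dynamic register with transmission gates T1 and T2, inverters I1 and I3, storage capacitors C1 and C2 on nodes 1 and 2, and signals D, Q and CLK]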
Figure 7
...


The set-up time of this circuit is simply the delay of the transmission gate, and corresponds
to the time it takes node 1 to sample the D input
...
The propagation delay (tc-q) is equal to two inverter delays plus the delay of
the transmission gate T2
...

One important consideration for such a dynamic register is that the storage nodes
(i
...
, the state) have to be refreshed at periodic intervals to prevent loss due to charge leakage, caused by diode leakage as well as sub-threshold currents
...

Clock overlap is an important concern for this register
...
25
...
This is known as a race condition
...
The same is true for the 1-1 overlap region, where an
input-output path exists through the PMOS of T1 and the NMOS of T2
...
That is, the data must be stable during
the high-high overlap period
...
Generally the built-in single inverter delay should be sufficient, and the overlap period constraint is given
as:
$t_{overlap0-0} < t_{T1} + t_{I1} + t_{T2}$

(7
...
5)
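The overlap constraint above can be checked numerically. The sketch below assumes the three delays are known from simulation; the function names and the example numbers (in picoseconds) are hypothetical.

```python
def max_safe_overlap(t_t1, t_i1, t_t2):
    """Upper bound on the (0-0) overlap: the delay through T1, I1 and T2."""
    return t_t1 + t_i1 + t_t2


def overlap_is_safe(t_overlap_00, t_t1, t_i1, t_t2):
    """True when the overlap ends before new data can race through to the output."""
    return t_overlap_00 < max_safe_overlap(t_t1, t_i1, t_t2)


# Hypothetical delays in picoseconds, for illustration only.
print(max_safe_overlap(30, 40, 30))
print(overlap_is_safe(50, 30, 40, 30))
print(overlap_is_safe(120, 30, 40, 30))
```

The strict inequality matters: an overlap equal to the path delay is already treated as unsafe.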

[Figure residue: CLK and inverted-clock waveforms marking the (0,0) overlap and the (1,1) overlap]
Figure 7
...


7
...
2

C2MOS Dynamic Register: A Clock Skew Insensitive Approach

The C2MOS Register
Figure 7
...
This circuit is called the C2MOS (Clocked
CMOS) register [Suzuki73]
...

1
...
The master
stage is in the evaluation mode
...
Both transistors M7 and M8 are off, decoupling the output
from the input
...



[Figure residue: C2MOS register; the master stage (M1 to M4) drives node X with capacitance CL1, and the slave stage (M5 to M8) drives Q with capacitance CL2]

Figure 7
...


2
...
The value stored on CL1
propagates to the output node through the slave stage which acts as an inverter
...
However, there is an
important difference:
A C2MOS register with CLK-CLK clocking is insensitive to overlap, as long as the rise and
fall times of the clock edges are sufficiently small
...
25)
...
27a in which both PMOS devices are on during this period
...
This is not desirable since data should not change on the negative edge for a
positive edge-triggered register
...
However, this data cannot
propagate to the output since the NMOS device M7 is turned off
...

Therefore, any new data sampled on the falling clock edge is not seen at the slave output
Q, since the slave stage is off until the next rising edge of the clock
...

The (1-1) overlap case (Figure 7
...
The question is again if new data sampled during
the overlap period (right after clock goes high) propagates to the Q output
...

edge
...
However, as soon as the overlap period is over,
the PMOS M8 is turned on and the 0 propagates to output
...
The
[Figure residue: C2MOS register during clock overlap, panels (a) (0-0) overlap and (b) (1-1) overlap]
Figure 7
...
No feasible signal path can exist between
In and D, as illustrated by the arrows
...

In summary, it can be stated that the C2MOS latch is insensitive to clock overlaps
because those overlaps activate either the pull-up or the pull-down networks of the latches,
but never both of them simultaneously
...
This creates a path between input and output that can destroy the state
of the circuit
...
This criterion is not too stringent, and is easily met in practical
designs
...
28, which plots the
simulated transient response of a C2MOS D FF for clock slopes of respectively 0
...
For slow clocks, the potential for a race condition exists
...
[Plot residue: simulated CLK and X waveforms for the two clock slopes]

Figure 7
...
1 nsec and 3 nsec
clock rise (fall) times assuming In = 1
...

Section 7
...
It is also possible to design sequential circuits that sample the input on both edges
...
Figure 7
...
It consists of two parallel master-slave based edge-triggered registers, whose outputs are
multiplexed using the tri-state drivers
...
Node Y is held stable, since devices M9 and M10 are turned
off
...
During the low phase, the bottom master latch (M1,
M4, M9, M10) is turned on, sampling the inverted D input on node Y
...
On the rising edge, the bottom
slave latch conducts, and drives the inverted version of Y on node Q
...
Note that the slave latches operate in a complementary fashion; that is,
only one of them is turned on during each phase of the clock
...
29 C2MOS based dual-edge triggered register
...
6 Dual-edge Registers
Determine how the adoption of dual-edge registers influences the power-dissipation in the
clock-distribution network
...

7
...
3


True Single-Phase Clocked Register (TSPCR)

In the two-phase clocking schemes described above, care must be taken in routing the two
clock signals to ensure that overlap is minimized
...
The True
Single-Phase Clocked Register (TSPCR) proposed by Yuan and Svensson uses a single
clock (without an inverse clock) [Yuan89]
...
30
...
30

CLK

CLK
Out

Negative Latch
True Single Phase Latches
...
On the other hand, when
CLK = 0, both inverters are disabled, and the latch is in hold-mode
...
As a result of the dual-stage approach, no
signal can ever propagate from the input of the latch to the output in this
mode
...
The clock load is similar to
a conventional transmission gate register or C2MOS register
...
The disadvantage is the slight increase in the number of transistors: 12 transistors are required
...
This reduces the delay overhead associated with the latches
...
31a outlines the basic approach for embedding logic, while Figure 7
...
While the set-up time of this latch has increased over the one
shown in Figure 7
...
This approach of embedding logic into latches has been
used extensively in the design of the EV4 DEC Alpha microprocessor [Dobberpuhl92]
and many other high performance processors
...
4 Impact of embedding logic into latches on performance
Consider embedding an AND gate into the TSPC latch as shown in Figure 7
...
In a 0
...
A conventional
approach, composed of an AND gate followed by a positive latch, has an effective set-up time
of 600 psec (we treat the AND plus latch as a black box that performs the AND+latching

[Figure residue: (a) Including logic into the latch, with pull-up (PUN) and pull-down (PDN) networks; (b) AND latch with inputs In1 and In2]
Figure 7
...


functions)
...


The TSPC latch circuits can be further reduced in complexity as illustrated in Figure
7
...
Besides the reduced number
of transistors, these circuits have the advantage that the clock load is reduced by half
...
For
instance, the voltage at node A (for Vin = 0 V) for the positive latch maximally equals VDD
- VTn, which results in a reduced drive for the output NMOS transistor and a loss in performance
...
This also limits the amount of VDD scaling possible on the latch
...
32 Simplified TSPC latch (also called split-output)
...
33 shows the design of a specialized single-phase edge-triggered register
...
The second
(dynamic) inverter is in the precharge mode, with M6 charging up node Y to VDD
...
Therefore, during the low phase of
the clock, the input to the final (static) inverter is holding its previous value and the output
Q is stable
...
If X is
high on the rising edge, node Y discharges
...
On the positive phase of the
clock, note that node X transitions to a low if the D input transitions to a high level
...
This represents the hold time of the register (note that the hold time is
less than 1 inverter delay since it takes 1 delay for the input to affect node X)
...
The propagation delay of the register is essentially three inverters since the value on node X must
propagate to the output Q
...

[Figure residue: single-phase edge-triggered register with transistors M1 to M9, internal nodes X and Y, input D, output Q and clock CLK]
Figure 7
...


WARNING: Similar to the C2MOS latch, the TSPC latch malfunctions when the slope of
the clock is not sufficiently steep
...
The clock slopes should therefore be carefully controlled
...

Example 7
...
With
improper sizing, glitches may occur at the output due to a race condition when the clock transitions from low to high
...
While CLK is
low, Y is pre-charged high turning on M7
...
Once Y is
sufficiently low, the trend on Q is reversed and the node is pulled high anew through M9
...
Figure 7
...
34 for different sizes
of devices in the final two stages
...
[Plot residue: CLK, Qoriginal and Qmodified in volts versus time (nsec), with an inset table of original and modified device widths]

Figure 7
...
33)
...
It
also reduces the contamination delay of the register
...

Section 7
...
This is accomplished by reducing the strength of the M7-M8
pulldown path, and by speeding up the M4-M5 pulldown path
...
6 Pulse Registers
Until now, we have used the master-slave configuration to create an edge-triggered register
...
The
idea is to construct a short pulse around the (rising or falling) edge of the clock
...
g
...
35a), sampling
the input only in a short window
...
e, the transparent period) of the latch very short
...

Figure 7
...
When CLK = 0, node X is charged up to VDD (MN
is off since CLKG is low)
...
This in turn activates
MN, pulling X and eventually CLKG low (Figure 7
...
The length of the pulse is
controlled by the delay of the AND gate and the two inverters
...
If every register on the chip
uses the same clock generation mechanism, this sampling delay does not matter
...
This must be taken into account when performing timing verification and clock skew analysis (which is the topic of a later Chapter)
...
35 Glitch latch - timing generation and register
...

If set-up time and hold time are measured in reference to the rising edge of the glitch
clock, the set-up time is essentially zero, the hold time is equal to the length of the pulse (if
the contamination delay is zero for the gates), and the propagation delay (tc-q) equals two
gate delays
...
The glitch-generation circuitry can be amortized over multiple register bits
...
This has
prevented a wide-spread use
...
g
...

Another version of the pulsed register is shown in Figure 7
...
When the clock is low, M3 and M6 are off and device P1 is
turned on
...
CLKD is a delay-inverted version of CLK
...
During this interval, the circuit is transparent
and the input data D is sampled by the latch
...
On
the falling edge of the clock, node X is held at VDD and the output is held stable by the
cross-coupled inverters
...
36 Flow-through positive
edge-triggered register
...
The transparency period also determines the hold time of the register
...
In this particular circuit, the set-up time can be negative
...
This is attractive, as data can arrive at the register even
after the clock goes high, which means that time is borrowed from the previous cycle
...
6 Set-up time of glitch register
The glitch register of Figure 7
...

As a result, the input data can actually change after the rising edge of the clock, resulting in a
negative set-up time (Figure 7
...
The D-input transitions to low after the rising edge of the
clock, and transitions high before the falling edge of CLKD (that is, during the transparency
period)
...
The output Q does go to the correct value
of VDD as long as the input D is set up correctly some time before the falling edge of CLKD
...
That is, the output can have multiple transitions around the rising
edge, and therefore, the output of the register should not be used as a clock to other registers
...

Section 7
...
37 Simulation showing a negative set-up time for the glitch register
...
7 Converting a glitch register to a conditional glitch register
Modify the circuit in Figure 7
...
The goal is to
convert the register to a conditional register which latches only when the enable signal is
asserted
...
7 Sense-Amplifier Based Registers
So far, we have presented two fundamental approaches towards building edge-triggered
registers: the master-slave concept and the glitch technique
...
38 introduces
another technique that uses a sense amplifier structure to implement an edge-triggered
register [Montanaro96]
...
As we will see, sense amplifier circuits are used
extensively in memory cores and in low swing bus drivers to amplify small voltage swings
[Figure residue: sense-amplifier based register with transistors M1 to M10, internal nodes L1 to L4, differential inputs IN, clock CLK and complementary outputs OUT]

Figure 7
...


chapter7
...
There are many techniques to construct these amplifiers,
with the use of feedback (e
...
, cross-coupled inverters) being one common approach
...
38 uses a precharged front-end amplifier that samples the differential input signal on the rising edge of the clock signal
...
The differential inputs in this implementation
don't have to have rail-to-rail swing, and hence this register can be used as a receiver for a
reduced-swing differential bus
...
As a result, PMOS transistors M7 and M8 are turned off and the NAND FF is
holding its previous state
...
On the rising edge of the clock, the evaluate transistor turns on
and the differential input pair (M2 and M3) is enabled, and the difference between the input
signals is amplified on the output nodes L1 and L2
...
For example, if IN is 1, L1 is
pulled to 0, and L2 remains at VDD
...

[Figure residue: states of nodes L1 to L4 and of the inputs IN, with and without the shorting device, initially and after the inputs change (CLK still high); leakage paths marked]
With shorting device: the leakage current attempts to charge L1/L3 but the DC path through the shorting transistor
...
Without shorting device: L3 is isolated, so charge accumulates until L1/L3 change state, causing L2 to change state as well. As a result the flip-flop outputs change
...
39 The need for the shorting transistor M4
...
This is necessary to accommodate the case where the inputs change
their value after the positive edge of CLK has occurred, resulting in either L3 or L4 being
left in a high-impedance state with a logical low voltage level stored on the node
...


chapter7
...
8

Pipelining: An approach to optimize sequential circuits

301

could then actually change state prior to the next rising edge of CLK! This is best illustrated graphically, as shown in Figure 7
...


7
...
The idea is easily explained with the example of Figure 7
...

The goal of the presented circuit is to compute log(|a − b|), where both a and b represent
streams of numbers, that is, the computation must be performed on a large set of input values
...
6)


...
We assume that the registers are edge-triggered D registers
...
In conventional systems (that don't push
the edge of technology), the latter delay is generally much larger than the delays associated with the registers and dominates the circuit performance
...
We note that each logic module is then active for
only 1/3 of the clock period (if the delay of the register is ignored)
...
Pipelining is a technique to improve the
resource utilization, and increase the functional throughput
...
40b
...
1
...


[Figure residue: datapath with REG blocks clocked by CLK, input a and output Out; panel (b) Pipelined version]
Figure 7
...


chapter7
...
At
that time, the circuit has already performed parts of the computations for the next data
sets, (a2, b2) and (a3,b3)
...

Table 7
...

Clock Period   Adder      Absolute Value   Logarithm
1              a1 + b1
2              a2 + b2    |a1 + b1|
3              a3 + b3    |a2 + b2|        log(|a1 + b1|)
4              a4 + b4    |a3 + b3|        log(|a2 + b2|)
5              a5 + b5    |a4 + b4|        log(|a3 + b3|)
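The occupancy pattern in the table can be generated programmatically, which makes the one-stage-per-cycle shift explicit. The sketch below is illustrative; the function name and the use of `None` for an idle stage are assumptions made here.

```python
def pipeline_schedule(n_stages, n_cycles):
    """For each clock period, list the index of the data set in every stage.

    Stage 0 is the first stage; None means the stage holds no valid data yet.
    """
    schedule = []
    for cycle in range(1, n_cycles + 1):
        row = []
        for depth in range(n_stages):
            data_index = cycle - depth  # each stage lags the previous one by a cycle
            row.append(data_index if data_index >= 1 else None)
        schedule.append((cycle, row))
    return schedule


stage_names = ["Adder", "Absolute Value", "Logarithm"]
for cycle, row in pipeline_schedule(len(stage_names), 5):
    cells = [f"{name}: set {k}" if k else f"{name}: idle"
             for name, k in zip(stage_names, row)]
    print(cycle, "|", ", ".join(cells))
```

From the third clock period onward every stage is busy, which is the improved resource utilization the text refers to.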

The advantage of pipelined operation becomes apparent when examining the minimum clock period of the modified circuit
...
This effectively reduces the value of the minimum allowable clock period:
$T_{min,pipe} = t_{c-q} + \max(t_{pd,add}, t_{pd,abs}, t_{pd,log})$

(7
...
The pipelined network
outperforms the original circuit by a factor of three under these assumptions, or Tmin,pipe =
Tmin/3
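The factor-of-three claim can be checked with a few lines of arithmetic. The sketch below follows the pipelined expression as printed above; the unpipelined period is assumed here to be tc-q plus the sum of the three logic delays, and the register set-up time is left out. All numbers are hypothetical.

```python
def min_period_unpipelined(t_cq, stage_delays):
    """Assumed form: the clock must cover the register delay plus all logic in series."""
    return t_cq + sum(stage_delays)


def min_period_pipelined(t_cq, stage_delays):
    """Printed form: the clock must cover the register delay plus the slowest stage."""
    return t_cq + max(stage_delays)


# Hypothetical delays in picoseconds: adder, absolute value, logarithm.
stages = [100, 100, 100]

for t_cq in (0, 20):
    t_min = min_period_unpipelined(t_cq, stages)
    t_pipe = min_period_pipelined(t_cq, stages)
    print(f"tc-q = {t_cq} ps: Tmin = {t_min} ps, Tmin,pipe = {t_pipe} ps, "
          f"speed-up = {t_min / t_pipe:.2f}")
```

The speed-up is exactly three only when the stages are balanced and the register delay is negligible; with a non-zero tc-q it falls below three.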
...
This explains why pipelining is popular in the implementation of very high-performance datapaths
...
8
...
Register-Based Pipelines

Pipelined circuits can be constructed using level-sensitive latches instead of edge-triggered registers
...
41
...
That is, logic is introduced between the master and slave latches of a
master-slave system
...
Latch-based systems give significantly more flexibility in implementing a pipelined system, and often offer higher performance
...
Input data is sampled on C1 at the negative edge of CLK and the computation of
logic block F starts; the result of the logic block F is stored on C2 on the falling edge of
CLK, and the computation of logic block G starts
...
For the example at hand, pipelining increases the latency from 1 to 3
...


[Figure residue: latch-based pipeline with latches C1, C2 and C3 around logic blocks F and G, input In, output Out; timing phases "Compute F" and "Compute G"]

Figure 7
...


ensures correct operation
...
When overlap exists between CLK and the inverted clock, the next input is already
being applied to F, and its effect might propagate to C2 before CLK goes low (assuming
that the contamination delay of F is small)
...
Which value wins depends upon the logic function F,
the overlap time, and the value of the inputs, since the propagation delay is often a function of the applied inputs
...

7
...
2

NORA-CMOS— A Logic Style for Pipelined Structures

The latch-based pipeline circuit can also be implemented using C2MOS latches, as shown
in Figure 7
...
The operation is similar to the one discussed above
...


The reasoning for the above argument is similar to the argument made in the construction of a C2MOS register
...
27)
...
43, where F is replaced by a single, static CMOS inverter
...

Based on this concept, a logic circuit style called NORA-CMOS was conceived
[Goncalves83]
...
Each module consists of a block of combinational logic that can be a mixture of
static and dynamic logic, followed by a C2MOS latch
...
A block

chapter7
...
42
Pipelined datapath using C2MOS latches
...
43 Potential race condition during (0-0) overlap in C2MOS-based design
...
Examples of both classes are shown in Figure 7
...

The operation modes of the modules are summarized in Table 7
...

Table 7
...


           CLK block               inverted-CLK block
           Logic       Latch       Logic       Latch
CLK = 0    Precharge   Hold        Evaluate    Evaluate
CLK = 1    Evaluate    Evaluate    Precharge   Hold

A NORA datapath consists of a chain of alternating CLK and inverted-CLK modules
...
Data is passed in a pipelined fashion from
module to module
...
Dynamic and static logic
can be mixed freely, and both CLKp and CLKn dynamic blocks can be used in cascaded or
in pipelined form
...

Design Rules
In order to ensure correct operation, two important rules should always be followed:

[Figure residue: NORA-CMOS modules, each combining combinational logic (PUN/PDN, inputs In1 to In4) with a latch; panels (a) CLK-module and (b) CLK-module (inverted clock)]
Figure 7
...


• The dynamic-logic rule: Inputs to a dynamic CLKn (CLKp) block are only allowed to
make a single 0 → 1 (1 → 0) transition during the evaluation period (Chapter 6)
...


The presence of dynamic logic circuits requires the introduction of some extensions
to the latter rule
...
45a
...
Assume now that a (0-0) overlap occurs
...
45b)
...
This translates into the following
rule: The number of static inversions between the last dynamic block in a logic function
and the C2MOS latch should be even
...


chapter7
...
45

(b) Same circuit during (0-0) clock overlap
...


Revised C2MOS Rule
• The number of static inversions between C2MOS latches should be even (in the absence
of dynamic nodes); if dynamic nodes are present, the number of static inverters between
a latch and a dynamic gate in the logic block should be even
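The revised rule lends itself to a simple parity check on a description of the logic between two latches. The sketch below is one simplified reading of the rule, not a complete NORA design checker: the path is modelled as an ordered list of stage kinds, where `"inv"` (a static inverting stage) and `"dyn"` (a dynamic gate) are labels invented here.

```python
def satisfies_c2mos_rule(path):
    """Check the parity of static inversions on a latch-to-latch path.

    path: ordered stage kinds between two C2MOS latches, each "inv" or "dyn".
    Without dynamic gates, the total number of static inversions must be even.
    With dynamic gates, the inversions between each latch and its nearest
    dynamic gate must be even.
    """
    if "dyn" not in path:
        return path.count("inv") % 2 == 0
    first_dyn = path.index("dyn")
    last_dyn = len(path) - 1 - path[::-1].index("dyn")
    before = path[:first_dyn].count("inv")      # input latch to first dynamic gate
    after = path[last_dyn + 1:].count("inv")    # last dynamic gate to output latch
    return before % 2 == 0 and after % 2 == 0


print(satisfies_c2mos_rule(["inv", "inv"]))         # even, no dynamic nodes
print(satisfies_c2mos_rule(["inv"]))                # odd, no dynamic nodes
print(satisfies_c2mos_rule(["dyn", "inv", "inv"]))  # even after the dynamic gate
print(satisfies_c2mos_rule(["dyn", "inv"]))         # odd after the dynamic gate
```

Stages that sit between two dynamic gates are not examined here; those are governed by the dynamic-logic rule instead.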
...


Adhering to the above rules is not always trivial and requires a careful analysis of
the logic equations to be implemented
...
Its use should only be considered when maximum
circuit performance is a must
...
9 Non-Bistable Sequential Circuits
In the preceding sections, we have focused on one single type of sequential element, that is,
the latch (and its sibling the register)
...
The bistable element is not the only
sequential circuit of interest
...
The former act as oscillators and can, for instance, be used for on-chip clock
generation
...
Another
interesting regenerative circuit is the Schmitt trigger
...
This peculiar
feature can come in handy in noisy environments
...

Section 7
...
9
...
It responds to a slowly changing input waveform with a fast transition time at the
output
...
The voltage-transfer characteristic of the device displays different switching thresholds for positive- and negative-going input signals
...
46, where a typical voltage-transfer characteristic of the Schmitt trigger is shown
(and its schematics symbol)
...
The hysteresis voltage is
defined as the difference between the two
...
46
Non-inverting Schmitt trigger
...
This is illustrated in Figure 7
...
Notice how the
hysteresis suppresses the ringing on the signal
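A behavioural model makes the noise-suppression effect easy to see. The sketch below models a non-inverting Schmitt trigger with two thresholds and compares it with a single-threshold comparator; the threshold values and the noisy input samples are hypothetical and are not taken from the simulations in the text.

```python
def schmitt_trigger(samples, vm_plus, vm_minus, initially_high=False):
    """Non-inverting Schmitt trigger: the output goes high only above vm_plus
    and returns low only below vm_minus."""
    high = initially_high
    output = []
    for v in samples:
        if not high and v > vm_plus:
            high = True
        elif high and v < vm_minus:
            high = False
        output.append(1 if high else 0)
    return output


def single_threshold(samples, vm):
    """Plain comparator with one switching threshold, for comparison."""
    return [1 if v > vm else 0 for v in samples]


# Hypothetical slowly rising input with ringing near the threshold (volts).
noisy_ramp = [0.0, 0.8, 1.3, 1.1, 1.4, 1.7, 1.4, 1.6, 2.5]

print(single_threshold(noisy_ramp, vm=1.5))
print(schmitt_trigger(noisy_ramp, vm_plus=1.5, vm_minus=0.9))
```

The single-threshold comparator toggles when the input dips back below 1.5 V, while the Schmitt trigger output switches once, because the dip never reaches the lower threshold.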
...
For instance, steep signal
slopes are beneficial in reducing power consumption by suppressing direct-path currents
...

[Figure residue: Vin and Vout versus t, with thresholds VM+ and VM– and time markers t0 and t0 + tp]
Figure 7
...

chapter7
...
48 The

...
Increasing the ratio results
in a reduction of the threshold, while decreasing it results in an increase in VM
...
This adaptation is achieved with the aid of feedback
...
48 CMOS Schmitt trigger
...
The feedback loop
biases the PMOS transistor M4 in the conductive mode while M3 is off
...
This
modifies the effective transistor ratio of the inverter to kM1/(kM2 + kM4), which moves the
switching threshold upwards
...
This extra pull-down device speeds up the transition and produces a clean
output signal with steep slopes
...
In this case, the
pull-down network originally consists of M1 and M3 in parallel, while the pull-up network
is formed by M2
...

Example 7
...
Devices M1 and M2 are 1µm/0
...
25µm, respectively
...
25 V)
...
49a shows the simulation of the Schmitt trigger assuming that devices M3 and M4 are 0
...
25µm and 1
...
25µm, respectively
...
The high-to-low switching point (VM- =
0
...
6 V) is larger
than VDD/2
...
For
example, to modify the low-to-high transition, we need to vary the PMOS device
...
5 µm
...
5µm
...
49b demonstrates how the switching threshold
increases with rising values of k
...

Section 7
...
[Plot residue: VX (V) versus Vin (V), with curves for k = 2 and k = 4]
...
The width is k * 0
...

Figure 7
...


(a) Voltage-transfer characteristics with hysteresis
...
8 An Alternative CMOS Schmitt Trigger
Another CMOS Schmitt trigger is shown in Figure 7
...
Discuss the operation of the gate,
and derive expressions for VM− and VM+
...
50 Alternate CMOS Schmitt trigger
...
9
...
It is called monostable
because it has only one stable state (the quiescent one)
...
This means that it eventually returns to its original state after a time period determined by the circuit parameters
...
This functionality is required in a wide range of applications
...
Another

chapter7
...
This circuit detects a change in a signal, or group of signals,
such as the address or data bus, and produces a pulse to initialize the subsequent circuitry
...
The concept is illustrated in Figure
7
...
In the quiescent state, both inputs to the XOR are identical, and the output is low
...
After a delay td (of the delay element), this disruption is removed, and the output
goes low again
...
The delay circuit can be realized in many
different ways, such as an RC-network or a chain of basic gates
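The XOR-plus-delay principle described above can be illustrated with a discrete-time model. In this sketch the input is a list of 0/1 samples and the delay element is an integer number of sample periods; both are modelling assumptions, and the function name is hypothetical.

```python
def transition_detector(samples, delay):
    """Output of an XOR fed by the input and a copy of it delayed by `delay` samples.

    Before `delay` samples have elapsed, the delayed copy is assumed to hold
    the initial input value (the quiescent state).
    """
    output = []
    for i, value in enumerate(samples):
        delayed = samples[i - delay] if i >= delay else samples[0]
        output.append(value ^ delayed)
    return output


# Input with one rising and one falling transition.
signal = [0, 0, 1, 1, 1, 1, 0, 0, 0]
print(transition_detector(signal, delay=2))
```

Each transition of the input, in either direction, produces one output pulse whose width equals the delay td of the delay element (two samples here).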
...
51

7
...
3

Out

td

Transition-triggered one-shot
...
The output oscillates back and forth between two
quasi-stable states with a period determined by the circuit topology and parameters (delay,
power supply, etc
...
This application is discussed in detail in a later chapter (on timing)
...
It consists of an odd
number of inverters connected in a circular chain
...

Example 7
...
52 (all gates
use minimum-size devices)
...
5 nsec,
which corresponds to a gate propagation delay of 50 psec
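The relation between stage delay and oscillation period can be written down directly. The sketch below uses the standard result for a ring of N identical inverters, T = 2 × N × tp (each node must rise and fall once per period, and each transition ripples through all N stages); the function name is hypothetical.

```python
def ring_oscillator_period(n_stages, t_p):
    """Oscillation period of a ring of n_stages identical inverters.

    t_p is the propagation delay per stage; the result uses the same unit.
    """
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of stages (3 or more)")
    return 2 * n_stages * t_p


# Five stages with a 50 psec gate delay, as in the example in the text.
period_ps = ring_oscillator_period(5, 50)
print(f"period = {period_ps} ps, frequency = {1e3 / period_ps:.1f} GHz")
```

An even number of stages is rejected because such a ring settles into a stable state instead of oscillating.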
...
[Plot residue: node voltages v1, v3 and v5 versus time]
...
52 Simulated
waveforms of five-stage ring
oscillator
...



played in the plot)
...


The ring oscillator composed of cascaded inverters produces a waveform with a
fixed oscillating frequency determined by the delay of an inverter in the CMOS process
...
An example
of such a circuit is the voltage-controlled oscillator (VCO), whose oscillation frequency is
a function (typically non-linear) of a control voltage
...
53 [Jeong87]
...


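[Figure residue: current-starved inverter with transistors M1, M2 and M3, input In, output Out, reference current Iref and control voltage Vcntl]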

Figure 7
...


In this modified inverter circuit, the maximal discharge current of the inverter is limited by adding an extra series device. Note that the low-to-high transition on the inverter

...
The added NMOS
transistor M3 is controlled by an analog control voltage Vcntl, which determines the available discharge current
...
The ability to alter the propagation delay per stage allows us to control the frequency
of the ring structure
...
Under
low operating current levels, the current-starved inverter suffers from slow fall times at its
output
...
This is resolved by feeding its
output into a CMOS inverter or better yet a Schmitt trigger
...

Example 7
...
54 show the simulated delay of the current-starved inverter as a function of the control voltage Vcntl
...
When the control
voltage is smaller than the threshold, the device enters the sub-threshold region
...
When operating in this region, the delay is very sensitive to variations in
the control voltage, and, hence, to noise
...
fm Page 312 Tuesday, April 18, 2000 8:52 PM

312

DESIGNING SEQUENTIAL LOGIC CIRCUITS

Chapter 7

Figure 7
...


...
55a
...
Figure 7
...
The simulated waveforms of this two-stage VCO are shown in Figure 7
...
The in-phase and quadrature-phase outputs are available simultaneously. The differential type

...
However, it consumes more power due to the increased complexity, and the static current
...
[Figure 7.55: Differential delay element and VCO topology. Panel (c): simulated waveforms of 2-stage VCO; horizontal axis: time (nsec)]
...
Perspective: Choosing a Clocking Strategy
A crucial decision that must be made in the earliest phases of a chip design is to select the
appropriate clocking methodology
...
Choosing the right clocking scheme affects the functionality,
speed and power of a circuit
...
The
most robust and conceptually simple scheme is the two-phase master-slave design
...
More exotic schemes
such as the glitch register are also used in practice
...
An example of such is the need for a negative set-up time to cope with clock skew
...
Most automated design methodologies such as standard cell employ a single-phase, edge-triggered approach, based on
static flip-flops
...
The use of latches between logic is
also very common to improve circuit performance
...
11 Summary
This chapter has explored the subject of sequential digital circuits
...
A
third potential operation point turns out to be metastable; that is, any diversion from
this bias point causes the flip-flop to converge to one of the stable states
...
A register (sometimes also called a flip-flop), on the
other hand, samples the data on the rising or falling edge
...
These
parameters must be carefully optimized since they may account for a significant
portion of the clock period
...
A static register holds state as long as the power
supply is turned on
...
g
...
Dynamic memory is based on temporary
charge storage on capacitors
...
However, charge on a dynamic node leaks away
with time, and hence dynamic circuits have a minimum clock frequency
...

The most common and widely used approach is the master-slave configuration
which involves cascading a positive latch and negative latch (or vice-versa)
...
• Registers can also be constructed using the pulse or glitch concept
...
Generally, the design of such circuits requires careful timing analysis across all process
corners
...

• Choice of clocking style is an important consideration
...
Circuit techniques such as C²MOS can be used to eliminate race
conditions in two-phase clocking
...
However, the rise time of clocks must be carefully optimized to eliminate races
...
An example of such an approach, the NORA logic style, is
very effective in pipelined datapaths
...
They are useful as pulse generators
...
The ring oscillator is
the best-known example of a circuit of this class
...
They are mainly used to suppress noise
...
12 To Probe Further
The basic concepts of sequential gates can be found in many logic design textbooks (e
...
,
[Mano82] and [Hill74])
...


References
[Dobberpuhl92] D
...
, “A 200 MHz 64-b Dual Issue CMOS Microprocessor,” IEEE
JSSC, vol
...
11, Nov
...
1555–1567
...
Bieseke et al
...
176-177, Feb
...

[Goncalves83] N
...
De Man, “NORA: a racefree dynamic CMOS technique for
pipelined logic structures,” IEEE JSSC, vol
...
3, June 1983, pp
...

[Haznedar91] H
...

[Hill74] F
...
Peterson, Introduction to Switching Theory and Logical Design, Wiley, 1974
...
Hodges and H
...

[Jeong87] D
...
, “Design of PLL-based clock generation circuits,” IEEE JSSC, vol
...
2, April 1987, pp
...

[Kuzo96] S
...
, “A 100MHz 0
...
140-141, February 1996
...
Mano, Computer System Architecture, Prentice Hall, 1982
...
Montanaro et al
...
5-W CMOS RISC Microprocessor,” IEEE
JSSC, pp
...


chapter7
...
13

Exercises and Design Problems

315

[Mutoh95] S
...
, “1-V Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage CMOS,” IEEE JSSC, pp
...

[Partovi96] H
...
138-139, February 1996
...
H
...
15, January 1938, pp
...

[Shoji88] M
...

[Suzuki73] Y
...
Odagawa, and T
...
SC-8, December 1973, pp
...

[Veendrick92] H
...

[Yuan89] J
...
, “High-Speed CMOS Circuit Technique,” IEEE JSSC, vol
...
1, February 1989, pp
...


7
...
reset, Tom’s latch
- static in one phase, look in Hamid’s paper,
1
...
2] The indicated waveforms are applied to the JK master-slave flip-flop of Figure
7
...
For this problem, assume that gate delays are short compared to the input signal time scale
...
Sketch the waveforms that appear at the QM and QS outputs of the master and slave
latches
...

b
...


Figure 7
...


[M&D, SPICE, 6
...
14 using static CMOS minimum-size
devices
...

a
...
Find
these gate delays using SPICE under the appropriate loading conditions
...

b
...
Find the
new set-up and hold times and the propagation delays using SPICE
...
fm Page 42 Tuesday, April 16, 2002 9:12 AM

CHAPTER 10

TIMING ISSUES IN DIGITAL CIRCUITS

• Impact of clock skew and jitter on performance and functionality
• Alternative timing methodologies
• Synchronization issues in digital IC and board design
• Clock generation

10
...
5 Synchronizers and Arbiters*

10
...
6 Clock Synthesis and Synchronization Using a
Phase-Locked Loop

10
...
3
...
3
...
3
...
7 Future Directions
10
...
9 Summary
10
...
3
...
4 Self-Timed Circuit Design*

42

chapter10_141
...
1

TIMING ISSUES IN DIGITAL CIRCUITS

Chapter 10

Introduction
All sequential circuits have one property in common—a well-defined ordering of the
switching events must be imposed if the circuit is to operate correctly
...
The synchronous system approach, in which all memory elements in the system are
simultaneously updated using a globally distributed periodic synchronization signal (that
is, a global clock signal), represents an effective and popular way to enforce this ordering
...

This Chapter starts with an overview of the different timing methodologies
...
We analyze the
impact of spatial variations of the clock signal, called clock skew, and temporal variations
of the clock signal, called clock jitter, and introduce techniques to cope with them
...

At the other end of the design spectrum is an approach called asynchronous design,
which avoids the problem of clock uncertainty altogether by eliminating the need for
globally-distributed clocks
...
The important issue of synchronization, which is required when interfacing different clock domains
or when sampling an asynchronous signal, also deserves some in-depth treatment
...


10
...
Signals that transition only at predetermined periods
in time can be classified as synchronous, mesochronous, or plesiochronous with respect to
a system clock
...

10
...
1

Synchronous Interconnect

A synchronous signal is one that has exactly the same frequency as the local clock, and a known fixed phase
offset with respect to it
...
In
digital logic design, synchronous systems are the most straightforward type of interconnect, where the flow of data in a circuit proceeds in lockstep with the system clock as
shown below
...
chapter10_141
...
2

Classification of Digital Systems

44

[Figure labels: CLK, In, R1, Cin, Combinational Logic, Cout, R2, Out]

Figure 10
...

After a suitable settling period, the output Cout becomes valid and can be sampled by
R2, which synchronizes the output with the clock
...
The length of the “uncertainty period,” or the period where data is not valid, places an upper bound on how fast a
synchronous interconnect system can be clocked
...
2
...
For example, if data is
being passed between two different clock domains, then the data signal transmitted from
the first module can have an unknown phase relationship to the clock of the receiving
module
...
A (mesochronous) synchronizer can
be used to synchronize the data signal with the receiving clock as shown below
...

[Figure labels: Block A, R1, D1, Interconnect Delay, D2, D3, R2, D4, Block B, ClkA, ClkB, PD/Control]

Figure 10
...


In Figure 10
...
However, D1 and D2
are mesochronous with ClkB because of the unknown phase difference between ClkA and
ClkB and the unknown interconnect delay in the path between Block A and Block B
...
In this example, the variable delay element is adjusted by measuring the phase difference between the
received signal and the local clock
...

10
...
3

Plesiochronous Interconnect

A plesiochronous signal is one whose frequency is nominally the same as, but slightly different from, that of the local clock (“plesio” is Greek for “near”)
...
drifts in time
...
Since the transmitted signal can
arrive at the receiving module at a different rate than the local clock, one needs to utilize a
buffering scheme to ensure all data is received
...
A possible
framework for plesiochronous interconnect is shown in Figure 10
...

[Figure labels: Clock C1, Clock C2, C3, Timing Recovery, Originating Module, FIFO, Receiving Module]

Figure 10
...


In this digital communications framework, the originating module issues data at
some unknown rate characterized by C1, which is plesiochronous with respect to C2
...
As a result, C3 will be synchronous with the data at the input of
the FIFO and will be mesochronous with C1
...
However, by making the FIFO large enough, and periodically resetting the system whenever an overflow condition occurs, robust communication can be
achieved
...
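To get a feel for how large the FIFO must be, one can estimate how long it takes a small frequency mismatch to fill it. The sketch below is a back-of-the-envelope model under stated assumptions: one word is written per cycle of the transmit clock, one word is read per cycle of the receive clock, and the FIFO starts empty. The function name and the example numbers are hypothetical.

```python
def time_to_overflow(depth_words, f_write_hz, f_read_hz):
    """Seconds until a FIFO that starts empty fills up, or None if it never does."""
    surplus = f_write_hz - f_read_hz      # net words accumulated per second
    if surplus <= 0:
        return None                       # the reader keeps up; no overflow
    return depth_words / surplus


# A 64-word FIFO with the writer 10 ppm faster than a 100 MHz reader.
t_overflow = time_to_overflow(64, 100.001e6, 100e6)
print(t_overflow)
```

Under these assumptions the time between resets scales linearly with the FIFO depth and inversely with the frequency difference.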
2
...
As a result, it is not straightforward to map these arbitrary transitions into a synchronized data stream
...
In such an approach, communication between modules is controlled through a handshaking protocol to perform the
proper ordering of commands
...
4 Asynchronous design methodology for simple pipeline interconnect
...
...
The handshaking signals then initiate a data transfer to
the next block, which latches in the new data and begins a new computation by asserting
the initialization signal I
...
There is no need to manage clock skew, and the design methodology leads to a very modular approach where interaction between blocks simply occurs
through a handshaking procedure
...


10
...
3
...
The
generation and distribution of a clock has a significant impact on performance and power
dissipation
...
In the ideal world, assuming the
clock paths from a central distribution point to each register are perfectly balanced, the
phase of the clock (i
...
, the position of the clock edge relative to a reference) at various
points in the system is going to be exactly equal
...
This results in performance degradation and/or circuit
malfunction
...
5 shows the basic structure of a synchronous pipelined datapath
...
5 Pipelined Datapath Circuit and timing parameters
...
The following timing parameters characterize the timing of the
sequential circuit
...

• The set-up (tsu) and hold time (thold) for the registers
...


chapter10_141
...

Under ideal conditions (tclk1 = tclk2), the worst case propagation delays determine the
minimum clock period required for this sequential circuit
...
This constraint is given by (as
derived in Chapter 7):
$T > t_{c-q} + t_{logic} + t_{su}$

(10
...
2)

The above analysis is simplistic since the clock is never ideal
...

Clock Skew
The spatial variation in arrival time of a clock transition on an integrated circuit is commonly referred to as clock skew
...
Consider the transfer of data between registers R1 and R2 in Figure
10
...
The clock skew can be positive or negative depending upon the routing direction and
position of the clock source
...
6
...

[Timing diagram labels: CLK1, CLK2, TCLK, TCLK + δ, δ, δ + th; edges numbered 1 to 4]
Figure 10
...
In
this sample timing diagram, δ > 0
...
That is, if in one cycle CLK2 lagged CLK1 by
δ, then on the next cycle it will lag it by the same amount
...


chapter10_141
...
3

Synchronous Design — An In-depth Perspective

48

[Timing diagram labels: CLK1, CLK2, TCLK, TCLK + δ, δ; edges numbered 1 to 4]

Figure 10
...
The rising edge of CLK2 arrives
earlier than the edge of CLK1
...
First consider the impact of clock skew on performance
...
6, a new
input In sampled by R1 at edge
will propagate through the combinational logic and be
sampled by R2 on edge
...
The output of the combinational logic
must be valid one set-up time before the rising edge of CLK2 (point )
...
3)

The above equation suggests that clock skew actually has the potential to improve
the performance of the circuit
...

As above, assume that input In is sampled on the rising edge of CLK1 at edge into
R1
...
However, if the minimum delay of the combinational logic block is small, the inputs to R2 may change before the clock edge , resulting
in incorrect evaluation
...
The constraint can be formally stated as

$\delta + t_{hold} < t_{(c-q,cd)} + t_{(logic,cd)}$
or
$\delta < t_{(c-q,cd)} + t_{(logic,cd)} - t_{hold}$

(10
...
7 shows the timing diagram for the case when δ < 0
...
On the rising edge of CLK1, a
new input is sampled by R1
...
As
can be seen from Figure 10
...
(10
...
However, a negative skew implies that the system never fails,
since edge
happens before edge ! This can also be seen from Eq
...
4), which is
always satisfied since δ < 0
...
Example scenarios for positive and negative clock skew are shown in Figure 10
...

[Figure panels: (a) Positive skew; (b) Negative skew. Labels in each panel: In, R1, R2, R3 (D, Q), Combinational Logic, CLK, delay, tCLK1, tCLK2, tCLK3]

Figure 10
...


• δ > 0—This corresponds to a clock routed in the same direction as the flow of the data
through the pipeline (Figure 10
...
In this case, the skew has to be strictly controlled
and satisfy Eq
...
4)
...
Reducing the clock frequency of an edge-triggered circuit
does not help get around skew problems! On the other hand, positive skew increases the
throughput of the circuit as expressed by Eq
...
3), because the clock period can be
shortened by δ
...
(10
...

• δ < 0—When the clock is routed in the opposite direction of the data (Figure 10
...
4) is unconditionally met
...
The skew reduces the time available for actual computation so that the clock period has to be increased by |δ|
...
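The two cases above can be summarized numerically. The following sketch encodes the two constraints discussed in this section: the set-up constraint, in which the minimum clock period is shortened by a positive skew δ and lengthened by a negative one, and the hold constraint δ < t(c-q,cd) + t(logic,cd) - thold. The function names and the picosecond values are illustrative assumptions, not data from the text.

```python
def min_clock_period(t_cq, t_logic, t_su, skew=0):
    """Smallest period satisfying T + skew >= t_cq + t_logic + t_su."""
    return t_cq + t_logic + t_su - skew


def max_skew(t_cq_cd, t_logic_cd, t_hold):
    """Upper bound on skew from skew < t_cq_cd + t_logic_cd - t_hold."""
    return t_cq_cd + t_logic_cd - t_hold


def hold_ok(skew, t_cq_cd, t_logic_cd, t_hold):
    """True when the race (hold) constraint is met for the given skew."""
    return skew < max_skew(t_cq_cd, t_logic_cd, t_hold)


# All values in picoseconds (hypothetical numbers).
t_positive = min_clock_period(100, 800, 50, skew=60)    # clock routed with the data
t_negative = min_clock_period(100, 800, 50, skew=-60)   # clock routed against the data
limit = max_skew(80, 120, 40)
print(t_positive, t_negative, limit)
```

Note that the hold check does not involve the clock period at all, which is why lowering the clock frequency cannot cure a race caused by positive skew.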

Unfortunately, since a general logic circuit can have data flowing in both directions (for
example, circuits with feedback), this solution to eliminate races will not always work (Figure
10
...
[Figure labels: In, REG, Logic, CLK, Clock distribution, Positive skew, Negative skew, Out]

Figure 10
...

The skew can assume both positive and negative values depending on the direction of
the data transfer
...
In general, routing the clock so that only negative skew occurs is not feasible
...


Example 10
...
10
...
No distinction between the maximum
and minimum delays of the gates is made, as they are assumed to be identical
...
On the
other hand, computation of the worst case propagation delay is not as simple as it appears
...

However, when analyzing the data dependencies, it becomes obvious that path
is never
exercised
...
If A = 1, the critical path goes through OR1 and OR2
...
For the case when A = 0 and B = 1, the critical path is through I1, OR1, AND3 and OR2
...
Therefore, the propagation delay is 4 tgate
...


[Figure labels: inputs A, B, C, D; gates I1, AND1, AND2, AND3, OR1, OR2; marked paths 1 and 2]
Figure 10
...


WARNING: The computation of the worst-case propagation delay for combinational
logic, due to the existence of false paths, cannot be obtained by simply adding the propagation delay of individual logic gates
...


Clock Jitter
Clock jitter refers to the temporal variation of the clock period at a given point — that is,
the clock period can reduce or expand on a cycle-by-cycle basis
...
Jitter can be measured and cited in one of many ways
...
of a single clock period and for a given spatial location i is given as $T_{jitter,i}(n) = T_{i,n+1} - T_{i,n} - T_{CLK}$,
where $T_{i,n}$ and $T_{i,n+1}$ are the arrival times of the n-th and (n+1)-th clock edges at location i, and
$T_{CLK}$ is the nominal clock period
...
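As a small illustration of this definition, the sketch below computes the cycle-to-cycle jitter from a list of measured clock edges. It assumes the reading in which the two T terms are edge arrival times, so that their difference is the actual length of one period; the function name and the sample edge times (in picoseconds) are hypothetical.

```python
def cycle_to_cycle_jitter(edge_times, t_clk):
    """Deviation of each measured period from the nominal period t_clk."""
    return [edge_times[n + 1] - edge_times[n] - t_clk
            for n in range(len(edge_times) - 1)]


edges_ps = [0, 1000, 1990, 3005, 4000]      # measured rising-edge times
jitter_ps = cycle_to_cycle_jitter(edges_ps, t_clk=1000)
print(jitter_ps)
```

A negative entry means the corresponding period was shorter than nominal, a positive entry that it was longer.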
Figure 10
...
Ideally the clock period starts at
edge and ends at edge and with a nominal clock period of TCLK
...

As a result, the total time available to complete the operation is reduced by 2 tjitter in the
worst case and is given by

$T_{CLK} - 2t_{jitter} \geq t_{c-q} + t_{logic} + t_{su}$ or $T \geq t_{c-q} + t_{logic} + t_{su} + 2t_{jitter}$

(10
...
Care must be taken to reduce jitter in the clock network to maximize performance
...
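The performance cost of jitter can be made concrete with a one-line calculation. The sketch below evaluates the bound T ≥ tc-q + tlogic + tsu + 2 tjitter; the function name and the picosecond values are assumptions chosen for illustration.

```python
def min_period_with_jitter(t_cq, t_logic, t_su, t_jitter):
    """Smallest period satisfying T - 2*t_jitter >= t_cq + t_logic + t_su."""
    return t_cq + t_logic + t_su + 2 * t_jitter


ideal_ps = min_period_with_jitter(100, 800, 50, t_jitter=0)
jittery_ps = min_period_with_jitter(100, 800, 50, t_jitter=30)
print(ideal_ps, jittery_ps)
```

The jitter term is counted twice because, in the worst case, one edge arrives late and the following edge arrives early.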
11 Circuit for studying the impact of jitter on performance
...
Consider the sequential circuit shown in Figure 10
...

Assume that nominally ideal clocks are distributed to both registers (the clock
period is identical every cycle and the skew is 0)
...
Assume that CLK1 has a jitter of tjitter1 and
CLK2 has a jitter of tjitter2
...
The worst
case happens when the leading edge of the current clock period on CLK1 happens late
(edge ) and the leading edge of the next cycle of CLK2 happens early (edge )
...
6)

[Figure labels: In, R1, Combinational Logic, R2 (D, Q), tCLK1, tCLK2; timing diagram of CLK1 and CLK2 with TCLK, TCLK + δ, δ, tjitter1, tjitter2; edges numbered 1 to 12]

Figure 10
...
In this example, a positive skew (δ) is assumed
...
To formulate
the minimum delay constraint, consider the case when the leading edge of the CLK1 cycle
arrives early (edge ) and the leading edge of the current cycle of CLK2 arrives late (edge
)
...
This results in

$\delta + t_{hold} + t_{jitter1} + t_{jitter2} < t_{(c-q,cd)} + t_{(logic,cd)}$
or
$\delta < t_{(c-q,cd)} + t_{(logic,cd)} - t_{hold} - t_{jitter1} - t_{jitter2}$

(10
...
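The constraint above shows that jitter on either clock eats directly into the skew budget. A minimal sketch follows, with a hypothetical function name and illustrative picosecond values, that evaluates the acceptable skew bound with and without jitter.

```python
def max_skew_with_jitter(t_cq_cd, t_logic_cd, t_hold, t_jitter1, t_jitter2):
    """Upper bound on skew: t_cq_cd + t_logic_cd - t_hold - t_jitter1 - t_jitter2."""
    return t_cq_cd + t_logic_cd - t_hold - t_jitter1 - t_jitter2


no_jitter_ps = max_skew_with_jitter(80, 120, 40, 0, 0)
with_jitter_ps = max_skew_with_jitter(80, 120, 40, 20, 30)
print(no_jitter_ps, with_jitter_ps)
```

With these numbers, the tolerable skew drops by exactly the sum of the two jitter values.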

Now consider the case when the skew is negative (δ <0) as shown in Figure 10
...

For the timing shown, |δ| > tjitter2
...
That is, negative
skew reduces performance
...
13 Consider a negative clock skew (δ) and the skew is assumed to be larger than the jitter
...
10
...
2

Sources of Skew and Jitter

A perfect clock is defined as a perfectly periodic signal that is simultaneously triggered at various memory elements on the chip
...
To illustrate the sources of skew and jitter, consider the
simplistic view of clock generation and distribution as shown in Figure 10
...
Typically, a
high frequency clock is either provided from off chip or generated on-chip
...
s registers
...
The clock paths include wiring and
the associated distributed buffers required to drive interconnects and loads
...
e
...


[Figure labels (sources numbered 1 to 7): Clock Generation, Devices, Interconnect, Power Supply, Temperature, Capacitive Load, Coupling to Adjacent Lines]

Figure 10
...


There are many reasons why the two parallel paths don’t result in exactly the same
delay
...
First, errors can
be divided into systematic or random
...
g
...
In principle, such errors can be modeled and corrected at design time given sufficiently good models and simulators
...
Random errors
are due to manufacturing variations (e
...
, dopant fluctuations that result in threshold variations) that are difficult to model and eliminate
...
In practice, there is a continuum between changes that are slower
than the time constant of interest, and those that are faster
...
A clock network tuned by a one-time
calibration or trimming would be vulnerable to time-varying mismatch due to varying
thermal gradients
...
For example, the clock net is usually
by far the largest single net on the chip, and simultaneous transitions on the clock drivers
induce noise on the power supply
...
...
Of course, this power supply glitch may still cause static mismatch if it is not the same throughout the chip
...
14, are described in detail
...
A typical on-chip clock generator, as
described at the end of this chapter, takes a low-frequency reference clock signal, and produces a high-frequency global reference for the processor
...
This is an analog circuit, sensitive to intrinsic
device noise and power supply variations
...
This is particularly a problem in
modern fabrication processes that combine a lightly-doped epitaxial layer and a heavilydoped substrate (to combat latch-up)
...
These noise sources cause temporal variations of the clock signal that
propagate unfiltered through the clock drivers to the flip-flops, and result in cycle-to-cycle
clock-period variations
...

Manufacturing Device Variations (2)
Distributed buffers are integral components of the clock distribution networks, as they are
required to drive both the register loads as well as the global and local interconnects
...
Unfortunately, as a result of process variations, device parameters in the
buffers vary along different paths, resulting in static skew
...
The doping variations can affect the
depth of junction and dopant profiles and cause variations in electrical parameters such as
device threshold and parasitic capacitances
...
Keeping the orientation the same across the chip for
the clock drivers is critical
...
Spatial variation usually consists of waferlevel (or within-wafer) variation and die-level (or within-die) variation
...
The random variations,
however, ultimately limit the matching and skew that can be achieved
...
Interconnect Variations (3)
Vertical and lateral dimension variations cause the interconnect capacitance and resistance
to vary across a chip
...

One important source of interconnect variation is variation in the inter-level dielectric (ILD)
thickness
...
The oxide layer is deposited over a layer
of patterned metal features, generally resulting in some remaining step height or surface
topography
...
15a)
...
This is
primarily caused by variations in polish rate, which is a function of the circuit layout density and pattern effects
...
15b shows this effect where the polish rate is higher for
the lower spatial density region, resulting in smaller dielectric thickness and higher capacitance
...
Significant advances have been made to develop
analytical models for estimating the ILD thickness variations based on spatial density
...
g
...
Figure 10
...
The graphs show that there
is clear correlation between the density and the thickness of the dielectric
...

Other interconnect variations include deviation in the width of the wires and line
spacing
...
At the lower levels of
metallization, lithographic effects are important, while at higher levels etch effects that
depend on width and layout are important
...
15 Inter-level Dielectric (ILD) thickness variation due to density (courtesy of
Duane Boning)
...
...
A detailed review of device and
interconnect variations is presented in [Boning00]
...
16 Pattern density and ILD thickness variation for a high performance microprocessor
...
The two major sources of environmental variations are temperature and
power supply
...
This has particularly become an issue with clock gating where
some parts of the chip may be idle while other parts of the chip might be fully active
...
Since the device parameters (such as threshold,
mobility, etc
...
More importantly, this component is time-varying since the temperature changes as the logic activity of the circuit
varies
...
An
interesting question is whether temperature variation contributes to skew or to jitter. Clearly,
the variation in temperature is time-varying, but the changes are relatively slow (typical
time constants for temperature are on the order of milliseconds)
...
Fortunately, using feedback, it is possible to calibrate the temperature and compensate
...
The delay through buffers is a very strong function of power supply as it
directly affects the drive of the transistors
...
Therefore, the buffer delay along one path is
very different from the buffer delay along another path
...
Static power supply variations may result from fixed currents drawn from various modules, while high-frequency
variations result from instantaneous IR drops along the power grid due to fluctuations in
switching activity
...
This has particularly become a concern with clock gating
as the load current can vary dramatically as the logic transitions back and forth between
the idle and active states
...
clock signal is modulated on a cycle-by-cycle basis, resulting in jitter
...
Unfortunately, high-frequency
power supply changes are difficult to compensate even with feedback techniques
...

Capacitive Coupling (6 and 7)
The variation in capacitive load also contributes to timing uncertainty
...
The clock network includes both the interconnect and the gate capacitance of latches and registers
...
Since the adjacent signal can transition in arbitrary directions and at arbitrary times, the exact coupling to the clock network
is not fixed from cycle to cycle
...
Another major source of clock
uncertainty is variation in the gate capacitance related to the sequential elements
...
In many latches and
registers this translates to the clock load being a function of the current state of the
latch/register (that is, the values stored on the internal nodes of the circuit), as well as the
next state
...

Example 10
...
17, where a minimum-sized local clock buffer drives
a register (actually, four registers are driven, though only one is shown here)
...
The jitter on the clock based on data-dependent capacitance is illustrated
...

[Figure labels: CK, CKb, CLK, T2; plot of CLK and CKb versus time (ns)]

Figure 10
...


0
...
4

10
...
3

Clock-Distribution Techniques

It is clear from the previous discussion that clock skew and jitter are major issues in digital
circuits, and can fundamentally limit the performance of a digital system
...
Another important consideration in
clock distribution is the power dissipation
...
To reduce power dissipation, clock networks must support clock conditioning, that is, the ability to shut down parts of the
clock network
...

In this section, an overview of basic constructs in high-performance clock distribution techniques is presented along with a case study of clock distribution in the Alpha
microprocessor
...

Fabrics for clocking
Clock networks typically include a network that is used to distribute a global reference to
various parts of the chip, and a final stage that is responsible for local distribution of the
clock while considering the local load variations
...
Therefore
one common approach to distributing a clock is to use balanced paths (also called trees)
...
18, where a 4x4 array is
shown
...
Ideally, if each path is balanced, the clock skew is zero
...
However, in
reality, as discussed in the previous section, process and environmental variations cause
clock skew and jitter to occur
...
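The balancing property of the H-tree can be verified with a small geometric model. The sketch below is an idealized construction, not the layout of any particular chip: the recursive function, the unit dimensions, and the Manhattan routing (a horizontal segment followed by a vertical one at each level) are assumptions made for the example. Two levels of recursion yield the 16 leaf nodes of a 4x4 array.

```python
def h_tree_leaves(levels, x=0.0, y=0.0, half=1.0, length=0.0):
    """Return (x, y, wire_length_from_root) for each leaf of an idealized H-tree."""
    if levels == 0:
        return [(x, y, length)]
    leaves = []
    for dx in (-half, half):          # two ends of the horizontal bar of the H
        for dy in (-half, half):      # up and down each vertical stroke
            leaves += h_tree_leaves(levels - 1, x + dx, y + dy, half / 2,
                                    length + abs(dx) + abs(dy))
    return leaves


leaves = h_tree_leaves(levels=2)
positions = {(lx, ly) for lx, ly, _ in leaves}
lengths = {wire for _, _, wire in leaves}
print(len(leaves), sorted(lengths))
```

Every leaf sees the same wire length from the root in this model, which is the geometric reason the ideal skew is zero; the process and environmental variations discussed earlier are not captured here.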
18 Example of an H-tree clock-distribution
network for 16 leaf nodes
...
The concept can be generalized to a more generic set-

chapter10_141
...
19 An example RC-matched distribution for an
IBM Microprocessor [Restle98]
...
The more general approach, referred to as routed RC trees, represents a floorplan that
distributes the clock signal so that the interconnections carrying the clock signals to the
functional sub-blocks are of equal length
...
An example of a matched RC is shown in Figure 10
...
The
chip is partitioned into ten balanced load segments (tiles)
...
A lower level RC-matched
tree is used to drive 580 additional drivers inside each tile
...
20 [Bailey00]
...
This approach is fundamentally different
from the balanced RC approach
...
Rather, the absolute delay is minimized assuming that the grid
size is small
...
Unfortunately, the
penalty is the power dissipation since the structure has a lot of unnecessary interconnect
...
Clock
distribution is often only considered in the last phases of the design process, when most of
the chip layout is already frozen
...
20 Grid structures allow a low skew
distribution and physical design flexibility at the
cost of power dissipation [Bailey00]
...
...
With
careful planning, a designer can avoid many of these problems, and clock distribution
becomes a manageable operation
...
These processors have always been at the cutting edge of the
technology, and therefore represent an interesting perspective on the evolution of clock
distribution
...
The first generation Alpha microprocessor (21064 or EV4)
from Digital Equipment Corporation used a single global clock driver [Dobberpuhl92]
...
20, resulting in a total clock load of 3
...
The inputs to the clock drivers are shorted out to smooth out the asymmetry in
the incoming signals
...
The clock driver and the associated pre-drivers account for 40% of the effective
switched capacitance (12
...
The overall
width of the clock driver was on the order of 35cm in a 0
...
A detailed clock
skew simulation with process variations indicates that a clock uncertainty of less than
200psec (< 10%) was achieved
...
21 Clock load for the
...
The Alpha 21164 microprocessor (EV5) operates at a
clock frequency of 300 MHz while using 9
...
5 mm × 18
...
5 µm CMOS technology [Bowhill95]
...
75 nF
...

The incoming clock signal is first routed through a single six-stage buffer placed at
the center of the chip
...
22a)
...
The equivalent transistor width of the final driver
inverter equals 58 cm! To ensure the integrity of the clock grid across the chip, the grid
was extracted from the layout, and the resulting RC-network was simulated
...
22b
...
Clock driver

(a) Chip microphotograph, showing positioning of clock drivers
...


Figure 10
...


dent from the plot, the skew is zero at the output of the left and right drivers
...
The critical instruction and
execution units all see the clock within 65 psec
...

The clock skew problems were eliminated by either routing the clock in the opposite direction of the data at a small expense of performance or by ensuring that the data could not
overtake the clock
...
To avoid race-through conditions, a number of design guidelines were followed including:
• Careful sizing of the local clock buffers so that their skew was minimal
...
This gate, which
can be part of the logic function or just a simple inverter, ensures that the signal cannot overtake the clock
...

To improve the inter-layer dielectric uniformity, filler
polygons were inserted between widely spaced lines (Figure
10
...
Though this may increase the capacitance to nearby
signal lines, the improved uniformity results in lower variation
and clock uncertainty
...
This
technique is used in many processors for controlling the clock
skew
...
23 Dummy
fills reduce the ILD
variation and improve
clock skew
...
...
However, making such a circuit work in a reliable way
requires careful planning and intensive analysis
...
A hierarchical clocking scheme is used in the 600 MHz
Alpha 21264 (EV6) processor (in 0
...
24
...
A hierarchical clocking approach trades off power against skew
management
...
As seen in previous-generation microprocessors, clock power accounts for a
large fraction of overall power consumption
...
The
drawback of using a hierarchical clock network is that skew reduction becomes more difficult because clocks to various local registers may go through very different paths, which
may contribute to the skew
...


Figure 10
...


The clock hierarchy consists of a global clock grid, called GCLK, that covers the
entire die
...
The on-chip generated clock is routed to the center of the die and distributed using tree structures
to 16 distributed clock drivers (Figure 10
...
The global clock distribution network utilizes a window-pane configuration that achieves low skew by dividing the clock into 4
regions; this reduces the distance from the drivers to the loads
...
This also helps the power
supply and thermal problems as the drivers are distributed through the chip
...
The drawback clearly is the increased capacitance
of the Global Clock grid when compared to a tree distribution approach
...
The major clock grids drive the large execution blocks within the chip, including (1) the bus interface unit, (2) the integer issue and execution
units, (3) the floating-point issue and execution units, (4) the instruction fetch and branch prediction
unit, (5) the load/store unit, and (6) the pad ring
...


...
25 Global clock-distribution network in a window-pane structure
...
The local clocks provide great flexibility in the design of the local logic
blocks, but at the same time make it significantly more difficult to manage skew
...

Furthermore, the local clocks are susceptible to coupling from data lines because
they are local and not shielded like the global gridded clocks
...

Design Techniques—Dealing with Clock Skew and Jitter
To fully exploit the improved performance of logic gates with technology scaling, clock
skew and jitter must be carefully addressed
...
Some guidelines for reducing clock skew and jitter are
presented below
...
To minimize skew, balance clock paths from a central distribution source to individual clocking elements using H-tree structures or more generally routed tree structures
...

2
...

3
...
The use of gated clocks to
save power also results in data-dependent clock load and increased jitter
...
g
...

4
...
This eliminates races at the cost of performance
...
...
Avoid data dependent noise by shielding clock wires from adjacent signal wires
...

6
...
Dummy fills are very
common and reduce skew by increasing uniformity
...

7
...

The use of feedback circuits based on delay locked loops as discussed later in this
chapter can easily compensate for temperature variations
...
Power supply variation is a significant component of jitter as it impacts the cycle to
cycle delay through clock buffers
...
Unfortunately, decoupling
capacitors require a significant amount of area and efficient packaging solutions
must be leveraged to reduce chip area
...
3
...
In an edge-triggered system, the worst
case logic path between two registers determines the minimum clock period for the entire
system
...
The use of a latch based methodology (as illustrated in Figure 10
...
This flexibility allows an overall performance increase
...

For the latch-based system in Figure 10
...

Assume furthermore that the clocks are ideal, and that the two clocks are inverted versions
of each other (for the sake of simplicity)
...
On the falling edge of CLK2 (at edge ), the output of CLB_A is latched and the
computation of CLB_B is launched
...
This timing appears equivalent to having an edge-triggered system where CLB_A and CLB_B are cascaded
between two edge-triggered registers (Figure 10
...
In both cases, it appears that the time
available to perform the combination of CLB_A and CLB_B is TCLK
...
[Figure: latch-based pipeline from In through alternating latches L1 and L2, clocked by CLK1 and CLK2, with logic blocks CLB_A, CLB_B, and CLB_C (delays tpd,A, tpd,B, tpd,C); the timing diagram shows CLK1 and CLK2 with period TCLK, edges 1 to 4, and the events launch A, compute CLB_A, launch B, compute CLB_B, and launch C]

Figure 10
...


However, there is an important performance related difference
...
This approach requires no explicit design changes, as the passing of slack from one block to the next is automatic
...
Stated in another way, if the sequential system works
at a particular clock rate and the total logic delay for a complete cycle is larger than the
clock period, then unused time or slack has been implicitly borrowed from preceding
stages
...
In Figure 10
...
What happens
if the combinational logic block of the previous stage finishes early and has valid input
data for CLB_A before edge ? Since a latch is transparent during the entire high
phase of the clock, as soon as the previous stage has finished computing, the input data for
CLB_A is valid
...
e
...
Formally stated, slack passing has taken place if TCLK < tpd,A + tpd,B and the logic functions correctly (for simplicity, the delays associated with the latches are ignored)
...


[Figure: latch-based pipeline from In through alternating latches L1 and L2, clocked by CLK1 and CLK2, with logic blocks CLB_A, CLB_B, and CLB_C]

Figure 10
...
26
...
28
...
This implies that the previous block did not use up the
entire high phase of CLK1, which results in slack time (denoted by the shaded area)
...
...
Since L2 is a transparent latch,
c becomes valid on the high phase of CLK2 and CLB_B starts to compute by using the
slack provided by CLB_A
...
As this picture indicates, the total cycle delay, that is
the sum of the delay for CLB_A and CLB_B, is larger than the clock period
...


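[Figure: latches L1 and L2 with logic blocks CLB_A (tpd,A) and CLB_B (tpd,B) and internal nodes a to e; the timing diagram shows CLK1 and CLK2 with period TCLK and edges 1 to 4, the latch delays tDQ, the instants at which a, b, c, d, and e become valid, and the slack passed to the next stage]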
Figure 10
...


An important question is how much slack can, at most, be passed
across cycle boundaries
...
28, it is easy to see that
the earliest time that CLB_A can start computing is
...
Therefore, the maximum time that can be borrowed from
the previous stage is 1/2 cycle or TCLK /2
...
This implies that the maximum logic cycle delay is equal to 1
...

However, note that for an n-stage pipeline, the overall logic delay cannot exceed the
time available of n * TCLK
...
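The slack-passing rules can be checked numerically. The following Python sketch is a minimal model, not the book's method: it assumes ideal two-phase clocks, zero latch delay and set-up time, and latches that open on successive half-cycle boundaries. The function and argument names are illustrative.

```python
def latch_pipeline_timing(t_clk, delays, t_in=0.0):
    """Propagate data through alternating transparent latches.

    Assumed model: latch i opens at i * t_clk / 2 and closes half a
    cycle later. Logic block i sits between latch i and latch i + 1.
    Returns the arrival time at each receiving latch, or None if some
    block misses the closing edge of its receiving latch.
    """
    half = t_clk / 2.0
    depart = max(t_in, 0.0)          # the first latch opens at t = 0
    arrivals = []
    for i, tpd in enumerate(delays, start=1):
        opens = i * half             # receiving latch becomes transparent
        closes = opens + half        # receiving latch latches its input
        arrive = depart + tpd
        if arrive > closes:          # data arrives after the latch closed
            return None
        arrivals.append(arrive)
        # Early data waits for the latch to open; late data flows
        # straight through the transparent latch, using passed slack.
        depart = max(arrive, opens)
    return arrivals


def slack_was_passed(t_clk, tpd_a, tpd_b):
    """Condition from the text: TCLK < tpd,A + tpd,B with correct operation."""
    works = latch_pipeline_timing(t_clk, [tpd_a, tpd_b]) is not None
    return works and t_clk < tpd_a + tpd_b
```

With a hypothetical TCLK of 10 and delays of 7 and 5, the model reports correct operation even though the two blocks together need 12 time units, which is the situation the text calls slack passing. In this model a block can never run more than half a cycle past the opening of its receiving latch, consistent with the TCLK/2 bound stated above.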
3 Slack-passing example

First consider an edge-triggered pipeline of Figure 10
...
Assume that the primary input
In is valid slightly after the rising edge of the clock
...
The latency is two clock cycles (actually, the
output is valid 2
...
Note that for the first pipeline stage, 1/2

...

This time can be exploited in a latch based system
...
29 Conventional edge-triggered pipeline
...
30 shows a latch based pipeline of the same sequential circuit
...
This is
enabled by slack borrowing between logical partitions
...
30 Latch-based pipeline
...
4

Self-Timed Circuit Design*
10
...
1

Self-Timed Logic - An Asynchronous Technique

The synchronous design approach advocated in the previous sections assumes that all circuit events are orchestrated by a central clock
...

• They ensure that the physical timing constraints are met
...
This ensures that only legal logical values are applied in the next round of
computation
...

• Clock events serve as a logical ordering mechanism for the global system events
...
On every

clock transition, a number of operations are initiated that change the state of the
sequential network
...
31
...
The important point to note under
this methodology is that the clock period is chosen to be larger than the worst-case delay
of each pipeline stage, or T > max (tpd1, tpd2, tpd3) + tpd,reg
...
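The clock-period constraint can be written out as a small calculation. The Python sketch below is only an illustration of the inequality T > max(tpd1, tpd2, tpd3) + tpd,reg; the function names and the delay values in the usage note are hypothetical.

```python
def min_clock_period(stage_delays, t_reg):
    """Lower bound on the clock period of a synchronous pipeline.

    stage_delays: worst-case propagation delay of each logic block.
    t_reg: propagation delay of a register (tpd,reg in the text).
    The clock period must exceed the returned value.
    """
    return max(stage_delays) + t_reg


def average_stage_delay(stage_delays):
    """Mean stage delay, for comparison with the worst-case bound."""
    return sum(stage_delays) / len(stage_delays)
```

For hypothetical stage delays of 3, 5, and 4 ns and a register delay of 0.5 ns, the bound is 5.5 ns although the average stage needs only 4 ns. The slowest stage sets the pace for all of them.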
At each clock transition, a new set of inputs is sampled and computation is started anew
...
When to sample a new
input or when an output is available depends upon the logical ordering of the system
events and is clearly orchestrated by the clock in this example
...
31

[Figure: registers R2, R3, and R4 (register delay tpd,reg) separated by Logic Blocks #1, #2, and #3 with delays tpd1, tpd2, and tpd3]

Pipelined, synchronous datapath
...
It presents a
structured, deterministic approach to the problem of choreographing the myriad of events
that take place in digital designs
...
The approach is robust and easy to
adhere to, which explains its enormous popularity; however it does have some pitfalls
...
This is not the case in reality, because of effects such as clock skew
and jitter
...
This causes significant noise problems due to package inductance and power supply grid resistance
...
For instance, the throughput rate of the pipelined system of Figure 10
...
On
the average, the delay of each pipeline stage is smaller
...
For example, the propagation delay of a 16-bit adder is highly data dependent
(e
...
, adding two 4-bit numbers requires a much shorter time compared to adding
two 16-bit numbers)
...
Designing a purely asynchronous circuit is a nontrivial and
potentially hazardous task
...
In fact, the logical ordering of the events is dictated by the
structure of the transistor network and the relative delays of the signals
...
constraints by manipulating the logic structure and the lengths of the signal paths requires
an extensive use of CAD tools, and is only recommended when strictly necessary
...
Figure 10
...
This approach assumes that each combinational
function has a means of indicating that it has completed a computation for a particular
piece of data
...
The
combinational logic block computes on the input data and in a data-dependent fashion
(taking the physical constraints into account) generates a Done flag once the computation
is finished
...
This signaling ensures the logical ordering of the events and can be
achieved with the aid of an extra Ack(nowledge) and Req(uest) signal
...

[Figure: self-timed pipeline from In to Out with registers R1, R2, and R3, logic blocks F1, F2, and F3 (delays tpF1, tpF2, tpF3), handshake (HS) modules, and the signals Req, Ack, Start, and Done]

Figure 10
...


1
...
If F1 is inactive at
that time, it transfers the data and acknowledges this fact to the input buffer, which
can go ahead and fetch the next word
...
F1 is enabled by raising the Start signal
...

3
...
If this function is free, an Ack(nowledge) is
raised, the output value is transferred, and F1 can go ahead with its next computation
...
The completion signal Done ensures that the physical
timing constraints are met and that the circuit is in steady state before accepting a new
input
...
Both interested parties synchronize with
each other by mutual agreement or, if you want, by shaking hands
...

The choice of protocol is important, since it has a profound effect on the circuit performance and robustness
...
...

• In contrast to the global centralized approach of the synchronous methodology, timing signals are generated locally
...

• Separating the physical and logical ordering mechanisms results in a potential
increase in performance
...
In self-timed systems, a completed data-word does not have to wait for the arrival of the
next clock edge to proceed to the subsequent processing stages
...
For a ripple-carry adder, the average length of carry propagation is O(log N)
...

• The automatic shut-down of blocks that are not in use can result in power savings
...
As discussed earlier, this overhead can be
substantial
...

• Self-timed circuits are by nature robust to variations in manufacturing and operating
conditions such as temperature
...
The performance of a self-timed
system is determined by the actual operating conditions
...
32)
...

10
...
2

Completion-Signal Generation

A necessary component of self-timed logic is the circuitry to indicate when a particular
piece of circuitry has completed its operation for the current piece of data
...

Dual-Rail Coding
One common approach to completion signal generation is the use of dual rail coding
...
Consider the redundant data

...
1
...
For the data to be valid or the computation to be completed, the circuit must be in a
legal 0 (B0 = 0, B1 = 1) or 1 (B0 = 1, B1 = 0) state
...
The (B0 = 1,
B1 = 1) state is illegal and should never occur in an actual circuit
...
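The dual-rail encoding described here can be summarized in a few lines of Python. This is a minimal behavioral sketch, not a circuit description; the function names are illustrative, and Done is modeled as the logical OR of the two rails, following the statement that Done rises when either B0 or B1 goes high.

```python
def decode_dual_rail(b0, b1):
    """Map a dual-rail pair (B0, B1) to its meaning.

    (0, 0) -> "transition" (reset, data not yet valid)
    (0, 1) -> logic 0
    (1, 0) -> logic 1
    (1, 1) -> illegal, should never occur
    """
    if (b0, b1) == (0, 0):
        return "transition"
    if (b0, b1) == (0, 1):
        return 0
    if (b0, b1) == (1, 0):
        return 1
    raise ValueError("illegal dual-rail state (B0 = 1, B1 = 1)")


def done(b0, b1):
    """Completion signal: high once exactly one rail has been raised."""
    if b0 and b1:
        raise ValueError("illegal dual-rail state (B0 = 1, B1 = 1)")
    return bool(b0 or b1)
```

The transition state is what makes completion detection possible: as long as both rails are low, a receiver knows that the value is not yet valid.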
1

Redundant signal representation to include transition state
...
33, which is a dynamic version of the DCVSL logic style where the clock is
replaced by the Start signal [Heller84]
...
When the Start signal is low, the circuit is precharged by the PMOS transistors, and the output (B0, B1) goes into the Reset-Transition
state (0, 0)
...
Either
B0 or B1—but never both—goes high, which raises Done and signals the completion of
the computation
...
33 Generation of a
completion signal in DCVSL
...
The completion generation is performed in series with the logic evaluation,
and its delay adds directly to the total delay of the logic block
...

Completion generation thus comes at the expense of both area and speed
...
Redundant signal representations
other than the one presented in Table 10
...
One essential element is
the presence of a transition state denoting that the circuit is in evaluation mode and the
output data is not valid
...

Section 10
...
4

72

Self-Timed Adder Circuit

An efficient implementation of a self-timed adder circuit is shown in Figure 10
...
A Manchester-carry scheme is used to boost the circuit performance
...
It is consequently sufficient to use the differential signaling in
the carry path only (Figure 10
...
The completion signal is efficiently derived by combining the carry signals of the different stages (Figure 10
...
This safely assumes that the sum
generation, which depends upon the arrival of the carry signal, is faster than the completion
generation
...
All other signals such as P(ropagate), G(enerate), K(ill), and S(um) do not require completion generation
and can be implemented in single-ended logic
...
The only difference is that the G(enerate) signal is
replaced by a K(ill)
...

[Figure: Manchester carry chains precharged under control of Start, with propagate signals P0 to P3, generate signals G0 to G3, kill signals K0 to K3, carry nodes C0 to C4, and carry outputs C1out to C4out]

(a) Differential carry generation
Figure 10
...


Replica Delay
While the dual-rail coding above allows tracking of the signal statistics, it comes at the
cost of power dissipation
...
An attempt to reduce the overhead of completion
detection is to use a critical-path replica configured as a delay element, as shown in Figure

...
35
...
At the same time, the start signal is fed into the replica delay line,
which tracks the critical path of the logic network
...
When the output of the delay line makes a
transition, it indicates that the logic is complete—as the delay line mimics the critical path
...
The advantage of this approach is that the logic can be implemented using a standard non-redundant circuit style such as complementary CMOS
...
Note that this approach generates the completion signal after a time equal to the worst case delay through the network
...
However, it can track the local
effects of process variations and environmental variations (e
...
, temperature or power supply variations)
...

[Figure: LOGIC NETWORK from In to Out, with Start also driving a DELAY MODULE (critical path replica) that produces Done]

Figure 10
...


Example 10
...
g
...
Figure 10
...
A current sensor is inserted in
series with the combinational logic, and monitors the current flowing through the logic
...
This signal effectively determines when the logic has completed its cycle
...

If the input data vector does not change from one cycle to the next, no current is
drawn from the supply for static CMOS logic
...
The
outputs of the current sensor and minimum delay element are then combined
...
Ensuring reliability while
keeping the overhead circuitry small is the main challenge
...


...
4

[Figure: static CMOS logic with a current sensor on GNDsense and a minimum-delay generator; the waveforms show Start, tdelay, signals A and B, toverlap, tMDG, tpd-NOR, Done, and the interval in which Output is valid]

Figure 10
...
4
...




tMDG > min(tdelay)

toverlap > 0

Self-Timed Signaling

Besides the generation of the completion signals, a self-timed approach also requires a
handshaking protocol to logically order the circuit events avoiding races and hazards
...
37, which shows a sender module transmitting data to a receiver ([Sutherland89])
...
37

Receiver’s action

Two-phase handshaking protocol
...
In some cases the request event is
a rising transition; at other times it is a falling one—the protocol described here does not
distinguish between them
...
If the receiver is busy or its input buffer is full, no Ack event is generated, and
the transmitter is stalled until the receiver becomes available by, for instance, freeing
space in the input buffer
...
The four events, data change, request, data acceptance,

...
38

Muller C-element
...
Successive cycles may take different amounts
of time depending upon the time it takes to produce or consume the data
...
Both phases are terminated by certain events
...
The
sender is free to change the data during its active cycle
...
The receiver can only accept
data during its active cycle
...
37
...
An
essential component of virtually any handshaking module is the Muller C-element
...
38, performs an
AND-operation on events
...
When the inputs differ, the output retains its previous value
...
As long as this does not happen, the output
remains unchanged and no output event is generated
...
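The behavior of the Muller C-element is easy to capture in software. The Python sketch below is a behavioral model only, assuming binary inputs and an explicitly chosen initial output; the class and method names are illustrative.

```python
class MullerCElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        # The initial state matters, because the element has memory.
        self.out = initial

    def update(self, a, b):
        """Apply inputs a and b and return the resulting output.

        When both inputs agree, the output copies them.
        When they differ, the output keeps its previous value.
        """
        if a == b:
            self.out = a
        return self.out
```

Starting from an output of 0, a rising event on only one input leaves the output unchanged; the output rises only after the second input has risen as well. This is the sense in which the element performs an AND operation on events.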
Figure 10
...

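[Figure: Muller C-element implementations with inputs A and B and output F, including a version built around a set-reset (S, R) element with output Q]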

Figure 10
...


Figure 10
...
Assume that Req, Ack, and Data
Ready are initially 0
...
Req goes

high—this is commonly denoted as Req↑
...
The C-element is blocked, and no new data is sent to
the data bus (Req stays high) as long as the transmitted data is not processed by the
receiver
...
This can be the result
of many different actions, possibly involving other C-elements communicating with
subsequent blocks
...
A Data ready↓ event, which might already have happened
before Ack↑, produces a Req↓, and the cycle is repeated
...
40 A Muller C-element
implements a two-phase handshake
protocol
...


Ack

Handshake logic

Problem 10
...
41 shows a two-phase, self-timed implementation of a FIFO (first-in first-out)
buffer with three registers
...
How can you observe that the FIFO is completely empty (full)? (Hint: Determine the necessary conditions on the Ack and Req signals
...
41 Three-stage
self-timed FIFO, using a twophase signaling protocol
...
There is some
bad news, however
...
Most logic devices in MOS technology tend to be sensitive to levels or
to transitions in one particular direction
...
Since the transition direction is important, initializing all the
Muller C-elements in the appropriate state is essential
...
happen
...

The only alternative is to adopt a different signaling approach, called four-phase signaling, or return-to-zero (RTZ)
...
Once again,
this is illustrated with the example of the sender-receiver
...
42
...
42 Four-phase
handshaking protocol
...
Both the Req and
the Ack are initially in the zero-state, however
...
When ready, the
receiver accepts the data and raises Ack (Ack↑ or )
...
The protocol
proceeds, now by bringing both Req (Req↓ or ) and Ack (Ack↓ or ) back to their initial
state in sequence
...
This protocol is called four-phase because four distinct time zones can be recognized per cycle: two for the sender and two for the receiver
...
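The order of events in a four-phase (return-to-zero) handshake can be traced with a short simulation. The Python sketch below is a sequential illustration under simplifying assumptions: the receiver is always ready, and the event names "Req+", "Ack+", "Req-", and "Ack-" stand for Req↑, Ack↑, Req↓, and Ack↓. The function name is illustrative.

```python
def four_phase_transfer(words):
    """Send each word with a four-phase handshake and record the events.

    Returns (received_words, event_trace).
    """
    req = 0
    ack = 0
    received = []
    trace = []
    for word in words:
        # Both wires must be back in the zero state before a new cycle.
        assert req == 0 and ack == 0
        data = word            # sender places the data on the bus
        req = 1                # sender signals that the data is valid
        trace.append("Req+")
        received.append(data)  # receiver accepts the data ...
        ack = 1                # ... and acknowledges
        trace.append("Ack+")
        req = 0                # sender returns Req to zero
        trace.append("Req-")
        ack = 0                # receiver returns Ack to zero
        trace.append("Ack-")
    return received, trace
```

Each word costs four events, two on each wire, which is the price paid for always returning to a known signal level.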
An
implementation of the protocol, based on Muller C-elements, is shown in Figure 10
...
It
is interesting to notice that the four-phase protocol requires two C-elements in series
(since four states must be represented)
...


[Figure: sender logic and receiver logic connected by Data, with handshake logic built from two C-elements and the signals Data ready, Data accepted, S, Req, and Ack; events are numbered 1 to 5]
Figure 10
...


Problem 10
...
43
...


The four-phase protocol has the disadvantage of being more complex and slower,
since two events on Req and Ack are needed per transmission
...
The logic in the sender and receiver modules does not have to
deal with transitions, which can go either way, but only has to consider rising (or falling)
transition events or signal levels
...

For this reason, four-phase handshakes are the preferred implementation approach for most
of the current self-timed circuits
...

Example 10
...
Now it is time to bring them all together
...
A view of the self-timed data path, including the
timing control, is offered in Figure 10
...
The logic functions F1 and F2 are implemented
using dual-rail, differential logic
...
44

Done

C

Acko

Self-timed pipelined datapath—complete composition
...
All Start
signals are low so that all logic circuits are in precharge condition
...
The enable signal En of R1 is raised, effectively latching the input
data into the register, assuming a positive edge-triggered or a level-sensitive implementation
...
The second C-element is triggered as well,
since Ackint is low
...
At its completion, the output data is placed on the bus, and a request is initiated to the second stage
(Reqint↑), which acknowledges its acceptance by raising Ackint
...
However, the input
buffer can respond to the Acki↑ event by resetting Reqi to its zero state (Reqi↓)
...
Upon receipt of Ackint↑, Start goes low, the pre-charge phase starts, and
...
Note that this sequence corresponds to the four-phase handshake
mechanism described earlier
...
45
...
Computer tools are often used to derive STGs that ensure proper
operation and optimize the performance
...
45 State transition diagram for pipeline stage 1
...
Arrows in dashed lines express actions in either the preceding or
following stage
...
4
...
Unfortunately, the overhead circuits preclude widespread application in
general purpose digital computing
...
A few examples that illustrate the use of self-timed
concepts for either power savings or performance enhancement are presented below
...
Imbalances in a logic network cause inputs of a logic gate or block to arrive at different times,
resulting in glitching transitions
...
If a logic
block can be enabled after all of the inputs settle, then the number of glitching transitions
can be reduced
...
Tri-state buffers are inserted between each of these
phases to prevent glitches from propagating further in the datapath (Figure 10
...
Assuming an arbitrary logic network in Figure 10
...
When the tri-state buffers at the output of logic block 1 are enabled, the
computation of logic block 2 is allowed to proceed
...
The control of the tri-state buffer can be performed through the use of a self-timed enable signal, which is generated by passing the system clock through a delay chain
that models the critical path of the processor
...
This technique succeeds in reducing the switched capacitance of combinational logic blocks such as multipliers, even including the overhead of the delay chain,
gating signal distribution and buffers [Goodman98]
...


[Figure: Logic Blocks 1, 2, and 3 in cascade (signals in2, out2, in3), with the delayed clocks CLKD1 and CLKD2 controlling the transfer between blocks]


...
46 Application of self-timing for glitch reduction
...
This structure uses a different control structure than the conventional self-timed pipeline described earlier
...
Since the precharge happens after operation instead of before evaluation, it is often termed post-charge logic
...
47 [Williams00]
...
47 Self-resetting logic
...
It is possible to precharge a
block based on the completion of its own output, but care must be taken to ensure that the
following stage has properly evaluated based on the output before it is precharged
...
It should be noted that unlike other logic
styles, the signals are represented as pulses, and are
valid only for a limited duration
...
While this logic
style offers potential speed advantages, special care
Figure 10
...
Also, cir3-input OR
...
An example of self-resetting logic is
shown in Figure 10
...
Assume that all inputs are low, and int is initially
precharged
...
This causes the gate to
precharge
...

Clock-Delayed Domino
One interesting application of self-timed circuits using the delay-matching concept is
Clock-Delayed Domino
...
Instead, the clock for one stage is derived from the previous stage
...
49
...
e
...
Sometimes, there is
a transmission gate inserted between the two inverters to accomplish this (the transmission
gate is on with the gate terminal of NMOS tied to VDD and the PMOS to GND)
...
There are several advantages of using such a self-clocked timing
abstraction
...
This
alleviates a major limitation of conventional Domino that is only capable of non-inverting
logic
...
Also, notice that it is possible to eliminate the “foot-switch” in the later stages, as the clock-evaluation edge arrives only when
the input is stable
...
A careful
analysis of the timing shows that the short circuit power can be eliminated
...
49 Clock Delayed Domino Logic
...

Section 10
...
5

Synchronizers and Arbiters*

10
...
1

Synchronizers—Concept and Implementation

Even though a complete system may be designed in a synchronous fashion, it must still
communicate with the outside world, which is generally asynchronous
...
50
...
50 Asynchronous-synchronous interface

Consider a typical personal computer
...
This reference determines
what happens within the computer system at any point in time
...
The way a synchronous system deals with such an asynchronous signal is to sample
or poll it at regular intervals and to check its value
...
However, it might happen that the signal is polled in the middle of a transition
...
At that point, it is not clear
if the key was pressed or not
...
For instance, one function might decide that the key is
pushed and start a certain action, while another function might lean the other way and
issue a competing command
...
Therefore, the
undefined state must be resolved in one way or another before it is interpreted further
...
For
instance, it is either decided that the key is not yet pressed, which will be corrected in the
next poll of the keyboard, or it is concluded that the key is already pressed
...
A circuit that implements such a decision-making function is called a synchronizer
...
An asynchronous/synchronous interface is thus always prone
to errors called synchronization failures
...

Typically, this probability can be reduced in an exponential fashion by waiting longer before

...
This is not too troublesome in the keyboard example, but in general,
waiting affects system performance and should therefore be avoided to a maximal extent
...
51
...
Figure 10
...

However, since the sampled signal is not synchronized to the clock signal, there is a finite probability that the set-up
time or hold time of the latch is violated (the probability is a strong function of the transition frequencies of the input and the clock)
...

The sampled signal eventually evolves into a legal 0 or 1 even in the latter case, as the
latch has only two stable states
...
7

Flip-Flop Trajectories

Figure 10
...
The inverters are composed of minimum-size devices
...
0

Vout

Figure 10
...


[Plot axes: Vout versus time, with the time axis running from 0 to 300 ps]

If the input is sampled such that the cross-coupled inverter pair starts at the metastable point,
the voltage will remain at the metastable state forever in the absence of noise
...
The time it takes to reach the acceptable signal zones
depends upon the initial distance of the sampled signal from the metastable point
...
This model is used to compute
the range of values for v(0) that still cause an error, or a voltage in the undefined range,
after a waiting period T
...


$V_{MS} - (V_{MS} - V_{IL})\,e^{-T/\tau} \le v(0) \le V_{MS} + (V_{IH} - V_{MS})\,e^{-T/\tau}$

(10
...
(10
...
Increasing the
waiting period from 2τ to 4τ decreases the interval and the chances of an error by a factor
of 7
...

Some information about the asynchronous signal is required in order to compute the
probability of an error
...
Assume also that the
slopes of the waveform in the undefined region can be approximated by a linear function
(Figure 10
...
Using this model, we can estimate the probability Pinit that v(0), the value
of Vin at the sampling time, resides in the undefined region
...
9)

The chances for a synchronization error to occur depend upon the frequency of the
synchronizing clock φ
...
This means that the average number of synchronization errors per second Nsync(0)
equals Eq
...
10) if no synchronizer is used
...
10)

where Tφ is the sampling period
...
(10
...

$N_{sync}(T) = \dfrac{P_{init}\,e^{-T/\tau}}{T_\phi} = \dfrac{(V_{IH} - V_{IL})\,e^{-T/\tau}}{V_{swing}} \cdot \dfrac{t_r}{T_{signal}\,T_\phi}$

(10
...
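The expression for Nsync(T) translates directly into a short calculation. The Python sketch below follows the formula term by term; the function names are illustrative, and any numeric values supplied by a caller (in particular τ, tr, and the voltage levels) are assumptions to be taken from the actual process and signal characteristics.

```python
import math


def initial_error_probability(v_ih, v_il, v_swing, t_r, t_signal):
    """P_init: chance that the sampled value lies in the undefined region."""
    return (v_ih - v_il) / v_swing * (t_r / t_signal)


def sync_errors_per_second(p_init, wait_time, tau, t_phi):
    """N_sync(T) = P_init * exp(-T / tau) / T_phi."""
    return p_init * math.exp(-wait_time / tau) / t_phi


def mean_time_to_failure(p_init, wait_time, tau, t_phi):
    """MTF is the reciprocal of the error rate."""
    return 1.0 / sync_errors_per_second(p_init, wait_time, tau, t_phi)
```

Because the waiting time enters through an exponential, lengthening the wait from 2τ to 4τ reduces the error rate by a factor of e², roughly 7.4, whatever the other parameters are. The same exponential makes the result very sensitive to the value of τ.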

Example 10
...
Tφ = 5 nsec, which corresponds to a 200 MHz clock), T
= Tφ = 5 nsec, Tsignal = 50 nsec, tr = 0
...
7)
...
53
signal slope
...

From the VTC of a typical CMOS inverter, it can be derived that VIH – VIL approximately
equals 0
...
5 V
...
(10
...
38x 10–9 errors/sec
...
If no synchronizer was used, the MTF would only have been 2
...

• The exponential relation in Eq
...
11) makes the failure rate extremely sensitive to the
value of τ
...
τ varies from chip to
chip and is a function of temperature as well
...
A worst-case design scenario is
definitely advocated here
...
A problem occurs when T exceeds the sampling
period Tφ
...
54
...

Notice that this arrangement requires the φ-pulse to be short enough to avoid race conditions
...
The increase in MTF comes at the expense of an increased latency
...
54

O2
Sync

Out
Sync

Cascading synchronizers increases the mean time-to-failure
...
Making
the mean time-to-failure very large does not preclude errors
...
A maximum of one or two per
system is advocated
...
5
...
An arbiter is an element that decides which of
two events has occurred first
...
A synchronizer is actually a

...
6

Clock Synthesis and Synchronization Using a Phase-Locked Loop

86

special case of an arbiter, since it determines if a signal transition happened before or
after a clock event
...

An example of a mutual-exclusion circuit is shown in Figure 10
...
It operates on
two input-request signals that follow a four-phase signaling protocol; that is, the
Req(uest) signal has to go back to the reset state before a new Req(uest) can be issued
...
While
Requests may occur concurrently, only one of the Acknowledges is allowed to go high
...
An event on one of the
inputs (e
...
, Req1↑) causes the flip-flop to switch, node A goes low, and Ack1↑
...
The cross-coupled output structure keeps the output
values low until one of the NAND outputs differs from the other by more than a threshold
value VT
...

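[Figure panels: (a) schematic symbol of the arbiter with inputs Req1 and Req2 and outputs Ack1 and Ack2; (b) implementation with internal nodes A and B; (c) timing diagram showing Req1, Req2, A, B, the metastable interval, the VT gap, and Ack1 versus time t]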

Figure 10
...
6

Mutual-exclusion element (or arbiter)
...
Synchronous circuits need a global periodic clock reference to drive sequential elements
...
Crystal oscillators generate accurate, low-jitter clocks
with a frequency range from tens of megahertz to approximately 200 MHz
...
A PLL takes an external low-frequency reference signal from a crystal and
multiplies its frequency by a rational number N (see the left side of Figure 10
...



Typically as shown in Figure 10
...
Since chip-to-chip communication occurs at a lower rate than the on-chip clock
rate, the reference clock is a divided but in-phase version of the system clock
...
Introducing clock buffers to deal with this problem unfortunately introduces skew between the data and sample clock
...
e
...
In addition, for
the configuration shown in Figure 10
...

[Figure labels: Chip 1, Chip 2, Data, Digital System (in each chip), PLL (in each chip), Divider, reference clock, Clock Buffer, Crystal Oscillator; fsystem = N * fcrystal; fcrystal < 200 MHz]
Figure 10
...


10
...
1

Basic Concept

Periodic signals of known frequency can be described exactly by only one parameter, their
phase
...
57
...
The relative phase is
defined as the difference between the two phases
...
57 Relative and absolute phase of two periodic signals


A PLL is a complex, nonlinear feedback circuit, and its basic operation is understood with the aid of Figure 10
...
The voltage-controlled oscillator (VCO)
takes an analog control input and generates a clock signal of the desired frequency
...
To synthesize a system clock of a particular frequency, it is necessary to set the
control voltage to the appropriate value
...
e
...
The feedback loop is critical to tracking process and environmental variations
...

The reference clock is typically generated off-chip from an accurate crystal reference
...
e
...
The local clock and reference clock are compared using a phase detector that
measures the phase difference between the signals and produces an Up or Down signal
when the local clock lags or leads the reference signal, respectively
...
Next, the Up and Down
signals are fed into a charge pump, which translates the digitally encoded control information into an analog voltage [Gardner80]
...
A Down signal, on the other hand, slows down the oscillator and eliminates
the phase lead of the local clock
...
The edge of the local clock jumps back and forth instantaneously and oscillates
around the targeted position
...
This is partially accomplished by the introduction of
the loop filter
...

Note that the PLL structure is a feedback structure and the addition of extra phase shifts,
as is done by a high-order filter, may result in instability
...
When in lock, the system clock frequency is N times the reference clock frequency
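
The locking behavior can be illustrated with a phase-domain simulation. The sketch below is a simplified linear model, not the charge-pump circuit discussed in the text: the proportional-integral loop filter, the gains `kp` and `ki`, the VCO parameters, and the use of Hz/V rather than rad/s/V are all assumptions made for illustration.

```python
def simulate_pll(f_ref=100e6, n_div=8, f0=600e6, k_vco=400e6,
                 kp=1.0, ki=0.2, cycles=300):
    """Update the loop once per reference cycle; phases are in reference cycles."""
    ref_phase = 0.0   # accumulated phase of the reference clock
    div_phase = 0.0   # accumulated phase of the divided-down VCO clock
    v_int = 0.0       # integral part of the control voltage in V
    f_out = f0
    history = []
    for _ in range(cycles):
        error = ref_phase - div_phase        # positive when the local clock lags
        v_cont = v_int + kp * error          # assumed PI loop filter output
        f_out = f0 + k_vco * v_cont          # VCO: frequency is linear in v_cont
        history.append(f_out)
        v_int += ki * error
        ref_phase += 1.0                     # one reference period elapses
        div_phase += f_out / (n_div * f_ref) # divided VCO advances by f_out/(N*f_ref)
    return f_out, history

f_final, trace = simulate_pll()
print(f"start: {trace[0] / 1e6:.1f} MHz, final: {f_final / 1e6:.3f} MHz")
```

With these hypothetical gains the loop is stable, and the output frequency settles at N times the reference frequency. Larger gains make the discrete-time loop oscillate or diverge, which mirrors the stability concern raised for high-order filters.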
...
This is
especially true for the loop filter and VCO, where induced noise has a direct effect on the
resulting clock jitter
...
58

Composition of a phase-locked loop (PLL); figure label: vcont
...
fm Page 89 Tuesday, April 16, 2002 9:12 AM

89

TIMING ISSUES IN DIGITAL CIRCUITS

Chapter 10

rails and the substrate
...
Analog circuits with a high supply rejection, such as
differential VCOs, are therefore desirable [Kim90]
...
A more detailed description of the various components of a PLL is given below
...
6
...
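In other words, a VCO is characterized by $\omega = \omega_0 + K_{vco} \cdot v_{cont}$
...
$$\phi_{out} = \omega_0 t + K_{vco} \cdot \int_{-\infty}^{t} v_{cont}\,dt \qquad (10.12)$$

where $K_{vco}$ is the gain of the VCO given in rad/s/V, and $\omega_0$ is a fixed frequency offset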
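
The VCO relation can be checked numerically by integrating the instantaneous frequency to obtain the phase. The sketch below is a minimal illustration. The function name, the parameter values, and the cosine output of unit amplitude are assumptions.

```python
import math

def vco_phase(v_cont_samples, dt, omega0, k_vco):
    """Accumulate phase as the integral of (omega0 + k_vco * v_cont) dt (rectangle rule)."""
    phase = 0.0
    phases = []
    for v in v_cont_samples:
        phase += (omega0 + k_vco * v) * dt
        phases.append(phase)
    return phases

OMEGA0 = 2 * math.pi * 600e6   # hypothetical offset frequency in rad/s
K_VCO = 2 * math.pi * 400e6    # hypothetical VCO gain in rad/s/V
DT = 1e-12                     # time step in s

v_cont = [0.5] * 10000         # constant 0.5 V control input for 10 ns
phases = vco_phase(v_cont, DT, OMEGA0, K_VCO)
x = [math.cos(p) for p in phases]          # output waveform with A = 1
cycles = phases[-1] / (2 * math.pi)
print(f"output cycles in 10 ns: {cycles:.3f}")
```

A constant control voltage gives a constant output frequency of (ω0 + Kvco·vcont)/2π, which for these values corresponds to 8 cycles in 10 ns, that is, 800 MHz.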
...
The output signal has the form

$$x(t) = A \cdot \cos\left(\omega_0 t + K_{vco} \cdot \int_{-\infty}^{t} v_{cont}\,dt\right)$$

Phase Detectors

...
This output signal is then
used to adjust the output of the VCO and thus align the two inputs via a feedback network
...
Two basic types of phase
detectors are commonly used
...

XOR Phase Detector
...
The XOR
is useful as a phase detector since the time when the two inputs are different (or same) represents the relative phase
...
59 shows the XOR of two waveforms
...
...
The output (low
pass filtered) as a function of the phase error is also shown
...
59

The XOR as a phase detector (horizontal axis: phase error in degrees)
...
e
...
Thus the linear phase range is only 180 degrees
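
The limited range can be reproduced by averaging the XOR of two square waves. The sketch below assumes ideal 50% duty-cycle inputs and measures the phase as the raw offset between the two waveforms. The function name and the sample count are illustrative.

```python
def xor_pd_average(phase_deg, samples=3600):
    """Average (ideally low-pass filtered) XOR output for a given phase offset."""
    high = 0
    for i in range(samples):
        angle = 360.0 * i / samples
        a = (angle % 360.0) < 180.0                 # first square wave
        b = ((angle - phase_deg) % 360.0) < 180.0   # second wave, delayed by phase_deg
        high += a != b                              # XOR of the two inputs
    return high / samples

for phase in (0, 45, 90, 135, 180, 225, 270):
    print(f"{phase:3d} deg -> average output {xor_pd_average(phase):.3f}")
```

The average rises linearly from 0 to 1 over 180 degrees and then falls again, so offsets of 90 and 270 degrees produce the same filtered output and cannot be distinguished.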
...
e
...
A drawback of the XOR phase detector is that it may
lock to a multiple of the clock frequency
...
The filtered version of this signal will be identical to
that of the truly locked state and thus the VCO will operate at the nominal frequency
...
The phase-frequency detector (PFD) is the most commonly
used form of phase detector, and it solves several of the shortcomings of the detectors discussed above
...
Accordingly, it cannot lock to an incorrect

...
The PFD takes two clock inputs and produces two outputs, UP
and DOWN as shown in Figure 10
...

[Figure panels: (a) schematic (two resettable D flip-flops clocked by A and B, with outputs UP and DN); (b) state transition diagram (states UP=0/DN=1, UP=0/DN=0, and UP=1/DN=0, with transitions on A and B); (c) timing waveforms (A, B, UP, DN)]

Figure 10
...
(a) Schematic (b) State Transition Diagram (c) Timing
...
Assume that both UP and DN outputs are
initially low
...
The UP signal remains in this state until a low-to-high transition occurs on input B
...
Notice that a short pulse proportional to the phase error is generated on
the UP signal, and that there is a small pulse on the DN output, whose duration is equal
to the delay through the AND gate and register reset delay
...
The roles are reversed for the case
when input A lags B, and a pulse proportional to the phase error is generated on the DN
output
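
The three-state behavior described above can be sketched as a small state machine driven by rising edges. This is an idealized model with zero reset delay, so the short glitch pulse on the opposite output is not reproduced. The function names and time units are illustrative assumptions.

```python
def edges(period, offset, count, name):
    """Rising-edge times of a clock called `name` (hypothetical helper)."""
    return [(offset + k * period, name) for k in range(count)]

def pfd(events):
    """Return (up_time, dn_time): total time the UP and DN outputs are high.

    `events` is a time-sorted list of (time, 'A' or 'B') rising edges.
    """
    state = 0            # +1: UP high, -1: DN high, 0: both low
    last_t = None
    up_time = dn_time = 0.0
    for t, source in events:
        if last_t is not None:
            if state == 1:
                up_time += t - last_t
            elif state == -1:
                dn_time += t - last_t
        if source == 'A':
            state = min(state + 1, 1)    # edge on A moves toward the UP state
        else:
            state = max(state - 1, -1)   # edge on B moves toward the DN state
        last_t = t
    return up_time, dn_time

# Same frequency, A leads B by 1 time unit: UP pulses equal the phase error.
print(pfd(sorted(edges(10.0, 0.0, 5, 'A') + edges(10.0, 1.0, 5, 'B'))))
# A runs at twice the frequency of B: UP dominates and DN stays low.
print(pfd(sorted(edges(5.0, 0.0, 10, 'A') + edges(10.0, 1.0, 5, 'B'))))
```

The same machine therefore reports both phase error (pulse width at equal frequencies) and frequency error (one output active almost all the time).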
...

[Figure: timing waveforms of A, B, UP, and DN]

Figure 10
...

The circuit also acts as a frequency detector, providing a measure of the frequency
error (Figure 10
...
For the case when A is at a higher frequency than B, the PFD generates many more UP pulses (with the average proportional to the frequency difference),


while the DN pulses average close to zero
...

The phase characteristic of the phase detector is shown in Figure 10
...
Notice
that the linear range has been expanded to 4π
...
62 Phase detector characteristic
...
63 Charge Pump
...
One possible implementation is
shown in Figure 10
...
A pulse on the UP signal adds a
charge packet proportional to the width of the UP pulse, and
a pulse on the DN signal removes a charge packet proportional to the width of the DN pulse
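
The charge-packet behavior can be sketched with an idealized model in which matched current sources charge or discharge a single capacitor. The current and capacitance values and the function name below are hypothetical and do not describe the implementation in the figure.

```python
def charge_pump_step(v_cont, up_width, dn_width, i_pump=50e-6, c_filter=10e-12):
    """Control voltage after one UP/DN pulse pair (ideal sources, pure capacitor)."""
    delta_q = i_pump * (up_width - dn_width)   # net charge packet in coulombs
    return v_cont + delta_q / c_filter         # dV = dQ / C

v = 1.0
v = charge_pump_step(v, up_width=100e-12, dn_width=0.0)
print(f"after a 100 ps UP pulse: {v:.6f} V")
v = charge_pump_step(v, up_width=0.0, dn_width=100e-12)
print(f"after a 100 ps DN pulse: {v:.6f} V")
```

Under these assumptions a 100 ps pulse moves the control voltage by 0.5 mV, and equal UP and DN pulses leave it unchanged.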
...
This effectively increases the frequency of the
VCO
...
The duration of
the startup transient is strongly dependent on the bandwidth of the loop filter
...
64
shows a SPICE-level simulation of a PLL (an ideal VCO is used to speed up the simulation)
implemented in 0
...
In this example a reference frequency of 100 MHz is chosen, and the PLL multiplies this frequency by 8 to 800 MHz
...
Once the control
voltage reaches its final value, the output frequency (the clock to the digital system) settles
to its final value
...
As illustrated in this plot, the top graph shows lock, in
which fout = 8 * fref
...


64 SPICE simulation of a PLL (plot labels: Control Voltage (V), div)
...


Summary
In a short span of time, phase-locked loops have become an essential component of
any high-performance digital design
...
Yet, experience has demonstrated that
this combination is perfectly feasible, and leads to new and better solutions
...
7

Future Directions
This section highlights some of the trends in high-performance and low-power timing
optimization
...
7
...
A schematic of a DLL is shown in Figure 10
...
The key component of a DLL is a voltage-controlled delay line
(VCDL)
...
The idea is to delay the output clock such that it perfectly lines up with
the reference
...
The reference frequency is fed
into the input of the VCDL
...

Note that only a phase detector is required instead of a PFD
...
...
The function of the feedback is to adjust the delay through
the VCDL such that the rising edge of the input reference clock (fREF), and the output
clock fO are aligned
...
65 Delay-Locked Loop
...
66c
...
Since the first edge of the output arrives before the reference edge,
an UP pulse of width equal to the error between the two signals is generated
...
This causes the edge of the output signal to be delayed in the next
cycle (this implementation of the VCDL assumes that a larger voltage results in a larger
delay)
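
The cycle-by-cycle correction can be sketched as a first-order loop. The model below assumes a VCDL whose delay is linear in the control voltage and a phase detector and charge pump that change the voltage in proportion to the timing error. All names and parameter values are hypothetical.

```python
def simulate_dll(t_ref=10e-9, d0=6e-9, k_vcdl=4e-9, gain=0.125e9, cycles=60):
    """Return (delay, v_cont) after `cycles` reference periods.

    d0     : VCDL delay at zero control voltage in s
    k_vcdl : delay increase per volt in s/V (larger voltage, larger delay)
    gain   : control-voltage change per second of timing error in V/s
    """
    v_cont = 0.0
    delay = d0
    for _ in range(cycles):
        delay = d0 + k_vcdl * v_cont
        error = t_ref - delay      # positive: output edge is early, so UP is asserted
        v_cont += gain * error     # UP raises the control voltage and the delay
    return delay, v_cont

delay, v_cont = simulate_dll()
print(f"locked delay = {delay * 1e9:.3f} ns at v_cont = {v_cont:.3f} V")
```

With these values each cycle halves the remaining error, and the delay settles at one reference period. Because only a delay is adjusted and no frequency is synthesized, the loop is first order in this model.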
...

Figure 10
...
The
chip is partitioned into many small regions (or tiles)
...
g
...
For simplicity, the figure shows a two-tile chip, but
this is easily extended to many regions
...
In front of each buffer is a VCDL
...
Unfortunately,
the static and dynamic variations of the buffers cause the phase error between the buffered
clocks to be non-zero and time-varying
...
The feedback inside each tile
adjusts the control voltage of the VCDL, such that the buffered output is locked in phase to
the global input clock
...
g
...
Such configurations have become
common in high performance digital microprocessors
...
[Figure: GLOBAL CLK distributed to two tiles, each containing a VCDL, a Digital Circuit, a Phase Detector, and a CP/LF block]

Figure 10
...


10
...
2

Optical Clock Distribution

It is clear that there are some fundamental problems associated with electrical synchronization techniques for future high-performance multi-GHz systems
...
Even with aggressive active clock management schemes such as the use of DLLs and PLLs, the variations
in power supply and clock load result in unacceptable clock uncertainty
...

An excellent review of the rationale and trade-offs in optical interconnects vs
...

The potential advantage of optical technology for clock distribution is due to the fact
that the delay is not sensitive to temperature and the clock edges don’t degrade over long
distances (e.g., tens of meters)
...
Of course, the performance of an optical system
is limited by the speed of light
...
Figure 10
...

The off-chip optical source is brought to the chip, distributed through waveguides, and
converted through receiver circuitry to a local electrical clock distribution network
...
Notice that an H-tree is used
in distributing the optical clock
...
Once reaching the detector in each section,
the global clock optical pulses are converted into current pulses
...
...
into voltage signals
...
Optics has the additional advantage that many of the
difficulties with electromagnetic wave phenomena are avoided (e.g., crosstalk or inductive
coupling)
...
Optical clocks avoid this problem and don’t require termination [Miller00]
...
There are some variations in the arrival time of the optical signal (e.g., variations at bends cause different energy loss along different paths)
...
The sources of variations are very
similar to a conventional electrical approach including threshold and device variations,
power supply and temperature variations, and variations in the local drivers
...
67
distribution
...
However,
the challenges of dealing with process variations in the opto-electronic circuitry must be
addressed for this to become a reality
...
8

Perspective: Synchronous versus Asynchronous Design
The self-timed approach offers a potential solution to the growing clock-distribution problem
...

Independence from physical timing constraints is achieved with the aid of completion signals
...
This requires adherence to a certain protocol, which normally
consists of either two or four phases
...
Examples of self-timed circuits can be found in signal processing [Jacobs90], fast
arithmetic units (such as self-timed dividers) [Williams87], simple microprocessors
[Martin89] and memory (static RAM, FIFOs)
...
The design of a fool-proof network of handshaking units that is robust with respect
to races, live-lock, and dead-lock is nontrivial and requires the availability of dedicated
design-automation tools
...

This was amply illustrated by the example of the 21164 Alpha microprocessor
...

agement requires extensive modeling and analysis, as well as careful design
...
This observation is
already reflected in the fact that the routing network for the latest generation of massively
parallel supercomputers is completely implemented using self-timing [Seitz92]
...
Other alternative timing approaches might emerge as well
...

10
...

• An in-depth analysis of synchronous digital circuits and clocking approaches
was presented
...
Important parameters are the clocking scheme used and
the nature of the clock-generation and distribution network
...
Self-timed design uses completion signals
and handshaking logic to isolate physical timing constraints from event ordering
...
The introduction of synchronizers helps to reduce that risk,
but can never eliminate it
...
They are used to generate high speed clock signals on a chip
...

• Important trends for clock distribution include the use of delay-locked loops to
actively adjust delays on a chip
...


10
...
One of the best discussions so far is the chapter by Chuck Seitz in [Mead80, Chapter
7]
...
A collection of papers on clock distribution
networks is presented in [Friedman95]
...


chapter10_141
...
10

To Probe Further

98

REFERENCES
[Abnous93] A
...
Behzad, “A High-Performance Self-Timed CMOS Adder,” in EE241
Final Class Project Reports, by J
...
of California—Berkeley, May 1993
...
Bailey, “Clock Distribution,” in [Chandrakasan00]
...
Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison-Wesley,
pp
...

[Boning00] D
...
Nassif, “Models of Process Variations in Device and Interconnect” in
[Chandrakasan00]
...
Bowhill et al
...

[Bernstein98] K
...
al, High Speed CMOS Design Styles, Kluwer Academic Publishers,
1998
...
Bernstein, “Basic Logic Families”, in [Chandrakasan00]
...
Chandrakasan, W
...
Fox, Design of High-Performance Microprocessor Circuits, IEEE Press, 2000
...
Chaney and F
...
on Computers, vol
...
421–422
...
Dally and J
...

[Dopperpuhl92] D
...
, “A 200 MHz 64-b Dual Issue CMOS Microprocessor,” IEEE
Journal on Solid State Circuits, vol
...
11, Nov
...
1555–1567
...
Friedman, ed
...

[Gardner80] F
...
on Communications,
vol
...
1849–1858
...
Glasser and D
...
360–365
...
Goodman, A
...
Dancy, A
...
Chandrakasan, "An Energy/Security Scalable
Encryption Processor Using an Embedded Variable Voltage DC/DC Converter," IEEE Journal of
Solid-state Circuits, pp
...

[Gray93] P
...
Meyer, Analysis and Design of Analog Integrated Circuits, 3rd ed
...

[Hatamian88] M
...
S
...
, Plenum Publishing, pp
...

[Heller84] L
...
, “Cascade Voltage Switch Logic: A Differential CMOS Logic Family,”
IEEE International Solid State Conference Digest, Feb
...
16–17
...
Jacobs and R
...
25, No 6, December 1990, pp
...

[Jeong87] D
...
, “Design of PLL-Based Clock Generation Circuits”, IEEE Journal on
Solid State Circuits, vol
...

[Johnson93] H
...
Graham, High-Speed Digital Design—A Handbook of Black Magic,
Prentice-Hall, N
...

[Kim90] B
...
Helman, and P
...
SC-25, no
...
1385–1394
...

[Martin89] A
...
al, “The First Asynchronous Microprocessor: Test Results,” Computer
Architecture News, vol
...
95–110
...
Mead and L
...

[Messerschmitt90] D
...
8, No
...
1404-1419, October 1990
...
Miller, “Rationale and Challenges for Optical Interconnects to Electronic Chips,”
Proceedings of the IEEE, pp
...

[Nielsen94] L Nielsen, C
...
Sparso, K
...

[Restle98] P
...
Jenkins, A
...
Cook, “Measurement and Modeling of On-chip
Transmission Line Effects in a 400MHz Microprocessor,” IEEE Journal of Solid-state Circuits,
pp
...

[Seitz80] C
...
218–262, 1980
...
Seitz, “Mosaic C: An Experimental Fine-Grain Multicomputer,” in Future Tendencies
in Computer Science, Control and Applied Mathematics, Proceedings International Conference
on the 25th Anniversary of INRIA, Springer-Verlag, Germany, pp
...

[Shoji88] M
...

[Sutherland89] I
...
720–738, June
1989
...
Veendrick, “The Behavior of Flip Flops Used as Synchronizers and Prediction of
Their Failure Rates”, IEEE Journal of Solid State Circuits, vol
...

169–176
...
Williams et al
...
of Advanced Research
in VLSI 1987, Stanford Conf
...
75–96, March 1987
...
eecs
...
edu/IcBook) for
insightful and challenging exercises and design problems