Title: VLSI
Description: This is a full book on VLSI and will be very useful. Author: Rabaey
DIGITAL INTEGRATED CIRCUITS
A DESIGN PERSPECTIVE
2ND EDITION
Jan M
...
1 A Historical Perspective
1
...
3 Quality Metrics of a Digital Design
1
...
1
Cost of an Integrated Circuit
1
...
2
Functionality and Robustness
1
...
3
Performance
1
...
4
Power and Energy Consumption
1
...
fm Page 43 Monday, September 6, 1999 1:50 PM
CHAPTER
3
THE DEVICES
• Qualitative understanding of MOS devices
• Simple component models for manual analysis
• Detailed component models for SPICE
• Impact of process variations
3
...
3
...
2
The Diode
3
...
4
The Actual MOS Transistor— Some
Secondary Effects
3
...
5
SPICE Models for the MOS Transistor
3
...
1
3
...
2
Static Behavior
3
...
2
...
5
Perspective: Technology Scaling
3
...
4
The Actual Diode— Secondary Effects
3
...
5
3
...
3
...
3
...
3
...
This surely holds for digital circuit design as well
...
The role of the
semiconductor devices has been appreciated for a long time in the world of digital integrated circuits
...
Giving the reader the necessary knowledge and understanding of these components
is the prime motivation for the next two chapters
...
We refer
the reader to the many excellent textbooks on semiconductor devices for that purpose,
some of which are referenced in the To Probe Further section at the end of the chapters
...
Another important function of this chapter is the introduction of
models
...
Such an
approach is similar to considering the molecular structure of concrete when constructing a
bridge
...
A range of models can be conceived for each component presenting a
trade-off between accuracy and complexity
...
It has limited accuracy but helps us to understand the operation of the circuit
and its dominant parameters
...
In this
chapter, we present both first-order models for manual analysis as well as higher-order
models for simulation for each component of interest
...
They should be aware, however, that these are only nominal values, and that the actual
parameter values vary with operating temperature, over manufacturing runs, or even over
a single wafer
...
3
...
Each MOS transistor implicitly contains a number of
reverse-biased diodes that directly influence the behavior of the device
...
Diodes are also used to protect the
input devices of an IC against static charges
...
Rather than being comprehensive, we choose to focus on those aspects that prove to be influential in the design of digital
MOS circuits, that is, the operation in reverse-biased mode (footnote 1)
...
2
...
Figure 3
...
It consists of two homogeneous regions of p- and n-type material, separated by a region of transition from one type of doping to another,
which is assumed thin
...
The p-type material is doped with acceptor impurities (such as boron), which results in the presence of
holes as the dominant or majority carriers
...
Aluminum contacts provide access to the p- and n-terminals of the
device
...
1c
...
1b)
...
The electron concentration
changes from a high value in the n-type material to a very small value in the p-type
material
...
[Figure 3 ...: (a) Cross-section of pn-junction in an IC process; (b) One-dimensional representation; (c) Diode symbol]
Footnote 1: We refer the interested reader to the web-site of the textbook for a comprehensive description of the diode operation
This gradient causes electrons to diffuse from n to p and holes to diffuse from p to n
...
Consequently, the p-type material is negatively charged in the vicinity of the pn-boundary
...
The region at the junction, where the
majority carriers have been removed, leaving the fixed acceptor and donor ions, is called
the depletion or space-charge region
...
This field counteracts the diffusion of holes
and electrons, as it causes electrons to drift from p to n and holes to drift from n to p
...
The above analysis is summarized in Figure 3
...
[Figure 3 ...: (a) Current flow (hole diffusion and electron diffusion; hole drift and electron drift); (b) Charge density ρ versus distance x; electrical field E versus x; (d) Electrostatic potential V versus x, with φ0 marked]
In the device shown, the p material is more heavily doped than the n, or NA > ND, with NA and ND the acceptor and donor concentrations, respectively
...
Figure 3
...
This potential has the value
$$\phi_0 = \phi_T \ln\left[\frac{N_A N_D}{n_i^2}\right]$$
(3
...
2)
where φT is the thermal voltage
The quantity ni is the intrinsic carrier concentration in a pure sample of the semiconductor
and equals approximately 1
...
Example 3
...
Calculate the built-in potential at 300 K
...
(3
...
25 × 10 20
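The built-in potential expression can be checked numerically. The following is a minimal Python sketch, not the book's own worked solution: the doping levels (N_A = 10^15 cm^-3, N_D = 10^16 cm^-3), the intrinsic concentration n_i = 1.5 × 10^10 cm^-3, and the 26 mV thermal voltage are assumptions chosen for illustration, because this preview elides the example's actual values.

```python
import math

def built_in_potential(n_a, n_d, n_i=1.5e10, phi_t=0.026):
    """Return phi_0 = phi_T * ln(N_A * N_D / n_i^2) in volts.

    Concentrations are in cm^-3. The defaults for n_i and phi_t are
    assumed room-temperature (300 K) values for silicon.
    """
    return phi_t * math.log(n_a * n_d / n_i**2)

# Assumed illustrative doping levels (not taken from the source text).
phi_0 = built_in_potential(n_a=1e15, n_d=1e16)
print(f"built-in potential: {phi_0 * 1e3:.0f} mV")
```

Because the dependence is logarithmic, a tenfold change in either doping level shifts the result by only about 60 mV.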
3
...
2
Static Behavior
The Ideal Diode Equation
Assume now that a forward voltage VD is applied to the junction or, in other words, that
the potential of the p-region is raised with respect to the n-zone
...
Consequently, the flow of mobile carriers across the junction
increases as the diffusion current dominates the drift component
...
3 Minority carrier concentrations in the neutral region near an abrupt pn-junction under forward-bias
conditions
...
the depletion region and are injected into the neutral n- and p-regions, where they become
minority carriers, as is illustrated in Figure 3
...
Under the assumption that no voltage gradient exists over the neutral regions, which is approximately the case for most modern
devices, these minority carriers will diffuse through the region as a result of the concentration gradient until they get recombined with a majority carrier
...
On the other hand, when a reverse voltage VD is applied to the junction or when the
potential of the p-region is lowered with respect to the n-region, the potential barrier is
raised
...
A current flows from the n-region to the p-region
...
4)
...
The diode
thus acts as a one-way conductor
...
4 Minority carrier concentration in the neutral regions near the pn-junction under reverse-bias
conditions
...
This is illustrated in Figure 3
...
The exponential behavior for positive-bias voltages is
even more apparent in Figure 3
...
The
current increases by a factor of 10 for every extra 60 mV (= 2
...
At
small voltage levels (VD < 0
...
The behavior of the diode for both forward- and reverse bias conditions is best
described by the well-known ideal diode equation, which relates the current through the
diode ID to the diode bias voltage VD
$$I_D = I_S\left(e^{V_D/\phi_T} - 1\right)$$
(3
...
(3
...
5
...
(3
...
IS represents a constant value, called the saturation current of the diode
...
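The two regimes of the ideal diode equation can be illustrated with a short Python sketch. The saturation current of 0.5 × 10^-16 A and the 26 mV thermal voltage are assumed illustrative values, and the function name is hypothetical.

```python
import math

PHI_T = 0.026  # thermal voltage in volts at roughly 300 K (assumed)

def diode_current(v_d, i_s=0.5e-16, phi_t=PHI_T):
    """Ideal diode equation: I_D = I_S * (exp(V_D / phi_T) - 1)."""
    return i_s * math.expm1(v_d / phi_t)

# Forward bias: raising V_D by phi_T * ln(10) (about 60 mV) multiplies I_D by ~10.
step = PHI_T * math.log(10)
ratio = diode_current(0.6 + step) / diode_current(0.6)

# Reverse bias: the current levels off at -I_S.
reverse = diode_current(-1.0)
print(f"step = {step * 1e3:.1f} mV, ratio = {ratio:.3f}, reverse = {reverse:.2e} A")
```

`math.expm1` is used instead of `math.exp(x) - 1` so that the result stays accurate for bias voltages close to zero.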
[Figure 3 ...: diode current as a function of the bias voltage VD (V); (a) On a linear scale, with ID in mA; a second plot with ID in A on a logarithmic axis (10–15 to 10–5), annotated with the deviation due to recombination]
regions
...
It is worth mentioning that in actual
devices, the reverse currents are substantially larger than the saturation current IS
...
The electric field present sweeps these carriers out of the region, causing an additional current
component
...
Actual device measurements are, therefore, necessary to determine realistic
values for the reverse diode leakage currents
...
A first model, shown in Figure 3
...
(3
...
While this model yields accurate results, it has
the disadvantage of being strongly nonlinear
...
An often-used, simplified model is derived by
inspecting the diode current plot of Figure 3
...
For a “fully conducting” diode, the voltage
drop over the diode VD lies in a narrow range, approximately between 0
...
8 V
...
6
–
Diode models
...
first degree, it is reasonable to assume that a conducting diode has a fixed voltage drop
VDon over it
...
7 V is typically
assumed
...
6b, where a conducting diode is replaced
by a fixed voltage source
...
2 Analysis of Diode Network
Consider the simple network of Figure 3
...
5
× 10–16 A
...
224 mA, and VD =
0
...
The simplified model with VDon = 0
...
7 V, ID =
0
...
It hence makes considerable sense to use this model when determining a first-order solution of a diode network
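To make the comparison between the two diode models concrete, here is a hedged Python sketch for a diode in series with a voltage source and a resistor. The network values (3 V source, 10 kΩ resistor, I_S = 0.5 × 10^-16 A, 26 mV thermal voltage) are assumptions for illustration, since the preview elides most of the example's numbers; the bisection solver is a generic choice, not a method prescribed by the text.

```python
import math

def solve_diode_network(v_s, r_s, i_s, phi_t=0.026, tol=1e-12):
    """Solve (V_S - V_D) / R_S = I_S * (exp(V_D / phi_T) - 1) for V_D by bisection."""
    def mismatch(v_d):
        # Diode current minus resistor current; increases monotonically with v_d.
        return i_s * math.expm1(v_d / phi_t) - (v_s - v_d) / r_s

    lo, hi = 0.0, v_s  # mismatch(lo) < 0 and mismatch(hi) > 0 for v_s > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0:
            hi = mid
        else:
            lo = mid
    v_d = 0.5 * (lo + hi)
    return v_d, (v_s - v_d) / r_s

# Assumed network values: 3 V source, 10 kOhm resistor, I_S = 0.5e-16 A.
V_S, R_S, I_S = 3.0, 10e3, 0.5e-16
v_exact, i_exact = solve_diode_network(V_S, R_S, I_S)

# Fixed-drop model: a conducting diode is replaced by a 0.7 V source.
V_DON = 0.7
i_simple = (V_S - V_DON) / R_S
print(f"ideal model: VD = {v_exact:.3f} V, ID = {i_exact * 1e3:.3f} mA")
print(f"fixed-drop model: ID = {i_simple * 1e3:.3f} mA")
```

Under these assumptions the two currents differ by only a few percent, which is why the fixed-drop model is attractive for a first-order hand solution.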
...
7
3
...
3
A simple diode circuit
...
Just as important in the design of digital circuits is the response of the device to
changes in its bias conditions
...
Because the operation mode of the diode
is a function of the amount of charge present in both the neutral and the space-charge
regions, its dynamic behavior is strongly determined by how fast charge can be moved
around
...
In fact, all diodes in an
operational MOS digital integrated circuit are reverse-biased and are supposed to remain
so under all circumstances
...
A signal over(under) shooting the supply rail is an example of such
...
Hence, we will devote our attention solely to what governs the dynamic response of
the diode under reverse-biasing conditions, the depletion-region charge
...
...
The corresponding charge distribution
under zero-bias conditions was plotted in Figure 3
...
This picture can be easily extended
to incorporate the effects of biasing
...
This corresponds to a reduced depletion-region width
...
These observations are confirmed by the well-known depletion-region
expressions given below (a derivation of these expressions, which are valid for abrupt
junctions, is either simple or can be found in any textbook on devices such as [Howe97])
...
1
...
$$Q_j = A_D\sqrt{\left(2\varepsilon_{si} q \frac{N_A N_D}{N_A + N_D}\right)(\phi_0 - V_D)}$$
(3
...
Depletion-region width
...
5)
3
...
$$E_j = \sqrt{\left(\frac{2q}{\varepsilon_{si}}\frac{N_A N_D}{N_A + N_D}\right)(\phi_0 - V_D)}$$
(3
...
7
times the permittivity of a vacuum, or 1
...
The ratio of the n- versus p-side
of the depletion-region width is determined by the doping-level ratios: W2/(−W1) = NA/ND
...
Because the space-charge region
contains few mobile carriers, it acts as an insulator with a dielectric constant εsi of the
semiconductor material
...
A small change
in the voltage applied to the junction dVD causes a change in the space charge dQj
...
7)
where Cj0 is the capacitance under zero-bias conditions and is only a function of the physical parameters of the device
...
[Figure 3 ...: junction capacitance Cj (fF) versus applied voltage VD (V), with the zero-bias value Cj0 marked and a curve labeled "Linear junction"]
µ
$$C_{j0} = A_D\sqrt{\left(\frac{\varepsilon_{si} q}{2}\frac{N_A N_D}{N_A + N_D}\right)\phi_0^{-1}}$$
(3
...
(3
...
Typically, the AD factor is
omitted, and Cj and Cj0 are expressed as a capacitance/unit area
...
8 for a typical silicon diode found in MOS circuits
...
Note also that the capacitance decreases with an increasing reverse
bias: a reverse bias of 5 V reduces the capacitance by more than a factor of two
...
3 Junction Capacitance
Consider the following silicon junction diode: Cj0 = 2 × 10⁻³ F/m², AD = 0
...
64 V
...
5 V results in a junction capacitance of 0
...
9
fF/µm2), or, for the total diode, a capacitance of 0
...
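The bias dependence of the junction capacitance can be sketched in Python using the standard depletion-capacitance expression C_j = C_j0 / (1 − V_D/φ0)^m. The zero-bias capacitance and built-in potential follow the example; the 0.5 µm² area and the −2.5 V bias are assumptions, because those digits are partly elided in this preview.

```python
def junction_capacitance(v_d, c_j0, phi_0, m=0.5):
    """C_j = C_j0 / (1 - V_D / phi_0)^m, valid for V_D < phi_0.

    m is the grading coefficient: 1/2 for an abrupt junction,
    1/3 for a linearly graded one.
    """
    if v_d >= phi_0:
        raise ValueError("the depletion-capacitance model requires V_D < phi_0")
    return c_j0 / (1.0 - v_d / phi_0) ** m

C_J0 = 2e-3      # zero-bias capacitance, F/m^2
PHI_0 = 0.64     # built-in potential, V
AREA = 0.5e-12   # assumed diode area of 0.5 um^2, expressed in m^2
V_D = -2.5       # assumed reverse bias, V

c_per_area = junction_capacitance(V_D, C_J0, PHI_0)
c_total = c_per_area * AREA
print(f"{c_per_area * 1e3:.2f} fF/um^2, total {c_total * 1e15:.2f} fF")
```

Note that 1 F/m² equals 10³ fF/µm², so 2 × 10⁻³ F/m² corresponds to 2 fF/µm².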
Equation (3
...
This is often not the
case in actual integrated-circuit pn-junctions, where the transition from n to p material can
be gradual
...
An analysis of the
linearly-graded junction shows that the junction capacitance equation of Eq
...
7) still
holds, but with a variation in order of the denominator
...
...
9)
where m is called the grading coefficient and equals 1/2 for the abrupt junction and 1/3 for
the linear or graded junction
...
8
...
8 raises awareness to the fact that the junction capacitance is a voltage-dependent
parameter whose value varies widely between bias points
...
Under those circumstances, it is more
attractive to replace the voltage-dependent, nonlinear capacitance Cj by an equivalent, linear capacitance Ceq
...
10)
Combining Eq
...
4) (extended to accommodate the grading coefficient) and Eq
...
10) yields the value of Keq
...
11)
Example 3
...
3 is switched between 0 and −2
...
Compute the average junction
capacitance (m = 0
...
For the defined voltage range and for φ0 = 0
...
622
...
24 fF/µm2
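The averaging behind Keq can be sketched as follows. The closed form is obtained by integrating C_j0 / (1 − V/φ0)^m over the voltage swing and dividing by the swing; the numeric routine is only a cross-check. The swing from 0 to −2.5 V and φ0 = 0.64 V are assumed here, since the preview elides part of those numbers, and both function names are hypothetical.

```python
def k_eq(v_high, v_low, phi_0, m=0.5):
    """Closed-form ratio C_eq / C_j0 for a junction switched between v_low and v_high."""
    span = (phi_0 - v_high) ** (1 - m) - (phi_0 - v_low) ** (1 - m)
    return -(phi_0 ** m) * span / ((v_high - v_low) * (1 - m))

def k_eq_numeric(v_high, v_low, phi_0, m=0.5, steps=100_000):
    """Cross-check: average 1 / (1 - V/phi_0)^m over the voltage range (midpoint rule)."""
    dv = (v_high - v_low) / steps
    total = sum((1.0 - (v_low + (i + 0.5) * dv) / phi_0) ** -m for i in range(steps))
    return total / steps

# Assumed swing and built-in potential.
k = k_eq(v_high=-2.5, v_low=0.0, phi_0=0.64)
c_eq = k * 2.0  # with C_j0 = 2 fF/um^2, in fF/um^2
print(f"Keq = {k:.3f}, Ceq = {c_eq:.2f} fF/um^2")
```

The closed form and the numerical average agree, which confirms that Keq is simply the mean of the voltage-dependent capacitance ratio over the swing.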
...
2
...
Not
all applied bias voltage appears directly across the junction, as there is always some voltage drop over the neutral regions
...
This effect can be modeled by adding a resistor in series with the n- and p-region diode contacts
...
When the reverse bias
exceeds a certain level, called the breakdown voltage, the reverse current shows a dramatic increase as shown in Figure 3
...
this increase is caused by the avalanche breakdown
...
Consequently, carriers crossing
the depletion region are accelerated to high velocity
...
[Figure 3 ...: diode current ID (A) versus VD (V)]
reach a high-enough energy level that electron-hole pairs are created on collision with
immobile silicon atoms
...
The value of Ecrit is approximately 2 × 10⁵ V/cm for impurity concentrations of the order of 10¹⁶ cm⁻³
...
Observe that avalanche
breakdown is not the only breakdown mechanism encountered in diodes
...
Discussion of this phenomenon is beyond the scope of this text
...
The thermal voltage φT, which appears in the exponent of the current equation, is
linearly dependent upon the temperature
...
2
...
Theoretically, the saturation current approximately doubles every 5 °C
...
This dual dependence has a significant impact on the operation of a digital circuit
...
For instance,
for a forward bias of 0
...
Secondly, integrated circuits rely heavily on reverse-biased diodes as
isolators
...
3
...
5
The SPICE Diode Model
In the preceding sections, we have presented a model for manual analysis of a diode circuit
...
...
10
SPICE diode model
...
While different circuit simulators have
been developed over the last decades, the SPICE program, developed at the University of
California at Berkeley, is definitely the most successful [Nagel75]
...
The accuracy of the simulation
depends directly upon the quality of this model
...
Creating accurate and computation-efficient SPICE models has been a long process and is by no means finished
...
The standard SPICE model for a diode is simple, as shown in Figure 3
...
The steady-state characteristic of the diode is modeled by the nonlinear current source ID,
which is a modified version of the ideal diode equation
$$I_D = I_S\left(e^{V_D/(n\phi_T)} - 1\right)$$
(3
...
It equals 1 for most common diodes but can be somewhat higher than 1 for others
...
For
higher current levels, this resistance causes the internal diode voltage VD to differ from the externally applied voltage, hence causing the current to be lower than what would be expected
from the ideal diode equation
...
Only the former was discussed
in this chapter, as the latter is only an issue under forward-biasing conditions
...
13)
A listing of the parameters used in the diode model is given in Table 3
...
Besides the
parameter name, symbol, and SPICE name, the table contains also the default value used
by SPICE in case the parameter is left undefined
...
Other parameters are available to govern second-order effects such as break-
...
To be concise, we chose to limit the listing to the
parameters of direct interest to this text
...
g
...
Table 3
...
Parameter Name | Symbol | SPICE Name | Units | Default Value
Saturation current | IS | IS | A | 1
...
5
Junction potential | φ0 | VJ | V | 1
3
...
Its major asset from a digital perspective is that the device performs very well as a switch, and introduces few parasitic
effects
...
Following the approach we took for the diode, we restrict ourselves in this section to
a general overview of the transistor and its parameters
...
The discussion concludes with an enumeration of
some second-order effects and the introduction of the SPICE MOS transistor models
...
3
...
The voltage applied to the gate terminal determines if and how much current flows between the source and the drain ports
...
Its function is secondary as it only serves to
modulate the device characteristics and parameters
...
When a
voltage is applied to the gate that is larger than a given value called the threshold voltage
VT, a conducting channel is formed between drain and source
...
The conductivity of the
channel is modulated by the gate voltage— the larger the voltage difference between gate
and source, the smaller the resistance of the conducting channel and the larger the current
...
...
Two types of MOSFET devices can be identified
...
The current is carried by
electrons moving through an n-type channel between source and drain
...
MOS
devices can also be made by using an n-type substrate and p+ drain and source regions
...
The device
is called a p-channel MOS, or PMOS transistor
...
The cross-section of a contemporary dual-well CMOS
process was presented in Chapter 2, and is repeated here for convenience (Figure 3
...
[Figure 3 ...: cross-section with labeled layers: gate-oxide, TiSi2, AlCu, SiO2, tungsten, poly, p-well, n-well, n+, p+, p-epi]
Circuit symbols for the various MOS transistors are shown in Figure 3
...
As mentioned earlier, the transistor is a four-port device with gate, source, drain, and body terminals (Figures a and c)
...
If the fourth terminal is not shown, it is
assumed that the body is connected to the appropriate supply
...
12 Circuit symbols for
MOS transistors
...
3
...
2
The MOS Transistor under Static Conditions
In the derivation of the static model of the MOS transistor, we concentrate on the NMOS
device
...
The Threshold Voltage
Consider first the case where VGS = 0 and drain, source, and bulk are connected to ground
...
Under the mentioned conditions, both junctions have a 0 V bias and can
be considered off, which results in an extremely high resistance between drain and source
...
13
...
The positive gate voltage causes positive charge to
accumulate on the gate electrode and negative charge on the substrate side
...
Hence, a depletion region is formed
below the gate
...
Consequently, similar expressions hold for the width and the space charge per unit
area
...
(3
...
(3
...
$$W_d = \sqrt{\frac{2\varepsilon_{si}\phi}{qN_A}}$$
(3
...
15)
and
Qd =
with NA the substrate doping and φ the voltage across the depletion layer (i
...
, the potential
at the oxide-silicon boundary)
...
[Figure 3 ...: NMOS transistor cross-section with gate (G), source (S), drain (D) and body (B) terminals, applied VGS, n+ source and drain regions, n-channel, depletion region, and p-substrate]
This point marks the onset of a phenomenon known as strong inversion and occurs at a voltage
equal to twice the Fermi Potential (Eq
...
16)) (φF ≈ −0
...
16)
Further increases in the gate voltage produce no further changes in the depletion-layer width, but result in additional electrons in the thin inversion layer directly under the
oxide
...
n+
Hence, a continuous n-type channel is formed between the source and drain regions, the
conductivity of which is modulated by the gate-source voltage
...
17)
This picture changes somewhat in case a substrate bias voltage VSB is applied (VSB is normally positive for n-channel devices)
...
The charge stored in the depletion
region now is expressed by Eq
...
18)
$$Q_B = \sqrt{2 q N_A \varepsilon_{si}\left|(-2\phi_F + V_{SB})\right|}$$
(3
...
VT is a function of several components, most of which are material constants such as the
difference in work-function between gate and substrate material, the oxide thickness, the
Fermi voltage, the charge of impurities trapped at the surface between channel and gate
oxide, and the dosage of ions implanted for threshold adjustment
...
as well
...
The threshold voltage
under different body-biasing conditions can then be determined in the following manner,
$$V_T = V_{T0} + \gamma\left(\sqrt{\left|-2\phi_F + V_{SB}\right|} - \sqrt{\left|-2\phi_F\right|}\right)$$
(3
...
Observe that the threshold voltage has a positive value for a typical
NMOS device, while it is negative for a normal PMOS transistor
...
The effect of the well bias on the
threshold voltage of an NMOS
transistor is plotted for typical
values of |–2φF| = 0
...
4 V0
...
A negative bias on the
well or substrate causes the
threshold to increase from 0
...
85 V
...
6 V
in an NMOS
...
14
-0
...
Example 3
...
4 V, while the body-effect coefficient
equals -0
...
Compute the threshold voltage for VSB = -2
...
2φF = 0
...
Using Eq
...
19), we obtain VT(-2
...
4 - 0
...
5+0
...
5 - 0
...
5) V = -0
...
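The body-effect calculation can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the preview elides most of the example's digits, so the PMOS values used below (VT0 = −0.4 V, γ = −0.4 V^0.5, 2φF = 0.6 V, VSB = −2.5 V) and the NMOS values are illustrative choices, and the function name is hypothetical.

```python
import math

def threshold_voltage(v_t0, gamma, phi_f, v_sb):
    """V_T = V_T0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|))."""
    return v_t0 + gamma * (
        math.sqrt(abs(-2.0 * phi_f + v_sb)) - math.sqrt(abs(-2.0 * phi_f))
    )

# Assumed PMOS values: V_T0 = -0.4 V, gamma = -0.4 V^0.5, 2*phi_F = 0.6 V, V_SB = -2.5 V.
vt_pmos = threshold_voltage(v_t0=-0.4, gamma=-0.4, phi_f=0.3, v_sb=-2.5)

# Assumed NMOS values: V_T0 = 0.45 V, gamma = 0.4 V^0.5, phi_F = -0.3 V, V_SB = 2.5 V.
vt_nmos = threshold_voltage(v_t0=0.45, gamma=0.4, phi_f=-0.3, v_sb=2.5)

print(f"PMOS VT = {vt_pmos:.2f} V, NMOS VT = {vt_nmos:.2f} V")
```

In both cases the body bias pushes the threshold away from zero by roughly 0.4 V, which makes the device harder to turn on.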
The voltage difference causes a current ID to flow from drain to source (Figure
3
...
Using a simple analysis, a first-order expression of the current as a function of VGS
and VDS can be obtained
...
15 NMOS transistor with bias
voltages
...
Under the assumption that this voltage exceeds the threshold
voltage all along the channel, the induced channel charge per unit area at point x can be
computed
...
...
20)
Cox stands for the capacitance per unit area presented by the gate oxide, and equals
$$C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}$$
(3
...
97 × εo = 3
...
The latter which is 10 nm (= 100 Å) or smaller for contemporary processes
...
µm
The current is given as the product of the drift velocity of the carriers υn and the
available charge
...
W is the width of the channel in a direction perpendicular to the current flow
...
22)
The electron velocity is related to the electric field through a parameter called the
mobility µn (expressed in m²/V⋅s)
...
In general, an empirical value is used
...
23)
Combining Eq
...
20) − Eq
...
23) yields
$$I_D\,dx = \mu_n C_{ox} W (V_{GS} - V - V_T)\,dV$$
(3
...
$$I_D = k'_n \frac{W}{L}\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right] = k_n\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right]$$
(3
...
26)
The product of the process transconductance k'n and the (W/L) ratio of an (NMOS) transistor is called the gain factor kn of the device
...
(3
...
The operation region where Eq
...
25) holds is hence called the resistive or linear
region
...
NOTICE: The W and L parameters in Eq
...
25) represent the effective channel width
and length of the transistor
...
In the remainder of the text, W and L will
...
The following expressions relate the two parameters, with ∆W and ∆L
parameters of the manufacturing process:
W = Wd – ∆W
L = Ld – ∆L
(3
...
This happens
when VGS − V(x) < VT
...
This is illustrated in Figure 3
...
16 NMOS transistor under pinch-off conditions (figure labels: VGS - VT, n+)
...
No channel exists in the vicinity of the drain region
...
(3
...
(3
...
The voltage difference over the induced channel (from the pinch-off point
to the source) remains fixed at VGS − VT, and consequently, the current remains constant
(or saturates)
...
(3
...
It is worth observing that, to a first degree, the current is no longer a function
of VDS
...
$$I_D = \frac{k'_n}{2}\frac{W}{L}(V_{GS} - V_T)^2$$
(3
...
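The resistive-region and saturation expressions can be combined into one piecewise function. The Python sketch below assumes a long-channel NMOS device, ignores channel-length modulation and subthreshold conduction, and uses an illustrative gain factor k_n = k'n·W/L of 100 µA/V² and a 0.5 V threshold; none of these numbers come from the source.

```python
def long_channel_current(v_gs, v_ds, v_t, k_n):
    """First-order long-channel NMOS drain current (no channel-length modulation)."""
    v_gt = v_gs - v_t
    if v_gt <= 0:
        return 0.0                                  # cut-off (subthreshold ignored)
    if v_ds < v_gt:
        return k_n * (v_gt * v_ds - v_ds**2 / 2)    # resistive (linear) region
    return 0.5 * k_n * v_gt**2                      # saturation: independent of v_ds

K_N = 1e-4   # assumed gain factor k'n * W / L, in A/V^2
V_T = 0.5    # assumed threshold voltage, in V

i_linear = long_channel_current(2.0, 0.5, V_T, K_N)
i_edge = long_channel_current(2.0, 1.5, V_T, K_N)   # v_ds == v_gs - v_t
i_sat = long_channel_current(2.0, 5.0, V_T, K_N)
print(i_linear, i_edge, i_sat)
```

The two branches meet at VDS = VGS − VT, so the modeled current is continuous across the pinch-off boundary.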
...
This is not entirely correct
...
As can be observed from Eq
...
29), the current increases when the length
factor L is decreased
...
(3
...
$$I_D = I_D'\,(1 + \lambda V_{DS})$$
(3
...
Analytical expressions for λ have proven to be complex and
inaccurate
...
In shorter transistors,
the drain-junction depletion region presents a larger fraction of the channel, and the channel-modulation effect is more pronounced
...
Velocity Saturation
The behavior of transistors with very short channel lengths (called short-channel devices)
deviates considerably from the resistive and saturated models presented in the previous
paragraphs
...
Eq
...
23)
states that the velocity of the carriers is proportional to the electrical field, independent of
the value of that field
...
However, at high
field strengths, the carriers fail to follow this linear model
...
This is illustrated in Figure
3
...
[Figure 3 ...: Velocity-saturation effect. Carrier velocity υn (m/s) versus electrical field: a constant-mobility region (slope = µ) followed by a constant-velocity region at υsat = 10⁵]
...
5
× 106 V/m (or 1
...
This means that in an NMOS device with a channel length of 1
µm, only a couple of volts
between drain and source are needed to reach the saturation point
...
Holes in n-type silicon saturate at the same velocity, although a higher electrical field is needed to achieve saturation
...
...
We will illustrate this with a first-order derivation of the device characteristics under velocity-saturating conditions [Ko89]
...
17, can be roughly approximated by the following expression:
$$\upsilon = \begin{cases} \dfrac{\mu_n \xi}{1 + \xi/\xi_c} & \text{for } \xi \le \xi_c \\ \upsilon_{sat} & \text{for } \xi \ge \xi_c \end{cases}$$
(3
...
Re-evaluation of Eq
...
20) and Eq
...
22) in light of the revised velocity formula leads to a modified
expression of the drain current in the resistive region:
$$I_D = \kappa(V_{DS})\,\mu_n C_{ox}\frac{W}{L}\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right]$$
(3
...
33)
with
κ is a measure of the degree of velocity saturation, since VDS/L can be interpreted as the
average field in the channel
...
(3
...
For short-channel devices, κ is smaller than 1, which
means that the delivered current is smaller than what would be normally expected
...
The saturation drain voltage VDSAT can be calculated by equating the current at the drain to
the current given by Eq
...
32) for VDS = VDSAT
...
(3
...
$$I_{DSAT} = \upsilon_{sat} C_{ox} W (V_{GT} - V_{DSAT}) = \kappa(V_{DSAT})\,\mu_n C_{ox}\frac{W}{L}\left[V_{GT}V_{DSAT} - \frac{V_{DSAT}^2}{2}\right]$$
(3
...
After some algebra, we obtain
$$V_{DSAT} = \kappa(V_{GT})\,V_{GT}$$
(3
...
This leads to some interesting observations:
• For a short-channel device and for large enough values of VGT, κ(VGT) is substantially smaller than 1, hence VDSAT < VGT
...
[Figure 3 ...: ID versus VDS at VGS = VDD for a long-channel device and a short-channel device, with VDSAT and VGS - VT marked on the VDS axis]
Short-channel devices therefore experience an extended saturation region, and tend to operate more often in saturation conditions than their long-channel counterparts, as is illustrated in Figure 3
...
• The saturation current IDSAT displays a linear dependence with respect to the gate-source voltage VGS, which is in contrast with the squared dependence in the long-channel device
...
On the other hand, reducing the operating voltage does not
have such a significant effect in submicron devices as it would have in a long-channel transistor
...
From a modeling perspective, it appears as though the
effective channel is shortening with increasing VDS, similar in effect to the channel-length
modulation
...
Thus far we have only considered the effects of the tangential field along the channel due to the VDS, when considering velocity-saturation effects
...
This effect, which is called mobility degradation, reduces the surface
mobility with respect to the bulk mobility
...
(3
...
36)
with µn0 the bulk mobility and η an empirical parameter
...
Readers interested in a more in-depth perspective on the short-channel effects in
MOS transistors are referred to the excellent reference works on this topic, such as
[Ko89]
...
(3
...
(3
...
A substantially simpler model can be obtained by making two assumptions:
...
The velocity saturates abruptly at ξc, and is approximated by the following expression:
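$$\upsilon = \begin{cases} \mu_n \xi & \text{for } \xi \le \xi_c \\ \upsilon_{sat} = \mu_n \xi_c & \text{for } \xi \ge \xi_c \end{cases}$$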
(3
...
The drain-source voltage VDSAT at which the critical electrical field is reached and
velocity saturation comes into play is constant and is approximated by Eq
...
38)
...
(3
...
$$V_{DSAT} = L\xi_c = \frac{L\upsilon_{sat}}{\mu_n}$$
(3
...
Once VDSAT is reached, the current abruptly saturates
...
(3
...
$$I_{DSAT} = I_D(V_{DS} = V_{DSAT}) = \mu_n C_{ox}\frac{W}{L}\left[(V_{GS} - V_T)V_{DSAT} - \frac{V_{DSAT}^2}{2}\right]$$
(3
...
The simplified velocity model causes
substantial deviations in the transition zone between linear and velocity-saturated regions
...
Most importantly, the equations are coherent with the familiar long-channel equations, and provide the digital designer with a much needed tool for intuitive understanding
and interpretation
...
Figure
3
...
One would hence expect both devices to display identical
I-V characteristics. The main difference, however, is that the first device has a long channel
length (Ld = 10 µm), while the second transistor is a short-channel device (Ld = 0
...
Consider first the long-channel device
...
[Figure 3 ...: I-V characteristics for several values of VGS, with the VDS = VGS-VT curve marked (... 25 µm CMOS technology)]
The transition between both regions is delineated by the VDS = VGS - VT curve
...
The linear
dependence of the saturation current with respect to VGS is apparent in the short-channel
device of b
...
This results in a substantial drop in current drive for high voltage levels
...
5 V, VDS = 2
...
25 µm)
Figure 3
...
25 µm CMOS
technology)
...
5 for both transistors and VDS = 2
...
...
20)
...
All the derived equations hold for the PMOS transistor as well
...
is illustrated in Figure 3
...
25 µm CMOS process
...
It is also interesting to observe that the effects of
velocity saturation are less pronounced than in the NMOS devices
...
21 I-V characteristics of (Wd=0
...
25 µm) PMOS transistor in 0
...
Due to the smaller mobility, the maximum
current is only 42% of what is achieved by a similar
NMOS transistor
...
20 reveals that the current does not
drop abruptly to 0 at VGS = VT
...
This effect is called
subthreshold or weak-inversion conduction
...
The transition from the on- to the off-condition is thus not abrupt, but gradual
...
20b on a logarithmic scale as shown in Figure 3
...
This confirms that the current
does not drop to zero immediately for VGS < VT, but actually decays in an exponential
fashion, similar to the operation of a bipolar transistor (footnote 2)
...
The current in this region can be approximated by the expression
Footnote 2: Discussion of the operation of bipolar transistors is out of the scope of this textbook
...
[Plot: ID (A) on a logarithmic axis from 10⁻¹² to 10⁻², with the linear region, the quadratic region, and the subthreshold exponential region marked, and VT indicated on the horizontal axis]
22 ID current versus VGS
(on logarithmic scale), showing the
exponential characteristic of the
subthreshold region
...
40)
where IS and n are empirical parameters, with n ≥ 1 and typically ranging around 1
...
In most digital applications, the presence of subthreshold current is undesirable as it
detracts from the ideal switch-like behavior that we like to assume for the MOS transistor
...
The (inverse) rate of decline of the current with respect to VGS below VT
hence is a quality measure of a device
...
From Eq
...
40), we find
$$S = n\left(\frac{kT}{q}\right)\ln(10)$$
(3
...
For an ideal transistor with the sharpest possible rolloff, n = 1 and (kT/q)ln(10) evaluates to 60 mV/decade at room temperature, which means
that the subthreshold current drops by a factor of 10 for a reduction in VGS of 60 mV
...
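The slope factor is easy to evaluate directly. In the Python sketch below, the physical constants are the exact SI values, the temperature of 300 K is taken as "room temperature", and n = 1.5 is an assumed illustrative value rather than a number from the text.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C

def subthreshold_slope(n, temperature=300.0):
    """S = n * (kT/q) * ln(10), in volts per decade of drain current."""
    return n * (BOLTZMANN * temperature / ELECTRON_CHARGE) * math.log(10)

ideal = subthreshold_slope(1.0)     # sharpest possible roll-off
typical = subthreshold_slope(1.5)   # n = 1.5 is an assumed value
print(f"ideal: {ideal * 1e3:.1f} mV/decade, n = 1.5: {typical * 1e3:.1f} mV/decade")
```

At 300 K the ideal value comes out slightly below 60 mV/decade, which the text rounds to 60 mV. Because S scales linearly with temperature, the roll-off becomes less sharp as the device heats up.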
5)
...
The value of n is determined by the intrinsic
device topology and structure
...
Subthreshold current has some important repercussions
...
This is especially
important in the so-called dynamic circuits, which rely on the storage of charge on a
capacitor and whose operation can be severely degraded by subthreshold leakage
...
...
6 Subthreshold Slope
For the example of Figure 3
...
5 mV/decade is observed (between 0
...
4
V)
...
49
...
Its behavior is heavily non-linear and is influenced by a large number of second-order effects
...
While excellent from an accuracy perspective, these models fail to provide a designer
with intuitive insight into the behavior of a circuit and its dominant design parameters
...
A
designer who lacks a clear view of what drives and governs the circuit operation must by
necessity resort to a lengthy trial-and-error optimization process, which most often leads to
an inferior solution
...
It turns out that the first-order expressions,
derived earlier in the chapter, can be combined into a single expression that meets these
goals
...
23), the value
of which is defined in the figure
...
(3
...
(3
...
(3
...
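$$I_D = 0 \quad \text{for } V_{GT} \le 0$$
$$I_D = k' \frac{W}{L} \left( V_{GT} V_{min} - \frac{V_{min}^2}{2} \right) (1 + \lambda V_{DS}) \quad \text{for } V_{GT} \ge 0$$
with $V_{min} = \min(V_{GT}, V_{DS}, V_{DSAT})$,
$V_{GT} = V_{GS} - V_T$,
and $V_T = V_{T0} + \gamma \left( \sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|} \right)$
[Transistor symbol with terminals G, D, S, B and drain current ID]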
Figure 3
...
Besides being a function of the voltages at the four terminals of the transistor, the
model employs a set of five parameters: VT0, γ, VDSAT, k’ and λ
...
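The unified expression is compact enough to evaluate directly. The sketch below is one possible transcription for an NMOS device; the default parameter values (VT0 = 0.43 V, γ = 0.4, VDSAT = 0.63 V, k' = 115 µA/V², λ = 0.06, W/L = 1.5, and 0.6 V for the magnitude of 2φF) are illustrative assumptions, not values taken from this text, and the function name is hypothetical.

```python
import math


def drain_current(vgs, vds, vsb=0.0, w_over_l=1.5,
                  vt0=0.43, gamma=0.4, vdsat=0.63,
                  k_prime=115e-6, lam=0.06, minus_two_phi_f=0.6):
    """Unified manual-analysis model for an NMOS transistor (all defaults are assumed values)."""
    # Body effect: VT = VT0 + gamma * (sqrt(|-2phiF + VSB|) - sqrt(|-2phiF|))
    vt = vt0 + gamma * (math.sqrt(abs(minus_two_phi_f + vsb))
                        - math.sqrt(abs(minus_two_phi_f)))
    vgt = vgs - vt
    if vgt <= 0:
        return 0.0  # cut-off: subthreshold conduction is ignored in this model
    # The smallest of the three voltages decides the operating region:
    # VGT -> saturation, VDS -> resistive, VDSAT -> velocity saturation.
    vmin = min(vgt, vds, vdsat)
    return k_prime * w_over_l * (vgt * vmin - vmin ** 2 / 2) * (1 + lam * vds)


print(f"ID at VGS = VDS = 2.5 V: {drain_current(2.5, 2.5) * 1e6:.0f} uA")
```

A single `min` replaces the usual case analysis over operating regions, which is what makes the model convenient for hand calculations.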
The complexity of the device makes this a precarious task
...
More significantly, the model should match the best in
the regions that matter the most
...
V
The performance of an MOS digital circuit is primarily determined by the maximum
available current (i.e., the current obtained for VGS = VDS = supply voltage)
...
Example 3
...
25 µm CMOS Process3
Based on the simulated ID-VDS and ID-VGS plots of a (Wd = 0
...
25 µm) transistor,
implemented in our generic 0
...
19, Figure 3
...
5 V, VGS = 2
...
5 V being the typical supply voltage for this process
...
24 for the NMOS transistor, and compared to the simulated values
...
[Figure 3.24: Correspondence between simple model (solid line) and SPICE simulation (dotted) for minimum-size NMOS transistor (Wd = 0 ... 25 µm). Vertical axis: ID (A); horizontal axis in volts.]
good correspondence can be observed with the exception of the transition region between
resistive and velocity-saturation
...
(3
...
It
demonstrates that our model, while simple, manages to give a fair indication of the overall
behavior of the device
...
2 tabulates the obtained parameter values for the minimum-sized NMOS and a similarly sized PMOS device in our generic 0
...
These values will be used as
generic model-parameters in later chapters
...
2
Parameters for manual model of generic 0
...
VT0 (V)
γ (V0
...
43
0
...
63
115 × 10−6
0
...
4
-0
...
1
Footnote 3: A MATLAB implementation of the model is available on the web-site of the textbook
...
A word of caution — The model presented here is derived from the characteristics
of a single device with a minimum channel-length and width
...
Fortunately, digital circuits typically use only minimum-length devices as
these lead to the smallest implementation area
...
It is however advisable to use a different set of model parameters for
devices with dramatically different size- and shape-factors
...
We therefore introduce an even
more simplified model that has the advantage of being linear and straightforward
...
[Switch model: a switch between S and D with series resistance Ron, closed when VGS ≥ VT]
Figure 3
...
The main problem with this model is that Ron is still time-variant, non-linear, and
dependent on the operating point of the transistor
...
A reasonable
approach in that respect is to use the average value of the resistance over the operation
region of interest, or even simpler, the average value of the resistances at the end-points of
the transition
...
$$R_{eq} = \operatorname{average}_{t = t_1 \ldots t_2}\big(R_{on}(t)\big) = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} R_{on}(t)\,dt = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \frac{V_{DS}(t)}{I_D(t)}\,dt$$
(3
...
8 Equivalent resistance when (dis)charging a capacitor
One of the most common scenarios in contemporary digital circuits is the discharging of a
capacitor from VDD to GND through an NMOS transistor with its gate voltage set to VDD, or
vice versa, the charging of the capacitor to VDD through a PMOS with its gate at GND
...
Assuming that the supply voltage is substantially larger than the velocity-saturation voltage VDSAT of
the transistor, it is fair to state that the transistor stays in velocity saturation for the entire
duration of the transition
...
[Figure 3.26: Discharging a capacitor through an NMOS transistor: schematic (a) and I-V trajectory (b). In (a), VGS = VDD and VDS moves from VDD to VDD/2; in (b), the trajectory traversed on the ID-VDS curve is marked with the resistances R0 and Rmid.]
...
V
With the aid of Eq
...
42) and Eq
...
$$R_{eq} = \frac{1}{-V_{DD}/2} \int_{V_{DD}}^{V_{DD}/2} \frac{V}{I_{DSAT}(1 + \lambda V)}\,dV \approx \frac{3}{4} \frac{V_{DD}}{I_{DSAT}} \left( 1 - \frac{7}{9} \lambda V_{DD} \right)$$
(3
...
44)
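The quality of the closed-form approximation can be checked numerically. The sketch below averages V / (IDSAT (1 + λV)) over the interval from VDD/2 to VDD with a simple midpoint rule and compares the result with the (3/4)(VDD/IDSAT)(1 − (7/9)λVDD) expression. The numbers used (VDD = 2.5 V, IDSAT = 190 µA, λ = 0.06) are assumed purely for illustration.

```python
def req_numeric(vdd, idsat, lam, steps=10000):
    """Midpoint-rule average of V / (IDSAT * (1 + lam * V)) for V between VDD/2 and VDD."""
    low, high = vdd / 2, vdd
    dv = (high - low) / steps
    total = 0.0
    for i in range(steps):
        v = low + (i + 0.5) * dv
        total += v / (idsat * (1 + lam * v)) * dv
    return total / (high - low)


def req_approx(vdd, idsat, lam):
    """First-order closed form: (3/4) (VDD / IDSAT) (1 - (7/9) lam VDD)."""
    return 0.75 * vdd / idsat * (1 - 7.0 / 9.0 * lam * vdd)


exact = req_numeric(2.5, 190e-6, 0.06)
approx = req_approx(2.5, 190e-6, 0.06)
print(f"numeric: {exact:.0f} ohm, approximation: {approx:.0f} ohm")
```

Under these assumptions the two values agree to within a few percent, which is adequate for manual analysis.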
A number of conclusions are worth drawing from the above expressions:
•
The resistance is inversely proportional to the (W/L) ratio of the device
...
•
For VDD >> VT + VDSAT /2, the resistance becomes virtually independent of the supply voltage
...
Only a minor improvement in resistance, attributable to the
channel-length modulation, can be observed when raising the supply voltage
...
Figure 3
...
25 µm CMOS process as a
function of VDD
(VGS = VDD, VDS = VDD →VDD/2)
...
3 enumerates the equivalent resistances obtained by simulation of our generic 0
...
These values will come in handy when analyzing the performance of CMOS
gates in later chapters
...
3 Equivalent resistance Req (W/L= 1) of NMOS and PMOS transistors in 0
...
For larger devices, divide Req by W/L
...
5
2
2
...
3
...
A profound understanding of the nature and the behavior of these intrinsic capacitances is
essential for the designer of high-quality digital integrated circuits
...
Aside from the MOS structure capacitances, all capacitors are nonlinear and vary with the applied voltage, which makes their
analysis hard
...
MOS Structure Capacitances
The gate of the MOS transistor is isolated from the conducting channel by the gate oxide
that has a capacitance per unit area equal to Cox = εox / tox
...
The total value of this capacitance is called the gate capacitance Cg and can be
decomposed into two elements, each with a different behavior
...
Another part is
solely due to the topological structure of the transistor
...
Consider the transistor structure of Figure 3
...
Ideally, the source and drain diffusion should end right at the edge of the gate oxide
...
Hence, the
effective channel of the transistor L becomes shorter than the drawn length Ld (or the
length the transistor was originally designed for) by an amount ΔL = 2xd
...
This capacitance is strictly linear and has a fixed value
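[Panels: (a) top view showing the polysilicon gate, the source and drain n+ regions, width W, drawn length Ld, lateral diffusion xd, and the gate-bulk overlap; (b) cross section showing the gate oxide of thickness tox, the n+ regions, and the effective length L]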
Figure 3
...
$$C_{GSO} = C_{GDO} = C_{ox} x_d W = C_O W$$
(3
...
Channel Capacitance
Perhaps the most significant MOS parasitic circuit element, the gate-to-channel capacitance CGC varies both in magnitude and in its division into three components CGCS, CGCD,
and CGCB (being the gate-to-source, gate-to-drain, and gate-to-body capacitances, respectively), depending upon the operation region and terminal voltages
...
29
...
cut-off (a), no channel exists, and the total capacitance CGC appears between gate and
body
...
Consequently, CGCB = 0 as the body electrode is shielded from
the gate by the channel
...
Finally, in the saturation mode (c), the channel is pinched off
...
All the capacitance hence is between gate and source
...
29 The gate-to-channel capacitance and how the operation region influences its distribution over the three other
device terminals
...
The first plot (Figure 3
...
For
VGS = 0, the transistor is off, no channel is present and the total capacitance, equal to
WLCox, appears between gate and body
...
This seemingly causes the thickness of the gate dielectric to increase,
which means a reduction in capacitance
...
With VDS = 0, the device operates in the resistive mode and
the capacitance divides equally between source and drain, or CGCS = CGCD = WLCox/2
...
A
designer looking for a well-behaved linear capacitance should avoid operation in this
region
...
30
[Figure: (a) CGC as a function of VGS (with VDS = 0); (b) CGC as a function of the degree of saturation, VDS/(VGS-VT). Distribution of the gate-channel capacitance as a function of VGS and VDS (from [Dally98]).]
...
As illustrated in Figure 3
...
...
This also means that the total gate capacitance is getting smaller with an increased
level of saturation
...
To make a first-order analysis possible, we
will use a simplified model with a constant capacitance value in each region of operation
in the remainder of the text
...
4
...
4
Average distribution of channel capacitance of MOS transistor for different operation regions
...
9 Using a circuit simulator to extract capacitance
Determining the value of the parasitic capacitances of an MOS transistor for a given operation
mode is a labor-intensive task, and requires the knowledge of a number of technology parameters that are often not explicitly available
...
Assume we would
like to know the value of the total gate capacitance of a transistor in a given technology as a
function of VGS (for VDS = 0)
...
31a will give us exactly
this information
...
$$C_G(V_{GS}) = I \Big/ \left( \frac{dV_{GS}}{dt} \right)$$
A transient simulation gives us VGS as a function of time, which can be translated into
the capacitance with the aid of some simple mathematical manipulations
...
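The "simple mathematical manipulations" amount to a numerical derivative of the simulated waveform. The sketch below assumes the gate is charged by a constant current source (an assumption about the test circuit) and that the simulator output is available as two lists of samples; the function name and the synthetic data are hypothetical.

```python
def capacitance_from_transient(times, voltages, current):
    """Estimate C(VGS) = I / (dVGS/dt) between consecutive samples.

    Returns a list of (midpoint voltage, capacitance) pairs.
    """
    points = []
    for i in range(len(times) - 1):
        slope = (voltages[i + 1] - voltages[i]) / (times[i + 1] - times[i])
        v_mid = (voltages[i] + voltages[i + 1]) / 2
        points.append((v_mid, current / slope))
    return points


# Synthetic check: a constant 1 fF capacitor charged with 1 uA ramps at 1e9 V/s.
times = [i * 1e-10 for i in range(6)]
voltages = [1e9 * t for t in times]
for v_mid, cap in capacitance_from_transient(times, voltages, 1e-6):
    print(f"VGS = {v_mid:.2f} V, C = {cap * 1e15:.3f} fF")
```

With real simulation data the slope changes from sample to sample, so the same loop traces out the voltage dependence of the gate capacitance.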
31b, which plots the simulated gate capacitance of a minimum size 0
...
The graph clearly shows the drop of the capacitance when VGS approaches VT and the discontinuity at VT, predicted in Figure 3
...
Junction Capacitances
A final capacitive component is contributed by the reverse-biased source-body and drain-body pn-junctions
...
[Figure 3.31: Simulating the gate capacitance of an MOS transistor; (a) circuit configuration used for the analysis, (b) resulting capacitance plot for minimum-size NMOS transistor in 0 ... Horizontal axis of (b): VGS (V).]
To understand the components of the junction
capacitance (often called the diffusion capacitance), we must look at the source (drain)
region and its surroundings
...
32, shows that the
junction consists of two components:
[Detailed view of the source junction: channel-stop implant (NA+), side walls, source (ND), bottom, channel, substrate (NA); dimensions W, LS, xj]
Figure 3
...
• The bottom-plate junction, which is formed by the source region (with doping ND)
and the substrate with doping NA
...
(3
...
As the bottom-plate junction is typically of the abrupt type, the
grading coefficient m approaches 0
...
• The side-wall junction, formed by the source region with doping ND and the p+ channel-stop
implant with doping level NA+
...
The
side-wall junction is typically graded, and its grading coefficient varies from 0
...
5
...
Notice that no side-wall
capacitance is counted for the fourth side of the source region, as this represents the
conductive channel
...
An expression for the total
junction capacitance can then be derived,
$$C_{diff} = C_{bottom} + C_{sw} = C_j \times \text{AREA} + C_{jsw} \times \text{PERIMETER} = C_j L_S W + C_{jsw} (2 L_S + W)$$
(3
...
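The diffusion capacitance expression is easy to evaluate once the geometry is known. The following sketch splits the result into its bottom-plate and side-wall parts; the zero-bias values used in the call (Cj = 2 fF/µm², Cjsw = 0.275 fF/µm, LS = 0.625 µm, W = 0.36 µm) are assumptions chosen only to illustrate the arithmetic.

```python
def diffusion_capacitance(cj, cjsw, ls, w):
    """Return (bottom, side_wall, total) diffusion capacitance in fF.

    cj in fF/um^2, cjsw in fF/um, ls and w in um. Only three sides contribute
    to the side-wall term, because the fourth side faces the channel.
    """
    bottom = cj * ls * w
    side_wall = cjsw * (2 * ls + w)
    return bottom, side_wall, bottom + side_wall


bottom, side_wall, total = diffusion_capacitance(cj=2.0, cjsw=0.275, ls=0.625, w=0.36)
print(f"bottom = {bottom:.2f} fF, side wall = {side_wall:.2f} fF, total = {total:.2f} fF")
```

With these assumed numbers the side-wall term is about as large as the bottom-plate term, which shows why the perimeter cannot be neglected for small diffusion regions.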
(3
...
Problem 3
...
31)
...
33
...
[MOSFET capacitance model: terminals G, D, S, B with capacitances CGS, CGD, CGB, CSB, CDB]
Figure 3
...
B
CGS = CGCS + CGSO; CGD = CGCD + CGDO; CGB = CGCB
CSB = CSdiff; CDB = CDdiff
(3
...
Example 3
...
24 µm, W =
0
...
625 µm, CO = 3 × 10–10 F/m, Cj0 = 2 × 10–3 F/m2, Cjsw0 = 2
...
Determine the zero-bias value of all relevant capacitances
...
7 fF/µm 2
...
49 fF
...
105 fF
...
7 fF
...
Due to the doping conditions and the small area, this component can virtually always be ignored in
a first-order analysis
...
JSWG
chapter3
...
The
former is equal to Cj0 LDW = 0
...
44 fF
...
89 fF
...
This is a worst-case
condition, however
...
Also, clever design can help to reduce the value of LD (LS)
...
Design Data — MOS Transistor Capacitances
Table 3
...
25 µm CMOS process
...
5 Capacitance parameters of NMOS and PMOS transistors in 0
...
Cox
(fF/µm2)
CO
(fF/µm)
Cj
(fF/µm2)
mj
φb
(V)
Cjsw
(fF/µm)
mjsw
φbsw
(V)
NMOS
6
0
...
5
0
...
28
0
...
9
PMOS
6
0
...
9
0
...
9
0
...
32
0
...
34a
...
The resistance of the drain (source) region can be expressed as
$$R_{S,D} = \frac{L_{S,D}}{W} R_\square + R_C$$
(3
...
34b)
...
Observe that the resistance of a square
of material is constant, independent of its size (see also Chapter 4)
...
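A short numerical sketch of the series-resistance expression follows. The sheet resistance, contact resistance, and dimensions passed in are hypothetical values picked for illustration, not data from this text.

```python
def series_resistance(l_sd, w, r_sheet, r_contact):
    """R = (L_SD / W) * R_sheet + R_C, with R_sheet in ohm/square and R_C in ohm."""
    squares = l_sd / w  # number of squares of diffusion between contact and channel
    return squares * r_sheet + r_contact


# Same diffusion length, two widths: the wider device sees fewer squares in series.
narrow = series_resistance(l_sd=0.625, w=0.36, r_sheet=4.0, r_contact=2.0)
wide = series_resistance(l_sd=0.625, w=1.08, r_sheet=4.0, r_contact=2.0)
print(f"narrow: {narrow:.2f} ohm, wide: {wide:.2f} ohm")
```

Only the ratio L/W enters the first term, which is the "resistance of a square is constant" observation in numerical form.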
Keeping its value as small as possible is thus
an important design goal for both the device and the circuit engineer
...
This process is called silicidation and effectively
reduces the sheet resistance to values in the range from 1 to 4
Ω/□
...
(3
...
With a process that includes silicidation and proper attention to layout, parasitic resistance
is not important
...
3
...
4
The Actual MOS Transistor— Some Secondary Effects
The operation of a contemporary transistor may show some important deviations from the
model we have presented so far
...
At that point, the
assumption that the operation of a transistor is adequately described by a one-dimensional
model, where it is assumed that all current flows on the surface of the silicon and the electrical fields are oriented along that plane, is no longer valid
...
An example of such was already given in Section
3
...
2 when we discussed the mobility degradation
...
One word of warning, though
...
It
is therefore advisable to analyze and design MOS circuits first using the ideal model
...
[Figure 3.34: Series drain and source resistance: (a) modeling the series resistance (RS, RD, VGS,eff); (b) parameters of the series resistance (polysilicon gate, drain contact, W, LD).]
...
[Panel (a): threshold VT as a function of the length L (for low VDS), compared with the long-channel threshold]
Figure 3
...
Threshold Variations
Eq
...
19) states that the threshold voltage is only a function of the manufacturing technology and the applied body bias VSB
...
As the device dimensions are reduced, this
model becomes inaccurate, and the threshold potential becomes a function of L, W, and
VDS
...
In the traditional derivation of the VTO, for instance, it is assumed that the channel
depletion region is solely due to the applied gate voltage and that all depletion charge
beneath the gate originates from the MOS field effects
...
Since a part of the region below the gate is already
depleted (by the source and drain fields), a smaller threshold voltage suffices to cause
strong inversion
...
35a)
...
Consequently, the threshold
decreases with increasing VDS
...
35b)
...
The sharp increase in
current that results from this effect, which is called punch-through, may cause permanent
damage to the device and should be avoided
...
Since the majority of the transistors in a digital circuit are designed at the minimum
channel length, the variation of the threshold voltage as a function of the length is almost
uniform over the complete design, and is therefore not much of an issue except for the
increased sub-threshold leakage currents
...
This is, for instance, a problem in dynamic memories,
where the leakage current of a cell (being the subthreshold current of the access transistor)
becomes a function of the voltage on the data-line, which depends upon the applied data
patterns
...
Worth mentioning is that the threshold of the MOS transistor is also subject to
narrow-channel effects
...
The gate
voltage must support this extra depletion charge to establish a conducting channel
...
For small geometry transistors,
with small values of L and W, the effects of short- and narrow channels may tend to cancel
each other out
...
This is the result of the hot-carrier effect [Hu92]
...
The resulting increase in the electrical field
strength causes an increasing velocity of the electrons, which can leave the silicon and
tunnel into the gate oxide upon reaching a high-enough energy level
...
For an electron to become hot, an
electrical field of at least 10^4 V/cm is necessary
...
The hot-electron phenomenon can lead to a
long-term reliability problem, where a circuit might degrade or fail after being in use for a
while
...
36, which shows the degradation in the I-V characteristics
of an NMOS transistor after it has been subjected to extensive operation
...
The reduced supply voltage that is typical for deep sub-micron technologies can in part be attributed to the necessity to keep hot-carrier effects under control
...
36 Hot-carrier effects cause the I-V characteristics of an NMOS transistor to degrade from
extensive usage (from [McGaughy98])
...
CMOS Latchup
The MOS technology contains a number of intrinsic bipolar transistors
...
Triggering these thyristor-like
devices leads to a shorting of the VDD and VSS lines, usually resulting in a destruction of
the chip, or at best a system failure that can only be resolved by power-down
...
37a
...
A circuit
equivalent is shown in Figure 3
...
When one of the two bipolar transistors gets forward
biased (e.g., due to current flowing through the well, or substrate), it feeds the base of the
other transistor
...
[Panel (a), origin of latchup: cross section with p+ and n+ regions in the n-well and p-substrate, the n-source, the supply VDD, and the resistances Rnwell and Rpsubs]
Figure 3
...
From the above analysis the message to the designer is clear— to avoid latchup, the
resistances Rnwell and Rpsubs should be minimized
...
Devices carrying a lot of current (such as transistors in the I/O
drivers) should be surrounded by guard rings
...
For an extensive discussion on how to avoid latchup, please refer to
[Weste93]
...
In recent
years, process innovations and improved design techniques have all but eliminated the
risks for latchup
...
3
...
In general, more accuracy also means more complexity and,
hence, an increased run time
...
...
SPICE Models
SPICE has three built-in MOSFET models, selected by the LEVEL parameter in the
model card
...
They should only be used for first-order analysis, and we
therefore limit ourselves to a short discussion of their main properties
...
It
does not handle short-channel effects
...
It handles effects such as velocity saturation, mobility degradation, and drain-induced barrier lowering
...
• LEVEL 3 is a semi-empirical model
...
It
works quite well for channel lengths down to 1µm
...
A
complete description of all those would take the remainder of this book, which is, obviously, not the goal
...
g
...
The BSIM3V3 SPICE Model
The confusing situation of having to use a different model for each manufacturer has fortunately been partially resolved by the adoption of the BSIM3v3 model as an industry-wide standard for the modeling of deep-submicron MOSFET transistors
...
Its popularity and accuracy make it the natural choice for all the simulations presented in this book
...
Fortunately, understanding the intricacies of all these parameters is not a requirement for the
digital designer
...
6)
...
Providing a single set of parameters
that is acceptable over all possible device dimensions is deemed to be next to impossible
...
LMIN, LMAX, WMIN, and WMAX (called a bin)
...
Table 3
...
Parameter Category | Description | Parameter names
Control | Selection of level and models for mobility, capacitance, and noise | LEVEL, MOBMOD, CAPMOD
DC | Parameters for threshold and current calculations | VTH0, K1, U0, VSAT, RSH
AC & Capacitance | Parameters for capacitance computations | CGS(D)O, CJ, MJ, CJSW, MJSW
dW and dL | Derivation of effective channel length and width |
Process | Process parameters such as oxide thickness and doping concentrations | TOX, XJ, GAMMA1, NCH, NSUB
Temperature | Nominal temperature and temperature coefficients for various device parameters | TNOM
Bin | Bounds on device dimensions for which model is valid | LMIN, LMAX, WMIN, WMAX
Flicker Noise | Noise model parameters |
We refer the interested reader to the BSIM3v3 documentation provided on the website of the textbook (REFERENCE) for a complete description of the model parameters
and equations
...
25 µm CMOS process can be
found at the same location
...
7
...
SPICE assumes
default values (which are often zero!) for the missing factors
...
For instance, you must
accurately specify the area and the perimeter of the source and drain regions of the devices
when performing a performance analysis
...
Similarly, it is often necessary to painstakingly define the value of the drain and
source resistance
...
Table 3
...
Parameter Name | Symbol | SPICE Name | Units | Default Value
Drawn Length | L | L | m | −
Effective Width | W | W | m | −
Source Area | AREA | AS | m2 | 0
Drain Area | AREA | AD | m2 | 0
Source Perimeter | PERIM | PS | m | 0
Drain Perimeter | PERIM | PD | m | 0
Squares of Source Diffusion | | NRS | − | 1
Squares of Drain Diffusion | | NRD | − | 1
Example 3
...
Transistor M1 is an NMOS device of model-type (and bin)
nmos
...
Its gate length is the minimum allowed in this technology (0
...
The ‘+’
character at the start of line 2 indicates that this line is a continuation of the previous one
...
1, connected between nodes nvout, nvin, nvdd, and
nvdd (D, G, S, and B, respectively), is three times wider, which reduces the series resistance,
but increases the parasitic diffusion capacitances as the area and perimeter of the drain and
source regions go up
...
lib line refers to the file that contains the transistor models
...
1 W=0
...
25U
+AD=0
...
625U AS=0
...
625U NRS=1 NRD=1
M2 nvout nvin nvdd nvdd pmos
...
125U L=0
...
7P PD=2
...
7P PS=2
...
33 NRD=0
...
lib 'c:\Design\Models\cmos025
...
4 A Word on Process Variations
The preceding discussions have assumed that a device is adequately modeled by a single
set of parameters
...
This observed random distribution between supposedly identical devices is primarily the result of two factors:
1
...
These result in diverging values for
sheet resistances, and transistor parameters such as the threshold voltage
...
2
...
This causes deviations in the (W/L) ratios of
MOS transistors and the widths of interconnect wires
...
For instance,
variations in the length of an MOS transistor are unrelated to variations in the threshold
voltage as both are set by different process steps
...
• The threshold voltage VT can vary for numerous reasons: changes in oxide thickness, substrate, poly and implant impurity levels, and the surface charge
...
Where in the
past thresholds could vary by as much as 50%, state-of-the-art digital processes
manage to control the thresholds to within 25-50 mV
...
Variations can also occur in the mobility but to a lesser degree
...
These are mainly caused by the lithographic process
...
The measurable impact of the process variations may be a substantial deviation of
the circuit behavior from the nominal or expected response, and this could be in either
positive or negative directions
...
Assume, for instance, that you are supposed to design a microprocessor running
at a clock frequency of 500 MHz
...
One way to achieve that goal is to
design the circuit assuming worst-case values for all possible device parameters
...
To help the designer make a decision on how much margin to provide, the device
manufacturer commonly provides fast and slow device models in addition to the nominal
ones
...
Example 3
...
25µm CMOS process
...
Assume initially that VGS = VDS = 2
...
From earlier simulations, we know that this
produces a drain current of 220 µA
...
Simulations produce the following data:
Fast: Id = 265 µA: +20%
Slow: Id = 182 µA: -17%
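The quoted percentages follow directly from the nominal 220 µA drain current mentioned above. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def percent_deviation(value, nominal):
    """Relative deviation of a corner value from the nominal value, in percent."""
    return 100.0 * (value - nominal) / nominal


NOMINAL_ID = 220e-6  # A, nominal drain current quoted in the text
for corner, current in (("Fast", 265e-6), ("Slow", 182e-6)):
    print(f"{corner}: {percent_deviation(current, NOMINAL_ID):+.0f}%")
```

The same helper can be applied to any other corner current once its simulated value is known.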
Let us now proceed one step further
...
For instance, the voltage delivered by a battery can drop off substantially
towards the end of its lifetime
...
Fast + Vdd = 2
...
25 V: Id = 155 µA: -30%
The current levels and the associated circuit performance can thus vary by almost 100%
between the extreme cases
...
This translates into a
severe area penalty
...
The probability that all parameters assume their worst-case values simultaneously is very
low, and most designs will display a performance centered around the nominal design
...
g
...
Specialized design tools to help meet this goal are available
...
The result is a distribution plot of design parameters
(such as the speed or the sensitivity to noise) that can help to determine if the nominal
design is economically viable
...
38
...
[Figure 3.38: Distribution plots of speed of adder circuit as a function of varying device parameters, as obtained
by a Monte Carlo analysis. Vertical axes: Delay (ns); one of the panels is plotted against Leff.]
...
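To give a feel for what such a Monte Carlo analysis does, the sketch below draws random threshold voltages and records the resulting drive current of a velocity-saturated NMOS device. It is not the tool flow referred to in the text: the current model, the Gaussian distribution, the 25 mV standard deviation, and all parameter values are assumptions made for illustration.

```python
import random
import statistics


def drive_current(vt, vgs=2.5, vds=2.5, k_prime=115e-6, w_over_l=1.5,
                  vdsat=0.63, lam=0.06):
    """Drain current of a simple manual-analysis model (all defaults are assumed values)."""
    vgt = vgs - vt
    if vgt <= 0:
        return 0.0
    vmin = min(vgt, vds, vdsat)
    return k_prime * w_over_l * (vgt * vmin - vmin ** 2 / 2) * (1 + lam * vds)


def monte_carlo_currents(samples=2000, vt_nominal=0.43, vt_sigma=0.025, seed=1):
    """Sample the threshold voltage from a Gaussian and collect the drive currents."""
    rng = random.Random(seed)  # fixed seed so that repeated runs give the same result
    return [drive_current(rng.gauss(vt_nominal, vt_sigma)) for _ in range(samples)]


currents = monte_carlo_currents()
print(f"mean  = {statistics.mean(currents) * 1e6:.1f} uA")
print(f"sigma = {statistics.stdev(currents) * 1e6:.1f} uA")
```

A production analysis would vary several correlated parameters at once and evaluate a full circuit per sample; the structure of the loop stays the same.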
One important conclusion from the above discussion is that SPICE simulations
should be treated with care
...
Actual implementations are bound
to differ from the simulation results, and for reasons other than imperfections in the mod-
...
Be furthermore aware that temperature variations on the die can present
another source for parameter deviations
...
3
...
As already argued in the introduction, applications that were considered implausible yesterday are already forgotten today
...
To illustrate this point, we have plotted in
Figure 3
...
We observe a reduction rate of approximately 13% per
year, halving every 5 years
...
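The two figures quoted for the reduction rate are consistent with each other, as this small check illustrates (the function names are hypothetical):

```python
import math


def years_to_halve(annual_reduction):
    """Years needed for the feature size to halve at a constant yearly reduction rate."""
    return math.log(0.5) / math.log(1.0 - annual_reduction)


def relative_size_after(years, annual_reduction):
    """Feature size after a number of years, relative to the starting size."""
    return (1.0 - annual_reduction) ** years


print(f"halving time at 13%/year: {years_to_halve(0.13):.1f} years")
print(f"relative size after 5 years: {relative_size_after(5, 0.13):.2f}")
```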
[Figure: minimum feature size (micron, logarithmic axis) versus year (1960 to 2010). Dots represent
observed or projected (2000 and beyond) values.]
A pertinent question is how this continued reduction in feature size influences the
operating characteristics and properties of the MOS transistor, and indirectly the critical
digital design metrics such as switching frequency and power dissipation
...
In
addition to the minimum device dimension, we have to consider the supply voltage as a
second independent variable in such a study
...
Three different models are studied in Table 3
...
To make the results tractable, it is
assumed that all device dimensions scale by the same factor S (with S > 1 for a reduction
in size)
...
Similarly, we assume that all voltages, including the supply voltage and
the threshold voltages, scale by the same ratio U
...
Observe that this analysis only
considers short-channel devices with a linear dependence between control voltage and saturation current (as expressed by Eq
...
39))
...
Table 3
...
Parameter | Relation | Full Scaling | General Scaling | Fixed-Voltage Scaling
W, L, tox | | 1/S | 1/S | 1/S
VDD, VT | | 1/S | 1/U | 1
NSUB | V/Wdepl^2 | S | S^2/U | S^2
Area/Device | WL | 1/S^2 | 1/S^2 | 1/S^2
Cox | 1/tox | S | S | S
Cgate | CoxWL | 1/S | 1/S | 1/S
kn, kp | CoxW/L | S | S | S
Isat | CoxWV | 1/S | 1/U | 1
Current Density | Isat/Area | S | S^2/U | S^2
Ron | V/Isat | 1 | 1 | 1
Intrinsic Delay | RonCgate | 1/S | 1/S | 1/S
P | IsatV | 1/S^2 | 1/U^2 | 1
Power Density | P/Area | 1 | S^2/U^2 | S^2
Full Scaling (Constant Electrical Field Scaling)
In this ideal model, voltages and dimensions are scaled by the same factor S. The goal is to
...
Keeping the electrical fields constant ensures the physical integrity of the device
and avoids breakdown or other secondary effects
...
The effects of full scaling on the device and circuit parameters are summarized in the third
column of Table 3
...
We use the intrinsic time constant, which is the product of the gate
capacitance and the on-resistance, as a measure for the performance
...
The performance improvement is solely due to the reduced capacitance
...
It is assumed that the carrier mobilities are not affected by the scaling
...
The substrate doping Nsub is scaled so that the maximum depletion-layer width is reduced by a
factor S
...
It is furthermore assumed that the delay of the device is mainly determined by the intrinsic
capacitance (the gate capacitance) and that other device capacitances, such as the diffusion
capacitances, scale appropriately
...
...
First of all, to keep new devices compatible
with existing components, voltages cannot be scaled arbitrarily
...
As a result, voltages
have not been scaled down along with feature sizes, and designers adhere to well-defined
standards for supply voltages and signal levels
...
40, 5 V was
the de facto standard for all digital components up to the early 1990s, and a fixed-voltage
scaling model was followed
...
5µm CMOS technology did new standards such
as 3
...
5 V make an inroad
...
The reason for this change in operation model can partially be
[Figure 3.40: Evolution of min and max supply voltage in digital integrated circuits as a function
of feature size]
...
15 micron and
below are projected
...
8
...
The gain of an increased current is simply offset by the higher voltage level, and only hurts the power dissipation
...
(3
...
Keeping the voltage constant under these circumstances gives a distinct performance
advantage, as it causes a net reduction in on-resistance
...
Problem 3
...
Reconstruct Table 3
...
(3
...
WARNING: The picture painted in the previous section represents a first-order model
...
This is apparent in Figure 3
...
3, which show a reduction
of the equivalent on-resistance with increasing supply voltage — even for the high voltage
range
...
The reader should keep this warning in the back of his mind throughout this scaling
study
...
This implies ignoring second-order effects
such as mobility-degradation, series resistance, etc
...
40 that the supply voltages, while moving downwards, are not
scaling as fast as the technology
...
5 to
µm
0
...
5 V
...
• The scaling potential of the transistor threshold voltage is limited
...
This is aggravated by the large process variation of the value of the threshold, even on the same
wafer
...
This general scaling model is shown in the fourth column of
Table 3
...
Here, device dimensions are scaled by a factor S, while voltages are reduced by
a factor U
...
Note that the general-scaling model offers a performance scenario
identical to the full- and the fixed scaling, while its power dissipation lies between the two
models (for S > U > 1)
...
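The three scaling scenarios differ only in the choice of U, so they can be derived from one set of first-order relations (velocity-saturated devices, Isat proportional to Cox W V). The sketch below is an illustrative calculation under those assumptions; the function and key names are hypothetical.

```python
def scaling_factors(s, u):
    """First-order scaling of device metrics when dimensions shrink by S and voltages by U.

    Full scaling corresponds to u == s, fixed-voltage scaling to u == 1.
    """
    dimension = 1.0 / s                  # W, L, tox
    voltage = 1.0 / u                    # VDD, VT
    area = dimension * dimension         # W * L
    cox = 1.0 / dimension                # 1 / tox
    c_gate = cox * area                  # Cox * W * L
    i_sat = cox * dimension * voltage    # Cox * W * V
    r_on = voltage / i_sat               # V / Isat
    power = i_sat * voltage              # Isat * V
    return {
        "area": area,
        "c_gate": c_gate,
        "i_sat": i_sat,
        "r_on": r_on,
        "intrinsic_delay": r_on * c_gate,
        "power": power,
        "power_density": power / area,
    }


for name, (s, u) in {"full": (2.0, 2.0), "general": (2.0, 1.5), "fixed-voltage": (2.0, 1.0)}.items():
    f = scaling_factors(s, u)
    print(f"{name:14s} delay x{f['intrinsic_delay']:.2f}  power density x{f['power_density']:.2f}")
```

For S = 2 all three scenarios give the same delay improvement, while the power density stays constant only under full scaling.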
9 the characteristics of some of the most recent CMOS processes and projections on some future ones
...
As predicted by the
scaling model, the maximum drive current remains approximately constant
...
Table 3
...
Year of Introduction
1997
1999
2001
2003
2006
2009
Channel length (µm)
0
...
18
0
...
13
0
...
07
Gate oxide (nm)
4-5
3-4
2-3
2-3
1
...
5
VDD (V)
1
...
5
1
...
8
1
...
5
1
...
5
0
...
2
0
...
9
VT (V)
0
...
4
0
...
4
0
...
4
NMOS/PMOS IDsat (µA/µm)
600/280
600/280
600/280
600/280
600/280
600/280
From the above, it is reasonable to conclude that both integration density and performance will continue to increase
...
These transistors,
while working along similar concepts as the current MOS devices, look very different
from the structures we are familiar with, and require some substantial
device engineering
...
41 shows a potential transistor structure, the folded channel dual-gated transistor, which has proven to be operational up to very small channel lengths
...
41
Folded dual-gated transistor with 25 nm channel length [Hu99]
...
Whether this will actually happen is an
open question
...
A first doubt is if such a part can be
manufactured in an economical way
...
Design considerations also play a role
...
The
growing role of interconnect parasitics might put an upper bound on performance
...
All in
all, it is obvious that the design of semiconductor circuits still faces an exciting future
...
Section 3
...
6 Summary
In this chapter, we have presented a comprehensive overview of the operation of the
MOSFET transistor, the semiconductor device at the core of virtually all contemporary
digital integrated circuits
...
These models
will be used extensively in later chapters, where we look at the fundamental building
blocks of digital circuits
...
• The static behavior of the junction diode is well described by the ideal diode equation that states that the current is an exponential function of the applied voltage bias
...
This is particularly important as
the omnipresent source-body and drain-body junctions of the MOS transistors all
operate in this mode
...
• The MOS(FET) transistor is a voltage-controlled device, where the controlling gate
terminal is insulated from the conducting channel by a SiO2 capacitor
...
One of the most
enticing properties of the MOS transistor, which makes it particularly amenable to
digital design, is that it approximates a voltage-controlled switch: when the control
voltage is low, the switch is nonconducting (open); for a high control voltage, a conducting channel is formed, and the switch can be considered closed
...
• The continuing reduction of the device dimensions to the submicron range has introduced some substantial deviations from the traditional long-channel MOS transistor
model
...
Models for this effect as well as other second-order parasitics
have been introduced
...
• The dynamic operation of the MOS transistor is dominated by the
device capacitors
...
The minimization of these
capacitances is the prime requirement in high-performance MOS design
...
It was
observed that these models represent an average behavior and can vary over a single
wafer or die
...
• The MOS transistor is expected to dominate the digital integrated circuit scene for
the next decade
...
07
micron by the year 2010, and logic circuits integrating more than 1 billion transistors on a die
...
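As an illustration of the first summary point above, the ideal diode equation can be sketched in a few lines of Python. This is a minimal sketch, not a SPICE-grade model: the saturation current `i_s` and the 300 K temperature are hypothetical values chosen only for illustration, and the function names are not taken from the text.

```python
import math

BOLTZMANN = 1.380649e-23           # Boltzmann constant, J/K
ELECTRON_CHARGE = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temperature_k=300.0):
    """Thermal voltage kT/q in volts (about 26 mV at 300 K)."""
    return BOLTZMANN * temperature_k / ELECTRON_CHARGE


def ideal_diode_current(v_d, i_s=1e-14, temperature_k=300.0):
    """Ideal diode equation: I_D = I_S * (exp(V_D / phi_T) - 1).

    i_s is a hypothetical saturation current in amperes.
    """
    return i_s * (math.exp(v_d / thermal_voltage(temperature_k)) - 1.0)
```

Under forward bias the current grows exponentially with the applied voltage; under reverse bias it settles at roughly `-i_s`, the regime in which the source-body and drain-body junctions mentioned above operate.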
7 To Probe Further
Semiconductor devices have been discussed in numerous books, reprint volumes, tutorials, and journal articles
...
The
books and journals referenced below contain excellent discussions of the semiconductor
devices of interest or refer to specific topics brought up in the course of this chapter
...
Antognetti and G
...
), Semiconductor Device Modeling with SPICE,
McGraw-Hill, 1988
...
Bhanzhaf, Computer Aided Analysis Using PSPICE, 2nd ed
...
[Chen90] J
...
[Gray69] P
...
Searle, Electronic Principles, John Wiley and Sons, 1969
...
Gray and R
...
, John
Wiley and Sons, 1993
...
Haznedar, Digital Microelectronics, Benjamin/Cummings, 1991
...
Hodges and H
...
,
McGraw-Hill, 1988
...
Howe and S
...
[Hu92] C
...
27, no
...
241–246, March 1992
...
Hu, “Future CMOS Scaling and Reliability,” IEEE Proceedings, vol
...
5, May
1993
...
Jensen et al
...
1–61, August 1991
...
Ko, “Approaches to Scaling,” in VLSI Electronics: Microstructure Science, vol
...
1–37, Academic Press, 1989
...
Muller and T
...
, John
Wiley and Sons, 1986
...
Nagel, “SPICE2: a Computer Program to Simulate Semiconductor Circuits,” Memo
ERL-M520, Dept
...
and Computer Science, University of California at Berkeley, 1975
...
Sedra and K
...
, Holt, Rinehart and Winston,
1987
...
Sheu, D
...
Ko, and M
...
SC-22, no
...
558–565, August 1987
...
...
Sze, Physics of Semiconductor Devices, 2nd ed
...
[Thorpe92] T
...
[Toh88] K
...
Koh, and R
...
23
...
4, pp 950–957, August 1988
...
Tsividis, Operation and Modeling of the MOS Transistor, McGraw-Hill, 1987
...
Yamaguchi et al
...
Electron
...
35, no 8, pp
...
[Weste93] N
...
Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective,
Addison-Wesley, 1993
...
8 Exercises and Design Problems
For all problems, use the device parameters provided in Chapter 3 (XXX) and the inside back book
cover, unless otherwise mentioned
...
1
...
23]
a
...
42
...
7 V, solve for
Don
ID
...
Find ID and VD using the ideal diode equation
...
c
...
d
...
R1 = 2 kΩ
+
5V
–
R2 = 2 kΩ
ID
+
VD
–
Figure 3
...
3
...
[M, None, 2
...
3] For the circuit in Figure 3
...
3 V
...
65 V,
and m = 0
...
NA = 2
...
a
...
b
...
Find the depletion region width, Wj, of the diode
...
Use the parallel-plate model to find the junction capacitance, Cj
...
Set Vs = 1
...
Again using the parallel-plate model, explain qualitatively why Cj
increases
...
3
...
44 shows NMOS and PMOS devices with drains, source, and gate
ports annotated
...
Verify with SPICE
...
7 V, λ = 0
...
8 V, λ = 0
...
Assume (W/L) = 1
...
NMOS: VGS = 3
...
3 V
...
5 V, VDS = –1
...
2
CONTENTS
1
...
6 Exercises
Chapter 2: The Manufacturing Process (30 pages)
2
...
2
2
...
4
2
...
6
2
...
8
Introduction
The CMOS Manufacturing Process
Design Rules — The Contract between Designer and Process Engineer
Packaging Integrated Circuits
Perspective — Trends in Process Technology
Summary
To Probe Further
Exercises and Design Problems
Design Methodology Insert A: Design Layout and Design Rule Verification (6 pages)
Chapter 3: The Devices (52 pages)
3
...
2 The Diode
3
...
1
A First Glance at the Diode — The Depletion Region
3
...
2
Static Behavior
3
...
3
Dynamic, or Transient, Behavior
3
...
4
The Actual Diode—Secondary Effects
3
...
5
The SPICE Diode Model
3
...
3
...
3
...
3
...
3
...
3
...
4 A Word on Process Variations
3
...
6 Summary
3
...
8 Exercises and Design Problems
Design Methodology Insert B: Device Models and Circuit Simulation (6 pages)
Chapter 4: The Wire (40 pages)
4
...
2 A First Glance
4
...
3
...
3
...
3
...
4 Electrical Wire Models
4
...
6
4
...
8
4
...
4
...
4
...
4
...
4
...
4
...
5
...
5
...
1 Introduction
5
...
3 Evaluating the Robustness of the CMOS Inverter: The Static Behavior
5
...
1
Switching Threshold
5
...
2
Noise Margins
5
...
3
Robustness Revisited
5
...
4
...
4
...
4
...
5 Power, Energy, and Energy-Delay
5
...
1
Dynamic Power Consumption
5
...
2
Static Consumption
5
...
3
Putting It All Together
5
...
4
Analyzing Power Consumption Using SPICE
5
...
7 Summary
5
...
9 Exercises and Design Problems
Chapter 6: Designing Combinational Logic Gates in CMOS (64 pages)
6
...
2 Static CMOS Design
6
...
1
Complementary CMOS
6
...
2
Ratioed Logic
6
...
3
Pass-Transistor Logic
6
...
3
...
3
...
4
6
...
6
6
...
3
...
3
...
1
7
...
3
7
...
5
7
...
7
7
...
9
7
...
11
7
...
13
Introduction
Timing Metrics for Sequential Circuits
Classification of Memory Elements
Static Latches and Registers
7
...
1
The Bistability Principle
7
...
2
SR Flip-Flops
7
...
3
Multiplexer-Based Latches
7
...
4
Master-Slave Based Edge Triggered Register
7
...
5
Non-ideal clock signals
7
...
6
Low-Voltage Static Latches
Dynamic Latches and Registers
7
...
1
Dynamic Transmission-Gate Based Edge-triggered Registers
7
...
2
C2MOS Dynamic Register: A Clock Skew Insensitive Approach
7
...
3
True Single-Phase Clocked Register (TSPCR)
Pulse Registers
Sense-Amplifier Based Registers (Consolidate with 7
...
8
...
Register-Based Pipelines
7
...
2
NORA-CMOS—A Logic Style for Pipelined Structures
Non-Bistable Sequential Circuits
7
...
1
The Schmitt Trigger
7
...
2
Monostable Sequential Circuits
7
...
3
Oscillators
Perspective: Choosing a Clocking Strategy
Summary
To Probe Further
Exercises and Design Problems
Design Methodology Insert D: Timing Analysis and Verification (8-10 pages)
Chapter 8: Dealing with Interconnect (45 pages)
8
...
2 Capacitive Parasitics
8
...
1
Capacitance and Reliability—Cross Talk
8
...
2
Capacitance and Performance in CMOS
8
...
3
...
3
...
3
...
4 Inductive Parasitics
8
...
1
Inductance and Reliability— Voltage Drop
8
...
2
Inductance and Performance—Transmission Line Effects
8
...
6 Chapter Summary
8
...
8 Exercises and Design Problems
Design Methodology Insert E: Interconnect modeling and analysis (6 pages)
PART III: A SYSTEM PERSPECTIVE
Chapter 9:
9
...
2
9
...
4
9
...
6
9
...
8
Designing Complex Digital Integrated Circuits (40 pages)
Introduction
The Standard-cell Design Approach
Array-based Design
Configurable and Reconfigurable Design
Perspective: Facing the Increasing Design Complexity
Summary
To Probe Further
Exercises and Design Problems
Chapter 10: Timing Issues in Digital Circuits (55 pages)
10
...
2 Synchronous systems
10
...
Impact of clock variation on performance
10
...
Clock Distribution Basics
10
...
Performance and Power Optimization in Synchronous Design
10
...
Asynchronous Design
10
...
The Asynchronous-synchronous Interface
10
...
Clock Signal Generation
10
...
10 Summary
10
...
12 Exercises and Design Problems
Chapter 11: Designing Arithmetic Building Blocks (50 pages)
9
...
2 Datapaths in Digital Processor Architectures
9
...
3
...
4
9
...
6
9
...
8
9
...
10
9
...
3
...
3
...
4
...
4
...
4
...
5
...
5
...
1 Introduction
10
...
2
...
2
...
3 The Memory Core
10
...
1 Read-Only Memories
10
...
2 Nonvolatile Read-Write Memories
10
...
3 Read-Write Memories (RAM)
10
...
4
...
4
...
4
...
4
...
5 Memory Reliability and Yield
10
...
1 Signal-To-Noise Ratio
10
...
2 Memory yield
10
...
6
...
6
...
7 Perspective: Semiconductor Memory Trends and Evolutions
10
...
9 To Probe Further
10
...
It addresses the following topics:
- pad design, ESD, guard rings, latchup
- off-chip signaling: termination, current versus voltage mode,
high-speed serial links,
Design Methodology Insert G: Validation and Test of Manufactured Circuits (8 pages)
G
...
2 Design for Testability
G
...
THE FOUNDATIONS
CHAPTER 1: INTRODUCTION
1
...
2
Issues in Digital Integrated Circuit Design
1
...
3
...
3
...
3
...
3
...
4
Summary
1
...
1
Introduction
2
...
2
...
2
...
2
...
2
...
3
Design Rules — The Contract between Designer and Process Engineer
2
...
4
...
4
...
4
...
5
Perspective — Trends in Process Technology
2
...
1
2
...
2
Short-Term Developments
In the Longer Term
2
...
7
To Probe Further
DESIGN METHODOLOGY INSERT A: IC LAYOUT
CHAPTER 3: THE DEVICES
3
...
2
The Diode
3
...
1
3
...
2
3
...
3
3
...
4
3
...
5
3
...
3
...
3
...
3
...
3
...
3
...
4
A Word on Process Variations
3
...
6
Summary
3
...
1
Introduction
4
...
3
Interconnect Parameters — Capacitance, Resistance, and Inductance
2
3
4
...
1
4
...
2
4
...
3
4
...
4
...
4
...
4
...
4
...
4
...
5
Table Of Contents
The Ideal Wire
The Lumped Model
The Lumped RC model
The Distributed rc Line
The Transmission Line
SPICE Wire Models
4
...
1
4
...
2
Distributed rc Lines in SPICE
Transmission Line Models in SPICE
4
...
7
Summary
4
...
A CIRCUIT PERSPECTIVE
Chapter 5: THE CMOS INVERTER
5
...
2
The Static CMOS Inverter — An Intuitive Perspective
5
...
3
...
3
...
3
...
4
Performance of CMOS Inverter: The Dynamic Behavior
5
...
1
5
...
2
5
...
3
5
...
5
...
5
...
5
...
5
...
6
Perspective: Technology Scaling and its Impact on the Inverter Metrics
5
...
8
4
To Probe Further
CHAPTER 6: DESIGNING COMBINATIONAL LOGIC GATES IN CMOS
6
...
2
Static CMOS Design
6
...
1
6
...
2
6
...
3
6
...
3
...
3
...
3
...
3
...
4
Complementary CMOS
Ratioed Logic
Pass-Transistor Logic
Dynamic Logic: Basic Principles
Speed and Power Dissipation of Dynamic Logic
Issues in Dynamic Design
Cascading Dynamic Gates
Perspectives
6
...
1
6
...
2
How to Choose a Logic Style?
Designing Logic for Reduced Supply Voltages
6
...
6
To Probe Further
DESIGN METHODOLOGY INSERT C: HOW TO SIMULATE COMPLEX
LOGIC GATES
C
...
2
Representing Data as a Discrete Entity
C
...
4
To Probe Further
DESIGN METHODOLOGY INSERT D: LAYOUT TECHNIQUES FOR
COMPLEX GATES
5
CHAPTER 7: DESIGNING SEQUENTIAL LOGIC CIRCUITS
7
...
1
...
1
...
2
Static Latches and Registers
7
...
1
7
...
2
7
...
3
7
...
4
7
...
5
7
...
5
...
5
...
6
Dynamic Transmission-Gate Edge-triggered Registers
C2MOS—A Clock-Skew Insensitive Approach
True Single-Phase Clocked Register (TSPCR)
Alternative Register Styles*
7
...
1
7
...
2
7
...
3
...
3
...
3
...
4
Timing Metrics for Sequential Circuits
Classification of Memory Elements
Latch- vs
...
6
...
6
...
6
...
7
Perspective: Choosing a Clocking Strategy
7
...
9
To Probe Further
PART III
...
1
Introduction
8
...
3
Custom Circuit Design
8
...
4
...
4
...
4
...
4
...
5
Standard Cell
Compiled Cells
Macrocells, Megacells and Intellectual Property
Semi-Custom Design Flow
Array-Based Implementation Approaches
8
...
1
8
...
2
Pre-diffused (or Mask-Programmable) Arrays
Pre-wired Arrays
8
...
7
Summary
8
...
1
Introduction
9
...
2
...
2
...
3
Resistive Parasitics
9
...
1
9
...
2
9
...
3
9
...
4
...
4
...
5
Inductance and Reliability— Voltage Drop
Inductance and Performance—Transmission Line Effects
Advanced Interconnect Techniques
9
...
1
9
...
2
Reduced-Swing Circuits
Current-Mode Transmission Techniques
9
...
7
Chapter Summary
9
...
1
Introduction
10
...
2
...
2
...
2
...
2
...
3
Synchronous Design — An In-depth Perspective
10
...
1
10
...
2
10
...
3
10
...
4
10
...
6
...
6
...
7
Self-Timed Logic - An Asynchronous Technique
Completion-Signal Generation
Self-Timed Signaling
Practical Examples of Self-Timed Logic
Synchronizers and Arbiters*
10
...
1
10
...
2
10
...
4
...
4
...
4
...
4
...
5
Mesochronous interconnect
Plesiochronous Interconnect
Asynchronous Interconnect
Basic Concept
Building Blocks of a PLL
Future Directions and Perspectives
10
...
1
Distributed Clocking Using DLLs
10
...
2
10
...
3
Optical Clock Distribution
Synchronous versus Asynchronous Design
10
...
9
To Probe Further
DESIGN METHODOLOGY INSERT G: DESIGN VERIFICATION
CHAPTER 11: DESIGNING ARITHMETIC BUILDING BLOCKS
11
...
2
Datapaths in Digital Processor Architectures
11
...
3
...
3
...
3
...
4
The Multiplier
11
...
1
11
...
2
11
...
3
11
...
4
11
...
5
11
...
5
...
5
...
6
Other Arithmetic Operators
11
...
7
...
7
...
7
...
8
Perspective: Design as a Trade-off
11
...
10 To Probe Further
8
9
CHAPTER 12: DESIGNING MEMORY AND ARRAY STRUCTURES
12
...
1
...
1
...
2
The Memory Core
12
...
1
12
...
2
12
...
3
12
...
4
12
...
5
...
5
...
5
...
5
...
5
...
6
The Address Decoders
Sense Amplifiers
Voltage References
Drivers/Buffers
Timing and Control
Memory Reliability and Yield
12
...
1
12
...
2
12
...
3
...
3
...
3
...
3
...
3
...
4
Memory Classification
Memory Architectures and Building Blocks
Sources of Power Dissipation in Memories
Partitioning of the memory
Addressing the Active Power Dissipation
Data-retention dissipation
Summary
Case Studies in Memory Design
12
...
1
12
...
2
12
...
3
The Programmable Logic Array (PLA)
A 4 Mbit SRAM
A 1 Gbit NAND Flash Memory
12
...
8
Summary
12
...
1
Introduction
H
...
3
Design for Testability
H
...
1
H
...
2
H
...
3
H
...
4
H
...
5
H
...
4
...
4
...
4
...
5
Fault Simulation
To Probe Further
INDEX
10
...
1
A Historical Perspective
1
...
3
Quality Metrics of a Digital Design
1
...
5
To Probe Further
9
...
1
INTRODUCTION
Chapter 1
A Historical Perspective
The concept of digital data manipulation has made a dramatic impact on our society
...
Evolving steadily from mainframe and minicomputers, personal and laptop computers have proliferated into daily life
...
Instrumentation was one of the first noncomputing domains where the
potential benefits of digital data manipulation over analog processing were recognized
...
Only recently have we witnessed the conversion of telecommunications and consumer electronics towards the digital format
...
The compact disk has revolutionized the audio world, and digital video
is following in its footsteps
...
In the early nineteenth century, Babbage envisioned large-scale mechanical computing devices, called Difference Engines [Swade93]
...
The Analytical
Engine, developed in 1834, was perceived as a general-purpose computing machine, with
features strikingly close to modern computers
...
It even used pipelining to speed up the execution of the addition operation! Unfortunately, the complexity and the cost of the designs made the concept impractical
...
1)
required 25,000 mechanical parts at a total cost of £17,470 (in 1834!)
...
1 Working part of Babbage’s
Difference Engine I (1832), the first known
automatic calculator (from [Swade93],
courtesy of the Science Museum of
London)
...
fm Page 11 Friday, January 18, 2002 8:58 AM
Section 1
...
Early digital electronics
systems were based on magnetically controlled switches (or relays)
...
Examples of such are train
safety systems, where they are still being used at present
...
While originally
used almost exclusively for analog processing, it was realized early on that the vacuum
tube was useful for digital computations as well
...
The era of the vacuum tube based computer culminated in the design of machines such as
the ENIAC (intended for computing artillery firing tables) and the UNIVAC I (the first
successful commercial computer)
...
5 feet high and several feet wide and incorporated 18,000 vacuum
tubes
...
Reliability problems and excessive power consumption made the implementation of larger
engines economically and practically infeasible
...
It took till 1956 before this led to the first bipolar digital logic gate,
introduced by Harris [Harris56], and even more time before this translated into a set of
integrated-circuit commercial logic gates, called the Fairchild Micrologic family
[Norman60]
...
Other logic families were devised with higher performance in mind
...
TTL had the advantage, however, of offering a higher integration density and
was the basis of the first integrated circuit revolution
...
The family was so successful that it composed the
largest fraction of the digital semiconductor market until the 1980s
...
Although attempts were made to develop high
integration density, low-power bipolar families (such as I2L—Integrated Injection Logic
[Hart72]), the torch was gradually passed to the MOS digital integrated circuit approach
...
Lilienfeld (Canada) as early as 1925, and, independently, by O
...
Insufficient knowledge of the materials and gate stability problems, however, delayed the practical usability of the device for a long time
...
Remarkably, the first MOS logic gates introduced were of the CMOS variety
[Wanlass63], and this trend continued till the late 1960s
...
Instead,
1
An intriguing overview of the evolution of digital integrated circuits can be found in [Murphy93]
...
It is accompanied by some of the historically ground-breaking publications in the domain of digital IC’s
...
the first practical MOS integrated circuits were implemented in PMOS-only logic and
were used in applications such as calculators
...
These processors were
implemented in NMOS-only logic, which has the advantage of higher speed over the
PMOS logic
...
For instance, the first 4Kbit MOS memory was introduced in 1970 [Hoff70]
...
The road to the current levels of integration has not been without hindrances, however
...
This realization,
combined with progress in manufacturing technology, finally tilted the balance towards
the CMOS technology, and this is where we still are today
...
Although the large majority of the current integrated circuits are implemented in the
MOS technology, other technologies come into play when very high performance is at
stake
...
BiCMOS is used in high-speed memories and gate arrays
...
These technologies only play a very small role in the overall digital integrated circuit design scene
...
Hence the focus of this textbook on
CMOS only
...
2
Issues in Digital Integrated Circuit Design
Integration density and performance of integrated circuits have gone through an astounding revolution in the last couple of decades
...
This prediction,
later called Moore’s law, has proven to be amazingly visionary [Moore65]
...
Figure 1
...
As can be observed, integration complexity doubles approximately every 1 to 2 years
...
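The doubling trend described above can be turned into a simple projection. The following is a minimal sketch, assuming pure exponential growth with a fixed doubling period; the function name and the sample numbers are hypothetical and are not data from the figure.

```python
def projected_complexity(initial_count, years_elapsed, doubling_period_years):
    """Project a transistor count that doubles every doubling_period_years.

    All three arguments are illustrative inputs, not measured data.
    """
    return initial_count * 2.0 ** (years_elapsed / doubling_period_years)


# Hypothetical starting point of 1,000 devices, projected ten years ahead
# for the two ends of the "1 to 2 years" doubling range quoted in the text.
slow = projected_complexity(1000, 10, 2)   # doubling every 2 years
fast = projected_complexity(1000, 10, 1)   # doubling every year
```

The spread between the two results (a factor of 32 after ten years) shows how sensitive such projections are to the assumed doubling period.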
An intriguing case study is offered by the microprocessor
...
The transistor counts for a number of landmark designs are collected
in Figure 1
...
The million-transistor/chip barrier was crossed in the late eighties
...
Figure 1 ... (a) Trends in logic IC complexity; (b) Trends in memory complexity (horizontal axis: Year, 1970 to 2010)
This is illustrated in Figure 1
...
An important observation is that, as of now, these trends
have not shown any signs of a slow-down
...
Early designs were truly hand-crafted
...
This
is adequately illustrated in Figure 1
...
This approach is, obviously, not appropriate when more than a million devices
have to be created and assembled
...
Figure 1 ... (plot: Transistors versus Year of Introduction, 1970 to 2000; labeled designs: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium ®, Pentium II, Pentium III, Pentium 4)
Figure 1 ... (plot versus Year, 1970 to 2010; labeled designs include the 4004, 8008, 8080, 286, and 386)
Designers have, therefore, increasingly adhered to rigid design methodologies and strategies that are more amenable to design automation
...
5b
...
Cells are reused as much as possible to reduce the design effort
and to enhance the chances for a first-time-right implementation
...
The obvious next question is why such an approach is feasible in the digital world
and not (or to a lesser degree) in analog designs
...
At each design level,
the internal details of a complex module can be abstracted away and replaced by a black
box view or model
...
For instance, once a designer has implemented a
multiplier module, its performance can be defined very accurately and can be captured in a
model
...
For all purposes, it can hence be considered a black
box with known characteristics
...
The impact of
this divide and conquer approach is dramatic
...
This is analogous to a software designer using a library of software routines such as
input/output drivers
...
The only thing he cares about is the intended result of calling one of those
modules
...
Figure 1 ... (a) The 4004 microprocessor; (b) The Pentium ® 4 microprocessor (annotations: Standard Cell Module, Memory Module)
Figure 1 ... Design abstraction levels in digital circuits
...
No circuit designer will ever seriously consider the solid-state
physics equations governing the behavior of the device when designing a digital gate
...
For instance, an AND gate is adequately described by its Boolean expression (Z = A
...
This design philosophy has been the enabler for the emergence of elaborate computer-aided design (CAD) frameworks for digital integrated circuits; without it the current
design complexity would not have been achievable
...
An
overview of these tools and design methodologies is given in Chapter 8 of this textbook
...
These libraries contain not only the layouts, but also provide complete documentation and characterization of the behavior of the cells
...
...
5b)
...
In this approach, logic gates are placed in rows of cells of
equal height and interconnected using routing channels
...
The preceding analysis demonstrates that design automation and modular design
practices have effectively addressed some of the complexity issues incurred in contemporary digital design
...
If design automation
solves all our design problems, why should we be concerned with digital circuit design at
all? Will the next-generation digital designer ever have to worry about transistors or parasitics, or is the smallest design entity he will ever consider the gate and the module?
The truth is that the reality is more complex, and various reasons exist as to why an
insight into digital circuits and their intricacies will still be an important asset for a long
time to come
...
Semiconductor technologies continue to advance from year to year
...
• Creating an adequate model of a cell or module requires an in-depth understanding
of its internal operation
...
• The library-based approach works fine when the design constraints (speed, cost or
power) are not stringent
...
Unfortunately for a large number of other products such as microprocessors, success
hinges on high performance, and designers therefore tend to push technology to its
limits
...
To resort to our previous analogy to software methodologies, a programmer tends to “customize” software routines when execution speed is crucial; compilers—or design tools—are not yet to the level of what human sweat or ingenuity
can deliver
...
The performance of, for instance, an adder can be substantially influenced by the way it is connected to its environment
...
The impact of the interconnect parasitics is bound
to increase in the years to come with the scaling of the technology
...
Some design entities tend to be global or external (to resort anew to the software
analogy)
...
Increasing the size of a digital design has a
...
For instance, connecting more cells to a supply line can cause a voltage drop over the wire, which, in its turn, can slow down all
the connected cells
...
Coping with them
requires a profound understanding of the intricacies of digital circuit design
...
A typical example of this is the periodical reemergence of
power dissipation as a constraining factor, as was already illustrated in the historical
overview
...
To cope with these unforeseen factors, one must at least be able to model
and analyze their impact, requiring once again a profound insight into circuit topology and behavior
...
A fabricated circuit does not always
exhibit the exact waveforms one might expect from advance simulations
...
Troubleshooting a design
requires circuit expertise
...
Even
though she might not have to deal with the details of the circuit on a daily basis, the understanding will help her to cope with unexpected circumstances and to determine the dominant effects when analyzing a design
...
1 Clocks Defy Hierarchy
To illustrate some of the issues raised above, let us examine the impact of deficiencies in one
of the most important global signals in a design, the clock
...
This task can be
compared to the function of a traffic light that determines which cars are allowed to move
...
Under ideal circumstances, the clock signal is a periodic step waveform with transitions synchronized
throughout the designed circuit (Figure 1
...
In light of our analogy, changes in the traffic
lights should be synchronized to maximize throughput while avoiding accidents
...
7b)
...
This is confirmed by the simulations shown in Figure
1
...
Due to delays associated with routing the clock wires, it may happen that the clocks
become misaligned with respect to each other
...
Consider the case that the clock signal for the second
register is delayed—or skewed—by a value δ
...
If the time it takes to propagate the
output of the first register to the input of the second is smaller than the clock delay, the latter
will sample the wrong value
...
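The failure condition described above reduces to a single comparison. The sketch below is only an illustration of that condition: it ignores setup and hold times, and the function name and the nanosecond values are hypothetical.

```python
def second_register_samples_wrong_value(propagation_delay, clock_skew):
    """Return True when the skewed second register captures the wrong data.

    propagation_delay: time for the first register's output to reach the
                       second register's input
    clock_skew:        delay (delta) of the second register's clock
    Both values must use the same time unit.
    """
    return propagation_delay < clock_skew


# Hypothetical numbers in nanoseconds.
print(second_register_samples_wrong_value(0.3, 0.5))  # data arrives before the late clock edge
print(second_register_samples_wrong_value(0.8, 0.5))  # data arrives after the late clock edge
```

When the data path is faster than the skew, the new value races through before the delayed clock edge, so the second register captures it one cycle too early.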
Figure 1.7 (c) Simulated waveforms (horizontal axis: time). Impact of clock misalignment
...
In terms of our traffic analogy, cars of a first traffic light hit the cars of the
next light that have not left yet
...
Clock
skew is actually one of the most critical design problems facing the designers of large, high-performance systems
...
2 Power Distribution Networks Defy Hierarchy
While the clock signal is one example of a global signal that crosses the chip hierarchy
boundaries, the power distribution network represents another
...
To ensure proper operation, this
voltage should be stable within a few hundred millivolts
...
The
resistive nature of the on-chip wires and the inductance of the IC package pins make this a
difficult proposition
...
This leads to a current variation of 100 GA/sec, which is a truly
astounding number
...
A current of 1 A
running through a wire with a resistance of 1 Ω causes a voltage drop of 1 V
...
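The two supply-noise mechanisms mentioned above, resistive drop on the wires and inductive drop across the package pins, can be estimated with two one-line formulas. This is a minimal sketch using Ohm's law and V = L di/dt; the 1 pH inductance in the usage lines is a hypothetical value picked only to show the order of magnitude, not a figure from the text.

```python
def resistive_drop(current_a, resistance_ohm):
    """IR drop in volts across a supply wire (Ohm's law)."""
    return current_a * resistance_ohm


def inductive_drop(inductance_h, di_dt_a_per_s):
    """Voltage in volts induced across an inductance by a changing current."""
    return inductance_h * di_dt_a_per_s


# The example from the text: 1 A through 1 ohm.
v_ir = resistive_drop(1.0, 1.0)

# A current variation of 100 GA/sec across a hypothetical 1 pH of inductance.
v_l = inductive_drop(1e-12, 100e9)
```

Even a picohenry-scale inductance yields about a tenth of a volt at that current slope, which is already significant against a tolerance of a few hundred millivolts.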
2 and 2
...
Figure 1 ... (a) Routing through the block; (b) Routing around the block (blocks shown: Block A, Block B)
able
...
While
this sizing of the power network is relatively simple in a flat design approach, it is a lot
more complex in a hierarchical design
...
8a [Saleh01]
...
If power is routed
through Block A to Block B, a larger IR drop will occur in Block B since power is also
being consumed by Block A before it reaches Block B
...
8b
...
Although routing power this way is easier to control and maintain, it also requires more area to implement
...
This requirement forces designers to set
aside area for power busing that takes away from the available routing area
...
For instance, it is not always easy to determine which
way the current will flow when multiple parallel paths are available between the power
source and the consuming gate
...
All these considerations make the design of the power-distribution network a challenging job
...
The purpose of this textbook is to provide a bridge between the abstract vision of
digital design and the underlying digital circuit and its peculiarities
...
The persistent quest for a designer when designing each of the mentioned modules is to identify the dominant design parameters, to locate the section of the
design he should focus his optimizations on, and to determine the specific properties that
make the module under investigation (e
...
, a memory) different from any others
...
...
1
...
These properties help to
quantify the quality of a design from different perspectives: cost, functionality, robustness,
performance, and energy consumption
...
For instance, pure speed is a crucial property in a compute
server
...
The introduced properties are relevant at all levels of the
design hierarchy, be it system, chip, module, or gate
...
1
...
1
Cost of an Integrated Circuit
The total cost of any product can be separated into two components: the recurring
expenses or the variable cost, and the non-recurring expenses or the fixed cost
...
An important component of the fixed cost of an integrated circuit is the effort in time and manpower it takes to produce the design
...
Advanced design methodologies that automate major parts of the design process
can help to boost the latter
...
Additionally, one has to account for the indirect costs, the company overhead that
cannot be billed directly to one product
...
Variable Cost
This accounts for the cost that is directly attributable to a manufactured product, and is
hence proportional to the product volume
...
The total cost of an integrated circuit is now
$$\text{cost per IC} = \text{variable cost per IC} + \frac{\text{fixed cost}}{\text{volume}}$$
(1
...
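The cost relation above (variable cost per IC plus the fixed cost divided by the volume) is easy to explore numerically. The following is a minimal sketch; the function name and the dollar amounts are hypothetical and serve only to show how the fixed cost is amortized.

```python
def cost_per_ic(variable_cost_per_ic, fixed_cost, volume):
    """Cost per IC = variable cost per IC + fixed cost / volume."""
    if volume <= 0:
        raise ValueError("volume must be a positive number of units")
    return variable_cost_per_ic + fixed_cost / volume


# Hypothetical numbers: $5 variable cost, $1M fixed (design) cost.
low_volume = cost_per_ic(5.0, 1_000_000.0, 100_000)      # fixed cost dominates
high_volume = cost_per_ic(5.0, 1_000_000.0, 10_000_000)  # fixed cost nearly vanishes
```

At low volume the fixed cost triples the price of each part in this example, while at high volume it adds only a few percent.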
Figure 1 ... Each square represents a die (labeled "Individual die"), in this case the AMD Duron™ microprocessor (Reprinted with permission from AMD)
...
This also
explains why it makes sense to have a large design team working for a number of years on a
hugely successful product such as a microprocessor
...
2)
As will be elaborated on in Chapter 2, the IC manufacturing process groups a number of
identical circuits onto a single wafer (Figure 1
...
Upon completion of the fabrication, the
wafer is chopped into dies, which are then individually packaged after being tested
...
The cost of packaging and test is the
topic of later chapters
...
The latter factor is called the die yield
...
3)
The number of dies per wafer is, in essence, the area of the wafer divided by the die
area
...
Dies around the perimeter of the wafer are therefore lost
...
Eq
...
3)
also presents the first indication that the cost of a circuit is dependent upon the chip
area: increasing the chip area simply means that fewer dies fit on a wafer
...
Both the substrate material and the manufacturing process introduce faults that
can cause a chip to fail
...
...
4)
α is a parameter that depends upon the complexity of the manufacturing process, and is
roughly proportional to the number of masks
...
The defects per unit area is a measure of the material and process
induced faults
...
5 and 1 defects/cm2 is typical these days, but depends
strongly upon the maturity of the process
...
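The yield equation itself is not legible in the text above, so the sketch below assumes the yield model commonly paired with these two parameters: die yield = (1 + defects per unit area × die area / α)^(−α). Treat that formula, the function name, and the sample values as assumptions for illustration only.

```python
def die_yield(defects_per_cm2, die_area_cm2, alpha):
    """Assumed yield model: (1 + D * A / alpha) ** (-alpha).

    defects_per_cm2: defects per unit area (D)
    die_area_cm2:    die area (A)
    alpha:           process-complexity parameter described in the text
    """
    return (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


# Hypothetical case: 1 defect/cm2, a 3 cm2 die, alpha = 3.
example_yield = die_yield(1.0, 3.0, 3.0)
```

Under this assumed model the yield falls off steeply as either the defect density or the die area grows, which is the qualitative behavior the text relies on.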
3 Die Yield
Assume a wafer size of 12 inches, a die size of 2
...
Determine the
die yield of this CMOS process run
...
$$\text{dies per wafer} = \frac{\pi \times (\text{wafer diameter}/2)^{2}}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}}$$
This means 252 (= 296 - 44) potentially operational dies for this particular example
...
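The dies-per-wafer estimate is easy to evaluate numerically. The sketch below also includes a yield function that assumes the commonly used model die yield = (1 + defects per unit area × die area / α)^(−α); that expression is not legible in this extract, so treat it as an assumption. The function names and input values are hypothetical and are not the numbers of the example above.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Wafer area divided by die area, minus the dies lost along the perimeter."""
    gross = math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return gross - edge_loss


def die_yield(defects_per_cm2: float, die_area_cm2: float, alpha: float) -> float:
    """Assumed yield model: (1 + defects * area / alpha) ** (-alpha)."""
    return (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


# Hypothetical inputs: 20 cm wafer, 1 cm^2 die, 1 defect/cm^2, alpha = 3.
n_dies = dies_per_wafer(20.0, 1.0)
y = die_yield(1.0, 1.0, 3.0)
print(f"dies per wafer: {n_dies:.1f}, yield: {y:.3f}, good dies: {n_dies * y:.1f}")
```

Doubling the die area in this sketch lowers both the die count and the yield, which is why the cost per functional die rises much faster than linearly with area.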
(1
...
The bottom line is that the number of functional dies per wafer, and hence the
cost per die, is a strong function of the die area
...
Bearing in mind the
equations derived above and the typical parameter values, we can conclude that die costs
are proportional to the fourth power of the area:
$$\text{cost of die} = f(\text{die area})^4$$
(1
...
Small area is hence a desirable property for a digital gate
...
Smaller gates furthermore tend to be faster and consume less energy, as the total gate capacitance—which is
one of the dominant performance parameters—often scales with the area
...
Other parameters may have an impact, though
...
The gate complexity, as expressed by the number of transistors and the regularity of the interconnect structure, also has an impact on the design cost
...
Simplicity and regularity are precious properties in cost-sensitive designs
...
3
...
The measured behavior of a manufactured circuit normally deviates from the
...
One reason for this aberration is the variation in the manufacturing
process
...
The electrical behavior of a circuit can be
profoundly affected by those variations
...
The word noise in the context
of digital circuits means “unwanted variations of voltages and currents at the logic
nodes
...
Some examples of digital noise
sources are depicted in Figure 1
...
For instance, two wires placed side by side in an integrated circuit form a coupling capacitor and a mutual inductance
...
Noise
on the power and ground rails of a gate also influences the signal levels in the gate
...
Capacitive and inductive cross talk, and the internally-generated
power supply noise are examples of such
...
For these
sources, the noise level is directly expressed in volts or amperes
...
Noise is a major concern in the engineering of digital circuits
...
[Figure panels: (a) Inductive coupling; (b) Capacitive coupling; (c) Power and ground noise]
Figure 1
...
The steady-state parameters (also called the static behavior) of a gate measure how
robust the circuit is with respect to both variations in the manufacturing process and noise
disturbances
...
Digital circuits (DC) perform operations on logical (or Boolean) variables
...
e
...
6)
A logical variable is, however, a mathematical abstraction
...
This is most often a node
voltage that is not discrete but can adopt a continuous range of values
...
Applying VOH to the input of an inverter yields VOL at the output and vice
versa
...
$$V_{OH} = \overline{(V_{OL})} \qquad V_{OL} = \overline{(V_{OH})}$$
(1
...
The electrical function of a gate is best expressed by its voltage-transfer
characteristic (VTC) (sometimes called the DC transfer characteristic), which plots the
output voltage as a function of the input voltage Vout = f(Vin)
...
11
...
Another point of interest of the
VTC is the gate or switching threshold voltage VM (not to be confused with the threshold
voltage of a transistor), that is defined as VM = f(VM)
...
The gate threshold voltage presents the midpoint of the switching characteristics, which is obtained when the output of a gate is short-circuited to the input
...
[Figure: inverter VTC, Vout versus Vin, showing VOH, VOL, the line Vout = Vin, and the switching threshold VM]
Figure 1
...
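Because VM is defined by VM = f(VM), it can be located numerically as the crossing of the VTC with the line Vout = Vin. The sketch below uses bisection on a made-up, monotonically decreasing VTC; the tanh-shaped curve and its parameters are assumptions for illustration, not a model of any real gate.

```python
import math


def example_vtc(v_in: float, vdd: float = 2.5, gain: float = 8.0, v_mid: float = 1.1) -> float:
    """Hypothetical inverter VTC: smooth, monotonically decreasing from vdd to 0."""
    return 0.5 * vdd * (1.0 - math.tanh(gain * (v_in - v_mid) / vdd))


def switching_threshold(vtc, v_low: float, v_high: float, tol: float = 1e-9) -> float:
    """Find VM with vtc(VM) = VM by bisection; assumes vtc is decreasing."""
    lo, hi = v_low, v_high
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if vtc(mid) > mid:
            lo = mid  # output still above the Vout = Vin line: VM lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


vm = switching_threshold(example_vtc, 0.0, 2.5)
print(f"VM = {vm:.4f} V, f(VM) = {example_vtc(vm):.4f} V")
```

This mirrors the physical picture in the text: short-circuiting the output to the input forces the gate onto the point where both voltages are equal.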
Even if an ideal nominal value is applied at the input of a gate, the output signal
often deviates from the expected nominal value
...
e
...
Figure 1
...
alone
...
These represent by definition the points where the gain
(= dVout / dVin) of the VTC equals −1 as shown in Figure 1
...
The region between VIH
and VIL is called the undefined region (sometimes also referred to as transition width, or
TW)
...
Noise Margins
For a gate to be robust and insensitive to noise disturbances, it is essential that the “0” and
“1” intervals be as large as possible
...
8)
$$NM_H = V_{OH} - V_{IH}$$
The noise margins represent the levels of noise that can be sustained when gates are cascaded as illustrated in Figure 1
...
It is obvious that the margins should be larger than 0
for a digital circuit to be functional and by preference should be as large as possible
...
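As a quick numerical check, the noise margins follow directly from the four characteristic voltages. The low noise margin is taken here as NM_L = V_IL − V_OL, the standard counterpart of the NM_H expression above (its defining line is not legible in this extract). The voltage values are hypothetical.

```python
def noise_margins(v_oh: float, v_ol: float, v_ih: float, v_il: float):
    """Return (NM_H, NM_L) = (V_OH - V_IH, V_IL - V_OL)."""
    nm_h = v_oh - v_ih
    nm_l = v_il - v_ol
    return nm_h, nm_l


# Hypothetical levels for a 2.5 V gate.
nm_h, nm_l = noise_margins(v_oh=2.5, v_ol=0.0, v_ih=1.5, v_il=1.0)
print(f"NM_H = {nm_h} V, NM_L = {nm_l} V")
if min(nm_h, nm_l) <= 0:
    print("warning: non-positive noise margin, cascaded gates may not be functional")
```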
Assume that a signal is
disturbed by noise and differs from the nominal voltage levels
...
This deviation is added to the noise injected at
the output node and passed to the next gate
...
[Figure: (a) Relationship between voltage and logic levels; (b) Definition of VIH and VIL as the points where the slope of the VTC equals -1]
[Figure: noise margins NMH and NML between the gate output of stage M and the gate input of stage M + 1, with the undefined region between VIL and VIH]
This, fortunately, does not happen if the gate possesses the regenerative property, which ensures that a disturbed signal gradually converges back to one of the nominal voltage levels after passing through a number of logical stages
...
14a)
...
Similarly,
when an input voltage vin (vin ∈ “1”) is applied to the inverter chain, the output voltage
will approach the nominal value VOH
...
14
The regenerative property
...
4 Regenerative property
The concept of regeneration is illustrated in Figure 1
...
The input signal to the chain is a step-waveform with
...
Instead of swinging from rail to rail,
v0 only extends between 2
...
9 V
...
6 V to 4
...
Even further, v2 already swings between the nominal VOL and VOH
...
The conditions under which a gate is regenerative can be intuitively derived by analyzing a simple case study
...
15(a) plots the VTC of an inverter Vout = f(Vin) as well
as its inverse function finv(), which swaps the roles of the x- and y-axes and is defined
as follows:
$$out = f(in) \Rightarrow in = finv(out)$$
(1
...
15 Conditions for regeneration
...
The output voltage of this inverter equals v1 = f(v0) and is applied to
the next inverter
...
The signal voltage gradually converges to the nominal signal after a number of inverter stages, as indicated by the
arrows
...
15(b) the signal does not converge to any of the nominal voltage levels
but to an intermediate voltage level
...
The difference between the two cases is due to the gain characteristics of the gates
...
Such a gate has two stable operating points
...
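The role of the gain can be explored by passing a disturbed level through a chain of identical inverters. The sketch below uses a hypothetical, normalized tanh-shaped VTC (0 to 1 V) whose mid-point gain magnitude is k/2; it is an illustration under that assumption, not a model of a real gate. With a gain magnitude above 1 in the transition region the signal is restored toward a rail; with a gain magnitude below 1 it drifts to the intermediate level.

```python
import math


def inverter(v_in: float, k: float) -> float:
    """Hypothetical normalized VTC; the gain at v_in = 0.5 is -k / 2."""
    return 0.5 * (1.0 - math.tanh(k * (v_in - 0.5)))


def chain(v_in: float, stages: int, k: float) -> float:
    """Output of a chain of `stages` identical inverters."""
    v = v_in
    for _ in range(stages):
        v = inverter(v, k)
    return v


# A disturbed "0" of 0.45 V through an even number of stages.
print("high gain:", chain(0.45, 6, k=10.0))   # pulled back toward the low level
print("low gain: ", chain(0.10, 20, k=1.0))   # settles near the mid-point 0.5
```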
Noise Immunity
While the noise margin is a meaningful means for measuring the robustness of a circuit
against noise, it is not sufficient
...
...
Noise immunity, on the other hand, expresses the ability of the system to process and transmit information correctly in the presence of noise [Dally98]
...
These circuits have the property that only a small
fraction of a potentially-damaging noise source is coupled to the important circuit nodes
...
Circuits that do not possess this property are susceptible to noise
...
As discussed earlier, the noise sources
can be divided into sources that are
• proportional to the signal swing Vsw
...
• fixed
...
We assume, for the sake of simplicity, that the noise margin equals half the signal swing
(for both H and L)
...
$$V_{NM} = \frac{V_{sw}}{2} \ge \sum_i f_i V_{Nfi} + \sum_j g_j V_{sw} \qquad (1.11)$$
This makes it clear that the signal swing (and the noise margin) has to be large enough to
overpower the impact of the fixed sources (f VNf)
...
In the presence of large gain factors, increasing the
signal swing does not do any good to suppress noise, as the noise increases proportionally
...
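Rearranging the inequality gives a lower bound on the swing: V_sw ≥ Σ f_i V_Nfi / (1/2 − Σ g_j), which only has a solution when Σ g_j < 1/2. The sketch below evaluates that bound; the function name and the source values are hypothetical.

```python
def minimum_swing(fixed_sources, proportional_gains):
    """Smallest V_sw with V_sw / 2 >= sum(f_i * V_Nfi) + sum(g_j) * V_sw.

    fixed_sources: iterable of (f_i, V_Nfi) pairs.
    proportional_gains: iterable of g_j factors.
    Returns None when the proportional sources alone use up the whole margin.
    """
    fixed = sum(f * v for f, v in fixed_sources)
    g_total = sum(proportional_gains)
    if g_total >= 0.5:
        return None  # raising the swing raises the noise at least as fast
    return fixed / (0.5 - g_total)


print(minimum_swing([(1.0, 0.1)], [0.25]))      # fixed 0.1 V source, g = 0.25
print(minimum_swing([(1.0, 0.1)], [0.3, 0.3]))  # None: gains sum to 0.6
```

The second call illustrates the closing remark of the paragraph above: with large gain factors, no increase in swing helps.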
Directivity
The directivity property requires a gate to be unidirectional, that is, changes in an output
level should not appear at any unchanging input of the same circuit
...
In real gate implementations, full directivity can never be achieved
...
Capacitive coupling between
inputs and outputs is a typical example of such a feedback
...
...
16)
...
From the world of analog amplifiers, we know that this effect is minimized by making
the input resistance of the load gates as large as possible (minimizing the input currents)
and by keeping the output resistance of the driving gate small (reducing the effects of load
currents on the output voltage)
...
For these reasons, many generic and library
components define a maximum fan-out to guarantee that the static and dynamic performance of the element meet specification
...
16b)
...
[Figure panels: (a) Fan-out N; (b) Fan-in M]
Figure 1
...
The Ideal Digital Gate
Based on the above observations, we can define the ideal digital gate from a static perspective
...
Its VTC is shown in Figure 1
...
The input and output impedances of the
ideal gate are infinity and zero, respectively (i.e., the gate has unlimited fan-out)
...
Example 1
...
18 shows an example of a voltage-transfer characteristic of an actual, but outdated
gate structure (as produced by SPICE in the DC analysis mode)
...
[Figure: ideal VTC, Vout versus Vin, with gain g = -∞]
Figure 1
...
5 V;
VIH = 2
...
VOL = 0
...
66 V
VM = 1
...
15 V; NML = 0
...
05 V is substantially below the maximum obtainable value of 5 V (which is the value of the supply voltage for this design)
...
[Figure: voltage-transfer characteristic, Vout (V) versus Vin (V)]
Figure 1
...
Performance
From a system designer's perspective, the performance of a digital circuit expresses the
computational load that the circuit can manage
...
This performance
...
While the
former is crucially important, it is not the focus of this textbook
...
When focusing on the pure
design, performance is most often expressed by the duration of the clock period (clock
cycle time), or its rate (clock frequency)
...
Each of these topics will be discussed in detail in the course of this textbook
...
The propagation delay tp of a gate defines how quickly it responds to a change at its
input(s)
...
It is
measured between the 50% transition points of the input and output waveforms, as shown
in Figure 1
...
2 Because a gate displays different response times for
rising or falling input waveforms, two definitions of the propagation delay are necessary
...
The propagation delay tp is
defined as the average of the two
...
12)
[Figure: Vin and Vout waveforms versus t, with tpHL and tpLH measured between the 50% points, and tf and tr between the 10% and 90% points]
Figure 1
...
2
The 50% definition is inspired by the assumption that the switching threshold VM is typically located in the
middle of the logic swing
...
...
It is mostly used to compare different semiconductor technologies, or logic design styles
...
Most importantly, the delay is a function of the
slopes of the input and output signals of the gate
...
19), and express how fast a signal transits between
the different levels
...
The rise/fall time of a signal is largely determined by the
strength of the driving gate, and the load presented by the node itself, which sums the contributions of the connecting gates (fan-out) and the wiring parasitics
...
A uniform way of measuring the tp of a gate, so that technologies can be judged on an equal footing, is desirable
...
20)
...
The period T of the oscillation is
determined by the propagation time of a signal transition through the complete chain, or
T = 2 × tp × N with N the number of inverters in the chain
...
Note
that this equation is only valid for 2Ntp >> tf + tr
...
Typically, a ring oscillator needs at
least five stages to be operational
...
20
Ring oscillator circuit for propagation-delay measurement
...
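The relation T = 2 × tp × N can be inverted to extract tp from a measured oscillation period. The sketch below is a minimal illustration; the function name and the measured period are hypothetical, and the odd-stage check reflects the well-known requirement that an inverter ring must contain an odd number of stages to oscillate.

```python
def propagation_delay_from_ring(period_s: float, n_stages: int) -> float:
    """Average gate delay tp = T / (2 * N) of an N-stage ring oscillator."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("an inverter ring needs an odd number of stages to oscillate")
    return period_s / (2 * n_stages)


# Hypothetical measurement: a 5-stage ring oscillating with a 1 ns period.
tp = propagation_delay_from_ring(1e-9, 5)
print(f"tp = {tp * 1e12:.0f} ps")
```

The result is only meaningful under the condition stated in the text, 2Ntp >> tf + tr.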
CAUTION: We must be extremely careful with results obtained from ring oscillator
measurements
...
The oscillator results are primarily useful for quantifying the differences between various manufacturing technologies and gate topologies
...
In more realistic digital circuits, fan-ins and fan-outs are higher, and
interconnect delays are non-negligible
...
As a result, the achievable clock frequency on
average is 50 to 100 times slower than the frequency predicted from ring oscillator measurements
...
Example 1
...
21
...
[Figure: first-order RC network with input vin, series resistor R, capacitor C, and output vout]
Figure 1
...
When applying a step input (with vin going from 0 to V), the transient response of this
circuit is known to be an exponential function, and is given by the following expression
(where τ = RC, the time constant of the network):
$$v_{out}(t) = \left(1 - e^{-t/\tau}\right) V$$
(1
...
69τ
...
2τ to get to the 90% point
...
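Solving the exponential response for the time at which vout reaches a given fraction of V gives t = −τ ln(1 − fraction). The sketch below evaluates this for the 50% point and for the interval between the 10% and 90% points; the function name and the component values are hypothetical.

```python
import math


def time_to_fraction(tau: float, fraction: float) -> float:
    """Time for v_out(t) = (1 - exp(-t / tau)) * V to reach fraction * V."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -tau * math.log(1.0 - fraction)


tau = 1e3 * 1e-12  # hypothetical R = 1 kOhm, C = 1 pF
t50 = time_to_fraction(tau, 0.5)
t10_to_90 = time_to_fraction(tau, 0.9) - time_to_fraction(tau, 0.1)
print(f"50% point: {t50 / tau:.2f} tau, 10%-90% interval: {t10_to_90 / tau:.2f} tau")
```

The two results are ln(2) τ and ln(9) τ, respectively.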
1
...
4
Power and Energy Consumption
The power consumption of a design determines how much energy is consumed per operation, and how much heat the circuit dissipates
...
Therefore, power dissipation is an important
property of a design that affects feasibility, cost, and reliability
...
With the increasing popularity of mobile and distributed computation, energy limitations put a firm restriction on the number of computations that can be performed given a minimum time between battery recharges
...
...
For instance, the peak power Ppeak is important when studying supply-line
sizing
...
Both measures are defined in Eq
...
14):
$$P_{peak} = i_{peak} V_{supply} = \max[p(t)]$$
$$P_{av} = \frac{1}{T}\int_0^T p(t)\,dt = \frac{V_{supply}}{T}\int_0^T i_{supply}(t)\,dt$$
(1
...
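Given a sampled supply-current waveform, both measures can be estimated numerically. The sketch below assumes a constant supply voltage and uses the trapezoidal rule for the integral; the function name and the sample data are hypothetical.

```python
def peak_and_average_power(times, supply_currents, v_supply):
    """Return (P_peak, P_av) for a sampled i_supply(t) at constant V_supply."""
    if len(times) != len(supply_currents) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    p = [v_supply * i for i in supply_currents]  # instantaneous power p(t)
    p_peak = max(p)
    # Trapezoidal approximation of the integral of p(t) over [times[0], times[-1]].
    energy = sum(
        0.5 * (p[k] + p[k + 1]) * (times[k + 1] - times[k])
        for k in range(len(p) - 1)
    )
    return p_peak, energy / (times[-1] - times[0])


# Hypothetical triangular current pulse, V_supply = 1 V.
print(peak_and_average_power([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 0.0, 0.0, 0.0], 1.0))
```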
The dissipation can further be decomposed into static and dynamic components
...
It is attributed to the
charging of capacitors and temporary current paths between the supply rails, and is, therefore, proportional to the switching frequency: the higher the number of switching events,
the higher the dynamic power consumption
...
It is always present, even when the circuit is in
stand-by
...
The propagation delay and the power consumption of a gate are related—the propagation delay is mostly determined by the speed at which a given amount of energy can be
stored on the gate capacitors
...
For a given technology and gate topology, the product of
power consumption and propagation delay is generally a constant
...
The PDP is simply the energy consumed by the gate per switching
event
...
An ideal gate is one that is fast, and consumes little energy
...
From the above, it should be clear that the E-D is equivalent
to power-delay²
...
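The two metrics differ only by one extra factor of the delay, as the short sketch below shows. The function names and the gate data are hypothetical.

```python
def power_delay_product(p_av: float, t_p: float) -> float:
    """PDP: energy consumed per switching event (power times delay)."""
    return p_av * t_p


def energy_delay_product(p_av: float, t_p: float) -> float:
    """E-D: PDP times delay, i.e. power times delay squared."""
    return power_delay_product(p_av, t_p) * t_p


# Two hypothetical gates with the same PDP but different speed.
for name, p_av, t_p in (("fast gate", 2e-4, 50e-12), ("slow gate", 1e-4, 100e-12)):
    print(name, power_delay_product(p_av, t_p), energy_delay_product(p_av, t_p))
```

Both gates consume the same energy per switching event, but the E-D metric favors the faster one.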
7 Energy Dissipation of First-Order RC Network
Let us consider again the first-order RC network shown in Figure 1
...
When applying a step
input (with Vin going from 0 to V), an amount of energy is provided by the signal source to the
network
...
15)
It is interesting to observe that the energy needed to charge a capacitor from 0 to V volt
with a step input is a function of the size of the voltage step and the capacitance, but is independent
...
We can also compute how much of the delivered energy
gets stored on the capacitor at the end of the transition
...
16)
0
This is exactly half of the energy delivered by the source
...
We leave it to the reader to demonstrate that during the discharge phase (for a step from V to 0), the energy originally stored on the capacitor
gets dissipated in the resistor as well, and turned into heat
...
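These energy statements can be checked numerically. The sketch below integrates the power drawn from the source during a step from 0 to V, using the closed-form charging current i(t) = (V/R) e^(−t/τ) that follows from the step response given earlier, and compares it with the energy stored on the capacitor. The function name, the integration settings and the component values are assumptions for illustration.

```python
import math


def step_charging_energies(r_ohm, c_farad, v_step, n_steps=20000, n_tau=20.0):
    """Return (energy drawn from the source, energy stored on C) for a 0 -> V step."""
    tau = r_ohm * c_farad
    dt = n_tau * tau / n_steps
    e_source = 0.0
    for k in range(n_steps):
        i0 = v_step / r_ohm * math.exp(-k * dt / tau)
        i1 = v_step / r_ohm * math.exp(-(k + 1) * dt / tau)
        e_source += 0.5 * (i0 + i1) * v_step * dt  # trapezoidal rule on V * i(t)
    v_final = v_step * (1.0 - math.exp(-n_tau))
    e_cap = 0.5 * c_farad * v_final ** 2
    return e_source, e_cap


c, v = 1e-12, 2.5  # hypothetical 1 pF load and 2.5 V step
for r in (1e3, 1e5):
    e_src, e_cap = step_charging_energies(r, c, v)
    print(f"R = {r:.0e} Ohm: E_source / (C V^2) = {e_src / (c * v * v):.4f}, "
          f"E_cap / (C V^2) = {e_cap / (c * v * v):.4f}")
```

Changing R by two orders of magnitude leaves both ratios unchanged, and the capacitor ends up with half of what the source delivered; the other half is dissipated in the resistor.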
4
Summary
In this introductory chapter, we learned about the history and the trends in digital circuit
design
...
At the
end of the chapter, you can find an extensive list of reference works that may help you to
learn more about some of the topics introduced in the course of the text
...
5
To Probe Further
The design of digital integrated circuits has been the topic of a multitude of textbooks and
monographs
...
The state-of-the-art developments in the area
of digital design are generally reported in technical journals or conference proceedings,
the most important of which are listed
...
...
Annaratone, Digital CMOS Circuit Design, Kluwer, 1986
...
Dillinger, VLSI Engineering, Prentice Hall, 1988
...
Elmasry, ed
...
E
...
, Digital MOS Integrated Circuits II, IEEE Press, 1992
...
Glasser and D
...
A
...
, McGraw-Hill, 1999
...
Mead and L
...
K
...
D
...
Eshraghian, Basic VLSI Design, Prentice Hall, 1988
...
Shoji, CMOS Digital Circuit Technology, Prentice Hall, 1988
...
Uyemura, Circuit Design for CMOS VLSI, Kluwer, 1992
...
Veendrick, MOS IC’s: From Basics to ASICS, VCH, 1992
...
High-Performance Design
K
...
A
...
Fox, and W
...
, Design of High-Performance Microprocessor Circuits, IEEE Press, 2000
...
Shoji, High-Speed Digital Circuits, Addison-Wesley, 1996
...
Chandrakasan and R
...
, Low-Power Digital CMOS Design, IEEE Press, 1998
...
Rabaey and M
...
, Low-Power Design Methodologies, Kluwer Academic, 1996
...
Yeap, Practical Low-Power CMOS Design, Kluwer Academic, 1998
...
Itoh, VLSI Memory Chip Design, Springer, 2001
...
Prince, Semiconductor Memories, Wiley, 1991
...
Prince, High Performance Memories, Wiley, 1996
...
Hodges, Semiconductor Memories, IEEE Press, 1972
...
Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley, 1990
...
Dally and J
...
E
...
, Clock Distribution Networks in VLSI Circuits and Systems, IEEE Press, 1995
...
Lau et al, ed
...
Design Tools and Methodologies
V
...
Seth, Test Generation for VLSI Chips, IEEE Press, 1988
...
D
...
G
...
S
...
J
...
A
...
W
...
Bipolar and BiCMOS
A
...
M
...
, BiCMOS Integrated Circuit Design, IEEE Press, 1994
...
Embabi, A
...
Elmasry, Digital BiCMOS Integrated Circuit Design, Kluwer,
1993
...
, eds
...
General
J
...
H
...
D
...
Jackson, Analysis and Design of Digital Integrated Circuits, 2nd ed
...
M
...
R
...
Watts, Submicron Integrated Circuits, Wiley, 1989
...
Bardeen and W
...
Rev
...
74, p
...
[Beeson62] R
...
Ruegg, “New Forms of All Transistor Logic,” ISSCC Digest of Technical Papers, pp
...
1962
...
Dally, Digital Systems Engineering, Cambridge University Press, 1998
...
Faggin, M
...
Hoff, Jr, H
...
Mazor, M
...
1-6
...
Harris, “Direct-Coupled Transistor Logic Circuitry in Digital Computers,” ISSCC
Digest of Technical Papers, p
...
1956
...
Hart and M
...
92–93, Feb
...
[Hoff70] E
...
68–73, August 3, 1970
...
intel
...
htm
[Masaki74] A
...
Harada and T
...
62–63, Feb
...
[Moore65] G
...
38,
Nr 8, April 1965
...
...
Murphy, “Perspectives on Logic and Microprocessors,” Commemorative Supplement to the Digest of Technical Papers, ISSCC Conf
...
49–51, San Francisco, 1993
...
Norman, J
...
Haas, “Solid-State Micrologic Elements,” ISSCC Digest of
Technical Papers, pp
...
1960
...
Hennessy and D
...
Saleh, M
...
simplex
...
php?page_name=wp_powerplan
[Schockley49] W
...
28, p
...
[Shima74] M
...
Faggin and S
...
56–57, Feb
...
[Swade93] D
...
86–91, February 1993
...
Wanlass, and C
...
32–32, Feb
...
1
...
2
...
4
...
[E, None, 1
...
Determine also
how much DRAM should be available on a single chip at that point in time, if Moore’s law
would still hold
...
2]
Visit the Intel on-line microprocessor museum (http://www
...
com/intel/intelis/museum/exhibit/hist_micro/index
...
While browsing
through the microprocessor hall-of-fame, determine the rate of increase in transistor counts
and clock frequencies in the 70’s, 80’s, and 90’s
...
Spend some time browsing the site
...
[D, None, 1
...
Determine for
each of those, the number of integrated devices, the overall area and the maximum clock
speed
...
2
...
2] Find in the library the latest November issue of the Journal of Solid State Circuits
...
),
the minimum feature size, the number of devices on a single die, and the maximum clock
speed
...
[E, None, 1
...
6
...
...
1
Introduction
2
...
Yet, some insight into the steps
that lead to an operational silicon chip comes in quite handy in understanding the physical
constraints that are imposed on a designer of an integrated circuit, as well as the impact of
the fabrication process on issues such as cost
...
It is not our aim to present a detailed description of
the fabrication technology, which easily deserves a complete course [Plummer00]
...
We learn that a set of optical masks forms the central interface between the
intrinsics of the manufacturing process and the design that the user wants to see transferred to the silicon fabric
...
As such, these patterns have to adhere to some constraints
in terms of minimum width and separation if the resulting circuit is to be fully functional
...
If the designer adheres to these rules, he gets
a guarantee that his circuit will be manufacturable
...
Finally, an overview is
given of the IC packaging options
...
2
...
1
...
To accommodate both types of devices, special regions called
wells must be created in which the semiconductor material is opposite to the type of the
channel
...
The cross section
Figure 2
...
shown in Figure 2
...
Modern processes are increasingly using a dual-well approach that uses both n- and p-wells, grown on top of an epitaxial layer, as shown in Figure 2
...
We will restrict the
remainder of this discussion to the latter process (without loss of generality)
...
2 Cross section of modern dual-well CMOS process
...
A number of these steps and/or operations are executed very
repetitively in the course of the manufacturing process
...
2
...
1
The Silicon Wafer
The base material for the manufacturing process comes
in the form of a single-crystalline, lightly doped wafer
...
3)
...
Often, the surface of the
wafer is doped more heavily, and a single crystal epitaxial layer of the opposite type is grown over the surface before the wafers are handed to the processing
company
...
High defect densities lead to a larger
fraction of non-functional circuits, and consequently an
increase in cost of the final product
...
3 Single-crystal ingot and
sliced wafers (from [Fullman99])
...
2
...
2
Photolithography
In each processing step, a certain area on the chip is masked out using the appropriate optical mask so that a desired processing step can be selectively applied to the remaining
regions
...
The technique to accomplish
this selective masking, called photolithography, is applied throughout the manufacturing
process
...
4 gives a graphical overview of the different operations involved in a
typical photolithographic process
...
4 Typical operations in a single
photolithographic cycle (from [Fullman99])
...
Oxidation layering — this optional step deposits a thin layer of SiO2 over the complete wafer by exposing it to a mixture of high-purity oxygen and hydrogen at
approximately 1000°C
...
2
...
This material is
originally soluble in an organic solvent, but has the property that the polymers cross-link when exposed to light, making the affected regions insoluble
...
A positive photoresist has the opposite properties; originally insoluble, but soluble after exposure
...
Since the cost of a mask is increasing quite rapidly
with the scaling of technology, a reduction of the number of masks is surely of high
priority
...
Stepper exposure — a glass mask (or reticle), containing the patterns that we want to
transfer to the silicon, is brought in close proximity to the wafer
...
The glass mask can be thought of as the negative of one
layer of the microcircuit
...
Where the mask is transparent, the photoresist becomes insoluble
...
Photoresist development and bake — the wafers are developed in either an acid or
base solution to remove the non-exposed areas of photoresist
...
5
...
This is accomplished through the use of many different
types of acid, base and caustic solutions as a function of the material that is to be
removed
...
Because of the dangerous nature of
some of these solvents, safety and environmental impact are primary concerns
...
Spin, rinse, and dry — a special tool (called SRD) cleans the wafer with deionized
water and dries it with nitrogen
...
To prevent this from happening, the processing steps are performed in ultra-clean
rooms where the number of dust particles per cubic foot of air ranges between 1 and
10
...
This
explains why the cost of a state-of-the-art fabrication facility easily ranges in the
multiple billions of dollars
...
7
...
These
are the subjects of the subsequent section
...
Photoresist removal (or ashing) — a high-temperature plasma is used to selectively
remove the remaining photoresist without damaging device layers
...
5
...
Yet, the reader has to bear in mind that the same sequence patterns the layer over the complete surface of the wafer
...
[Figure panels: (a) Silicon base material; (b) After oxidation and deposition of negative photoresist; (c) Stepper exposure (UV light through a patterned optical mask); (d) After development and etching of resist, chemical or plasma etch of SiO2; (e) After etching; (f) Final result after removal of resist]
Figure 2
...
millions of patterns to the semiconductor surface simultaneously
...
The continued scaling of the minimum feature sizes in integrated circuits puts an
enormous burden on the developer of semiconductor manufacturing equipment
...
The dimensions of the features to be
transcribed are smaller than the wavelengths of the optical light sources, so that achieving the necessary resolution and accuracy becomes harder and harder
...
1 µm) process
generation
...
This adds substantially to the cost of mask making
...
These techniques, while fully functional, are currently less attractive from an economic viewpoint
...
...
2
...
The creation of the source and drain
regions, well and substrate contacts, the doping of the polysilicon, and the adjustments of
the device threshold are examples of such
...
In both techniques, the area to be doped is
exposed, while the rest of the wafer is coated with a layer of buffer material, typically
SiO2
...
A gas containing the dopant is introduced in the tube
...
The final dopant concentration is the
greatest at the surface and decreases in a Gaussian profile deeper in the material
...
The ion
implantation system directs and sweeps a beam of purified ions over the semiconductor
surface
...
The ion implantation
method allows for an independent control of depth and dosage
...
Ion implantation has some unfortunate side effects, however, the most important one
being lattice damage
...
This problem is largely
resolved by applying a subsequent annealing step, in which the wafer is heated to around
1000°C for 15 to 30 minutes, and then allowed to cool slowly
...
Deposition
Any CMOS process requires the repetitive deposition of layers of a material over
the complete wafer, to either act as buffers for a processing step, or as insulating or conducting layers
...
Other materials require different techniques
...
This silicon nitride is deposited everywhere using a process called chemical vapor deposition or CVD, which uses a gas-phase
reaction with energy supplied by heat at around 850°C
...
The resulting reaction produces a non-crystalline or amorphous material
called polysilicon
...
The aluminum interconnect layers are typically deposited using a process known as
sputtering
...
delivered by electron-beam or ion-beam bombarding
...
Etching
Once a material has been deposited, etching is used to selectively form patterns such
as wires and contact holes
...
For instance, hydrofluoric acid buffered with ammonium fluoride is typically used to etch SiO2
...
A wafer is placed
into the etch tool's processing chamber and given a negative electrical charge
...
5 Pa, then filled with a positively charged plasma (usually a mix of nitrogen, chlorine and boron trichloride)
...
Plasma etching has the advantage of offering
a well-defined directionality to the etching action, creating patterns with sharp vertical
contours
...
If no special steps were taken, this would definitely
not be the case in modern CMOS processes, where multiple patterned metal interconnect
layers are superimposed onto each other
...
This process uses a slurry compound—a liquid carrier with a suspended
abrasive component such as aluminum oxide or silica—to microscopically plane a device
layer and to reduce the step heights
...
2
...
6
...
All other areas of the die will be covered with a thick layer of silicon dioxide
(SiO2), called the field oxide
...
1), or deposited in etched
trenches (Figure 2
...
Further insulation is provided
by the addition of a reverse-biased np-diode, formed by adding an extra p+ region, called
the channel-stop implant (or field implant) underneath the field oxide
...
To construct an NMOS transistor in a
p-well, heavily doped n-type source and drain regions are implanted (or diffused) into the
lightly doped p-type substrate
...
The conductive material forms the gate of the transistor
...
...
6 Simplified process sequence for the manufacturing of a dual-well CMOS circuit
...
Multiple insulated layers of metallic (most often aluminum) wires are deposited on
top of these devices to provide for the necessary interconnections between the transistors
...
7
...
The process starts with a p-substrate surfaced with a lightly doped p-epitaxial layer (a)
...
A plasma etching step using the complement of the
active area mask creates the trenches, used for insulating the devices (c)
...
At that point, the sacrificial nitride is removed (d)
...
This is followed by a second implant step to adjust the threshold voltages of the
PMOS transistors
...
Similar operations (using other dopants) are performed to create the p-wells,
and to adjust the thresholds of the NMOS transistors (f)
...
Polysilicon is
used both as gate electrode material for the transistors as well as an interconnect medium
(g)
...
The same implants are also used to dope the polysilicon on the surface, reducing its resistivity
1 Most modern processes also include extra implants for the creation of the lightly-doped drain regions (LDD), and the creation of gate spacers at this point
...
These steps also dope the polysilicon
...
(j) After deposition and patterning of first Al layer
...
Figure 2
...
Be aware that the drawings are stylized for understanding, and that the aspect ratios are not proportioned to reality
...
Note that the polysilicon gate, which is patterned before the doping, actually defines the precise location of the channel region, and hence the location of the source
and drain regions
...
The process continues
with the deposition of the metallic interconnect layers
...
Intermediate planarization steps ensure that the surface remains reasonably flat, even in the
presence of multiple interconnect layers
...
This layer would be CVD SiO2,
although often an additional layer of nitride is deposited as it is more impervious to moisture
...
A cross-section of the final artifact is shown in Figure 2
...
Observe how the transistors occupy only a small fraction of the total height of the structure
...
transistor
Figure 2
...
2
...
The goal of defining a set of design rules is to allow for a ready translation
of a circuit concept into an actual geometry in silicon
...
Circuit designers in general want tighter, smaller designs, which lead to higher performance and higher circuit density
...
Design rules are, consequently, a compromise that
attempts to satisfy both sides
...
...
They consist of minimum-width and minimum-spacing
constraints and requirements between objects on the same or on different layers
...
It stands for the minimum mask dimension that can be safely transferred to the
semiconductor material
...
More advanced
approaches use electron-beam
...
Even for the same minimum dimension, design rules tend to differ from company to
company, and from process to process
...
One approach to address this issue is to use
advanced CAD techniques, which allow for migration between compatible processes
...
The latter approach, made popular by
Mead and Conway [Mead80], defines all rules as a function of a single parameter, most
often called λ
...
Scaling of the minimum dimension is accomplished by simply
changing the value of λ
...
For a given process, λ is set to a specific value, and all design dimensions are consequently translated into
absolute numbers
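The translation from λ-based rules to absolute dimensions can be sketched in a few lines. This is only an illustration: the rule names and their values in λ are hypothetical, and λ is taken as half the minimum line width, which is the usual convention for scalable rules but is an assumption here.

```python
# Sketch of scalable (lambda-based) design rules.
# Rule names and lambda multiples below are hypothetical examples,
# not the rules of any real process.
RULES_IN_LAMBDA = {
    "poly_width": 2,
    "metal1_width": 3,
    "metal1_spacing": 3,
}


def lambda_for_process(min_line_width_um: float) -> float:
    """Assumed convention: minimum line width equals 2 * lambda."""
    return min_line_width_um / 2.0


def scale_rules(rules_in_lambda: dict, lam_um: float) -> dict:
    """Translate every rule from multiples of lambda to micrometers."""
    return {name: mult * lam_um for name, mult in rules_in_lambda.items()}


lam = lambda_for_process(0.25)          # lambda for a 0.25 um process
rules_um = scale_rules(RULES_IN_LAMBDA, lam)
print(rules_um)
```

Porting the same layout to another process then amounts to calling `scale_rules` with a different λ, which is exactly why the approach is attractive and also why it cannot capture nonlinear changes between layers.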
...
For
instance, for a 0
...
e
...
25 µm), λ
equals 0
...
This approach, while attractive, suffers from some disadvantages:
1
...
25 µm and 0
...
When scaling over larger ranges, the relations
between the different layers tend to vary in a nonlinear way that cannot be adequately covered by the linear scaling rules
...
Scalable design rules are conservative
...
This
results in over-dimensioned and less-dense designs
...
2 As circuit density is a prime goal in industrial designs, most semiconductor companies tend to use
micron rules, which express the design rules in absolute dimensions and can therefore
exploit the features of a given process to a maximum degree
...
For this textbook, we have selected a “vanilla” 0
...
The rest of this section is devoted to a short introduction
and overview of the design rules of this process, which fall in the micron-rules class
...
We
discuss each of them in sequence
...
From a designer’s viewpoint, all CMOS designs are based on the following entities:
• Substrates and/or wells, being p-type (for NMOS devices) and n-type (for PMOS)
• Diffusion regions (n+ and p+) defining the areas where transistors can be formed
...
Diffusions of an inverse type are
needed to implement contacts to the wells or to the substrate
...
• One or more polysilicon layers, which are used to form the gate electrodes of the
transistors (but serve as interconnect layers as well)
...
• Contact and via layers to provide interlayer connections
...
The functionality of the circuit is determined by the choice of the layers, as well as
the interplay between objects on different layers
...
An interconnection between two metal layers is formed by a cross section between the two metal layers and an additional contact layer
...
The different layers used in our CMOS process are represented in Colorplate 1 (color insert)
...
All distances are expressed in µm
...
Interlayer Constraints
Interlayer rules tend to be more complex
...
Understanding layout requires the
capability of translating the two-dimensional picture of the layout drawing into the three-dimensional reality of the actual device
...
We present these rules in a set of separate groupings
...
Transistor Rules (Colorplate 3)
...
From the intralayer design rules, it is already clear that
the minimum length of a transistor equals 0
...
3 µm (the minimum width of diffusion)
...
A contact (which forms an interconnection between metal and active or polysilicon) or a via (which connects two metal
layers) is formed by overlapping the two interconnecting layers and providing a
contact hole, filled with metal, between the two
...
3 µm, while the polysilicon and diffusion layers have to extend
at least over 0
...
This sets the minimum
area of a contact to 0
...
44 µm
...
The figure, furthermore, points out the minimum spacings
between contact and via holes, as well as their relationship with the surrounding layers
...
For robust digital circuit design, it is
important for the well and substrate regions to be adequately connected to the supply voltages
...
It is therefore advisable to provide numerous substrate (well) contacts spread over
the complete region
...
This is enabled by
the select layer, which reverses the type of diffusion
...
Consider an n-well process, which implements the PMOS transistors into an n-type
well diffused in a p-type material
...
To invert the polarity of the
diffusion, an n-select layer is provided that helps to establish the n+ diffusions for the well contacts in the n-region as well as the n+ source and drain regions for the NMOS transistors in the substrate
...
Failing to do so will almost surely lead to a nonfunctional design
...
While design teams in
the past used to spend numerous hours staring at room-size layout plots, most of this task
is now done by computers
...
A number of layout tools even perform on-line DRC and check the design in the background during the
time of conception
...
1 Layout Example
An example of a complete layout containing an inverter is shown in Figure 2
...
To help
the visualization process, a vertical cross section of the process along the design center is
included as well as a circuit schematic
...
[Figure panels: (a) Layout; (b) Cross section along A-A′; (c) Circuit diagram]
Figure 2
...
It is left as an exercise for the reader to determine the sizes of both the NMOS and
the PMOS transistors
...
4
Packaging Integrated Circuits
The IC package plays a fundamental role in the operation and performance of a component
...
Finally, it also protects the die against environmental conditions such as humidity
...
This influence is becoming more
pronounced over time, as technology scaling reduces internal signal delays and on-chip
capacitance
...
The search for higher-performance packages with fewer inductive or capacitive parasitics
has accelerated in recent years
...
This relationship
was first observed by E
...
This formula relates the number of input/output pins to the complexity of the circuit, as measured by the number of
gates
...
1)
where K is the average number of I/Os per gate, G the number of gates, β the Rent exponent, and P the number of I/O pins to the chip
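With the symbols defined above, Rent's rule takes the power-law form P = K × G^β. The short sketch below evaluates it; the values of K and β used in the example call are purely illustrative and are not taken from the table that follows.

```python
def rent_pins(gates: float, k: float, beta: float) -> float:
    """Estimate the I/O pin count P = K * G**beta (Rent's rule)."""
    if gates <= 0:
        raise ValueError("number of gates must be positive")
    return k * gates ** beta


# Illustrative values only: K = 1.0 I/Os per gate, beta = 0.5.
pins_small = rent_pins(10_000, k=1.0, beta=0.5)
pins_large = rent_pins(1_000_000, k=1.0, beta=0.5)
print(pins_small, pins_large)
```

Because β is smaller than one, a hundredfold increase in gate count raises the estimated pin count by far less than a factor of one hundred.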
...
1 and 0
...
Its value
depends strongly upon the application area, architecture, and organization of the circuit, as
demonstrated in Table 2
...
Clearly, microprocessors display a very different input/output
behavior compared to memories
...
1
Rent’s constant for various classes of systems ([Bakoglu90])
Application
β
K
Static memory
0
...
45
0
...
5
1
...
63
1
...
25
82
The observed rate of pin-count increase for integrated circuits varies between 8% and
11% per year, and it has been projected that packages with more than 2000 pins will be
required by the year 2010
...
It is useful for the circuit designer to be
aware of the available options, and their pros and cons
...
• Electrical requirements—Pins should exhibit low capacitance (both interwire and
to the substrate), resistance, and inductance
...
Observe that intrinsic integrated-circuit impedances are high
...
Mechanical reliability requires a good matching between the thermal properties of the die and the chip carrier
...
While ceramics
have a superior performance over plastic packages, they are also substantially more
expensive
...
The least expensive plastic packaging can dissipate up to 1 W
...
Higher dissipation requires more expensive ceramic packaging
...
Even more extreme techniques
such as fans and blowers, liquid cooling hardware, or heat pipes, are needed for
higher dissipation levels
...
The increasing pin count
either requires an increase in the package size or a reduction in the pitch between the
pins
...
Packages can be classified in many different ways —by their main material, the
number of interconnection levels, and the means used to remove heat
...
2
...
1
Package Materials
The most common materials used for the package body are ceramic and polymers (plastics)
...
For instance, the ceramic Al2O3 (Alumina) conducts heat better than
SiO2 and the Polyimide plastic, by factors of 30 and 100 respectively
...
The disadvantage of alumina and other ceramics is their high dielectric constant, which
results in large interconnect capacitances
...
4
...
The die is
first attached to an individual chip carrier or substrate
...
These cavities provide ample room for many
connections to the chip leads (or pins)
...
Complex systems contain even more interconnect levels, since boards are connected
together using backplanes or ribbon cables
...
10
...
Interconnect Level 1 —Die-to-Package-Substrate
For a long time, wire bonding was the technique of choice to provide an electrical connection between die and package
...
Next, the chip pads are individually
connected to the lead frame with aluminum or gold wires
...
10 Interconnect hierarchy in traditional IC
packaging
...
An example of wire bonding is
shown in Figure 2
...
Although the wire-bonding process is automated to a large degree,
it has some major disadvantages
...
11
Wire bonding
...
Wires must be attached serially, one after the other
...
2
...
Bonding wires have inferior electrical properties, such as a high individual inductance (5
nH or more) and mutual inductance with neighboring signals
...
Typical values of the parasitic inductances and
capacitances for a number of commonly used packages are summarized in Table 2
...
3
...
New attachment techniques are being explored as a result of these deficiencies
...
12a)
...
12b)
...
One possible approach is to use pressure connectors
...
[Figure panels: (a) Polymer tape with imprinted wiring pattern; (b) Die attachment using solder bumps]
Figure 2
...
Table 2
...
Package type          Capacitance (pF)   Inductance (nH)
68-pin plastic DIP    4                  35
68-pin ceramic DIP    7                  20
256-pin grid array    1–5                2–15
Wire bond
0
...
1–0
...
01–0
...
The sprockets in
the film are used for automatic transport
...
The
printed approach helps to reduce the wiring pitch, which results in higher lead counts
...
For instance,
for a two-conductor layer, 48 mm TAB Circuit, the following electrical parameters hold: L
≈ 0
...
5 nH, C ≈ 0
...
3 pF, and R ≈ 50–200 Ω [Doane93, p
...
Another approach is to flip the die upside-down and attach it directly to the substrate
using solder bumps
...
13)
...
This can help address
the power- and clock-distribution problems, since the interconnect materials on the substrate (e
...
, Cu or Au) are typically of a better quality than the Al on the chip
...
[Figure labels: Die; Solder bumps; Interconnect layers]
Figure 2
...
A PC board is manufactured by stacking layers of copper and insulating epoxy glass
...
The package pins are inserted and electrical connection is
made with solder (Figure 2
...
The favored package in this class was the dual-in-line
package or DIP (Figure 2
...
The packaging density of the DIP degrades rapidly when
the number of pins exceeds 64
...
15b)
...
(a) Through-hole mounting
(b) Surface mount
Figure 2
...
The through-hole mounting approach offers a mechanically reliable and sturdy connection
...
For mechanical reasons, a minimum pitch of 2
...
Even under
those circumstances, PGAs with large numbers of pins tend to substantially weaken the
board
...
PGAs with large pin counts hence require extra routing layers to connect to the multitudes
of pins
...
Many of the shortcomings of the through-hole mounting are solved by using the
surface-mount technique
...
14b)
...
In addition, the elimination of the through-holes improves the mechanical strength
of the board
...
Not only is it cumbersome to mount a component on a board, but also
more expensive equipment is needed, since a simple soldering iron will not do anymore
...
Signal probing becomes hard or even impossible
...
Three of these packages are shown in Figure 2
...
[Figure labels: Bare die; DIP; PGA; Small-outline IC; Quad flat pack; PLCC; Leadless carrier]
Figure 2
...
leadless chip carrier
...
3
...
3
Parameters of various types of chip carriers
...
54 mm
64
Pin grid array
2
...
27 mm
28
Leaded chip carrier (PLCC)
1
...
75 mm
124
Even surface-mount packaging is unable to satisfy the quest for ever-higher
pin counts
...
An example of such a
packaging approach, called ceramic ball grid array (BGA), is shown in Figure 2
...
Solder bumps are used to connect both the die to the package substrate, and the package to the
board
...
A minimum pitch between solder balls
of as low as 0
...
[Figure labels: Lid; Thermal grease; Chip; Flip-chip solder joints; Ceramic base; Board; Solder ball. Panels (a) and (b)]
Figure 2
...
The trend is toward reducing the number of levels
...
Eliminating one layer in the packaging hierarchy by mounting the die
directly on the wiring backplanes—board or substrate—offers a substantial benefit when
performance or density is a major issue
...
A number of the previously mentioned die-mounting techniques can be adapted to
mount dies directly on the substrate
...
The substrate itself can be chosen from a wide range of
materials, depending upon the mechanical, electrical, thermal, and economic
requirements
...
Silicon has the advantage of presenting a perfect match in mechanical and thermal properties with respect to the die material
...
An example of an MCM module implemented using a silicon substrate
(commonly dubbed silicon-on-silicon) is shown in Figure 2
...
The module, which implements an avionics processor module and is fabricated by Rockwell International, contains
53 ICs and 40 discrete devices on a 2
...
2″ substrate with aluminum polyimide interconnect
...
The module itself has 180
I/O pins
...
For instance, a solder bump has an associated capacitance and inductance of only 0
...
01 nH respectively
...
17
Avionics processor module
...
The dynamic power associated with the switching of the large load capacitances is simultaneously reduced
...
This technology requires some advanced manufacturing steps that make the process expensive
...
In the near future, this argument might become obsolete as
MCM approaches proliferate
...
4
...
A large number of failure mechanisms in ICs are accentuated by increased temperatures
...
To prevent failure, the temperature of the die must be kept within certain ranges
...
Military parts are more demanding and require a temperature range varying from –55° to 125°C
...
Standard packaging approaches use still or circulating air as the cooling medium
...
More expensive packaging approaches, such as those used in mainframes or supercomputers, force air, liquids, or inert gases through tiny ducts in the package to achieve even
greater cooling efficiencies
...
As an example, a 40-pin DIP has a thermal resistance of 38 °C/W and 25 °C/W for
natural and forced convection of air, respectively
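A thermal resistance in °C/W translates dissipated power directly into a temperature rise of the die above its surroundings. The sketch below applies this to the two DIP figures just quoted; the 2 W dissipation is an arbitrary value chosen for illustration.

```python
def temperature_rise(theta_c_per_w: float, power_w: float) -> float:
    """Temperature rise (in degrees C) = thermal resistance * dissipated power."""
    return theta_c_per_w * power_w


POWER_W = 2.0  # hypothetical chip dissipation

rise_natural = temperature_rise(38.0, POWER_W)  # 40-pin DIP, natural convection
rise_forced = temperature_rise(25.0, POWER_W)   # 40-pin DIP, forced convection
print(rise_natural, rise_forced)
```

For the same dissipation, forced convection keeps the die cooler by the difference of the two thermal resistances multiplied by the power.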
...
For comparison, the thermal resistance
of a ceramic PGA ranges from 15 to 30 °C/W
...
The increasing integration levels and circuit performance make this task
nontrivial
...
It provides a bound on the integration complexity and performance as a function of the thermal parameters
$$\frac{N_G}{t_p} \le \frac{\Delta T}{\theta E}$$
(2
...
Example 2
...
5 °C/W and E = 0
...
In
other words, the maximum number of gates on a chip, when all gates are operating simultaneously, must be less than 400,000 if the switching speed of each gate is 1 nsec
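The bound can be checked numerically. In the sketch below the parameter values are assumptions, since they are not fully legible in this copy: ΔT = 100 °C, θ = 2.5 °C/W, and E = 0.1 pJ, chosen because they reproduce the 400,000-gate figure quoted above.

```python
def max_gates(delta_t_c: float, theta_c_per_w: float,
              energy_j: float, t_p_s: float) -> float:
    """Upper bound on gate count from N_G / t_p <= delta_T / (theta * E)."""
    return delta_t_c / (theta_c_per_w * energy_j) * t_p_s


# Assumed example values (see lead-in): 100 C, 2.5 C/W, 0.1 pJ, 1 ns.
n_max = max_gates(delta_t_c=100.0, theta_c_per_w=2.5,
                  energy_j=0.1e-12, t_p_s=1e-9)
print(round(n_max))
```

Halving the switching energy E or the thermal resistance θ doubles the number of gates that may switch simultaneously, which is why both low-power design and better packaging relax the constraint.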
...
Fortunately, not all gates are operating simultaneously in real systems
...
For
instance, it was experimentally derived that the ratio between the average switching period
and the propagation delay ranges from 20 to 200 in mini- and large-scale computers
[Masaki92]
...
(2
...
Design approaches for low power
that reduce either E or the activity factor are rapidly gaining importance
...
5
Perspective — Trends in Process Technology
Modern CMOS processes pretty much track the flow described in the previous sections,
although a number of the steps might be reversed, a single-well approach might be followed, a grown field oxide might be used instead of the trench approach, or extra steps
such as LDD (Lightly Doped Drain) might be introduced
...
2)
...
inserted between steps i and j of our process
...
Beyond these, it is our belief that no dramatic changes, breaking away from the
described CMOS technology, must be expected in the next decade
...
5
...
Process engineers are continuously evaluating alternative
options for the traditional ‘Aluminum conductor—SiO2 insulator’ combination that has
been the norm for the last decades
...
Copper has the advantage of having a resistivity that is substantially lower than that of Aluminum
...
Coating the copper with a buffer material such as Titanium Nitride, preventing the diffusion, addresses this problem, but
requires a special deposition process
...
18) uses a metallization approach that fills trenches etched into the insulator, followed by a chemical-mechanical polishing step
...
In addition to the lower resistivity interconnections, insulator materials with a lower
dielectric constant than SiO2 —and hence lower capacitance— have also found their way
into the production process starting with the 0
...
(a)
(b)
Figure 2
...
The main difference lies in
the start material: the transistors are constructed in a very thin layer of silicon, deposited
on top of a thick layer of insulating SiO2 (Figure 2
...
The primary advantages of the SOI
process are reduced parasitics and better transistor on-off characteristics
...
Preparing a high quality SOI substrate at an economical cost was long the main
hindrance against a large-scale introduction of the process
...
[Figure labels: Gate; tox; tSi; tBox; oxide; n+; p; Buried Oxide (BOX); p-substrate. Panels (a) and (b)]
Figure 2
...
2
...
2
In the Longer Term
Extending the life of CMOS technology beyond the next decade, and deep below the
100 nm channel-length region, will however require re-engineering of both the process
technology and the device structure
...
While projecting what approaches will dominate in that era equals resorting to
crystal-ball gazing, one interesting development is worth mentioning
...
One way to
address this problem is to introduce extra active layers, and to sandwich them in-between
the metal interconnect layers (Figure 2
...
This enables us to position high density memory on top of the logic processors implemented in the bulk CMOS, reducing the distance
between computation and storage, and hence also the delay [Souri00]
...
different layers
...
[Figure labels, top to bottom: T3 - Optical I/O; MEMS (optical device); M6 to M3; T2 - High Density Memory; M2 and M1; T1 - Logic (bulk, n+/p+)]
Figure 2
...
Extra
active layers (T*), implementing high density memory
and I/O, are sandwiched between the metal
interconnect layers (M*)
...
How to remove the dissipated heat
is one of the compelling questions
...
Yet, researchers are
demonstrating major progress, and 3D integration might very well be on the horizon
...
One alternative, called 2
...
Vias are
etched to electrically connect both chips after metallization
...
The major limitation of this technique is its lack of precision (best case alignment
+/- 2 µm), which restricts the inter-chip communication to global metal lines
...
2
...
• The manufacturing process of integrated circuits requires a large number of steps,
each of which consists of a sequence of basic operations
...
• The optical masks form the central interface between the intrinsics of the manufacturing process and the design that the user wants to see transferred to the silicon fabric
...
These design rules act as the contract between the circuit designer and the process
engineer
...
2
...
An excellent overview of the state-of-the-art in CMOS manufacturing can be
found in the “Silicon VLSI Technology” book by J
...
Deal, and P
...
A visual overview of the different steps in the manufacturing process can be
found on the web at [Fullman99]
...
REFERENCES
[Allen99] D
...
, “A 0
...
8 V SOI 550 MHz PowerPC Microprocessor with Copper
Interconnects,” Proceedings IEEE ISSCC Conference, vol
...
438-439, February 1999
...
Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison-Wesley,
1990
...
Doane, ed
...
[Franzon93] P
...
[Fullman99] Fullman Kinetics, “The Semiconductor Manufacturing Process”, http://www
...
com/semiconductors/semiconductors
...
[Geppert98] L
...
35, No
1, pp
...
[Landman71] B
...
Russo, “On a Pin versus Block Relationship for Partitions of
Logic Graphs,” IEEE Trans
...
C-20, pp
...
[Masaki92] A
...
1992
...
Mead and L
...
[Nagata92] M
...
27, no
...
465–472,
April 1992
...
[Plummer00] J
...
Deal, and P
...
[Steidel83] C
...
551–598, 1983
...
J
...
Banerjee, A
...
C
...
213-220, June 2000
...
fm Page 103 Monday, September 6, 1999 1:44 PM
CHAPTER
4
THE WIRE
Determining and quantifying interconnect parameters
n
Introducing circuit models for interconnect wires
n
Detailed wire models for SPICE
n
Technology scaling and its impact on interconnect
4
...
2
A First Glance
4
...
3
...
3
...
5
The Transmission Line
SPICE Wire Models
4
...
1
4
...
2
4
...
3
...
4
Capacitance
4
...
5
Perspective: A Look into the Future
Inductance
Electrical Wire Models
4
...
1
The Ideal Wire
4
...
2
The Lumped Model
4
...
3
The Lumped RC model
4
...
4
The Distributed rc Line
...
1 Introduction
Throughout most of the past history of integrated circuits, on-chip interconnect wires were
considered to be second class citizens that had only to be considered in special cases or
when performing high-precision analysis
...
The parasitic effects
introduced by the wires display a scaling behavior that differs from that of active devices such
as transistors, and tend to gain in importance as device dimensions are reduced and circuit
speed is increased
...
This situation is
aggravated by the fact that improvements in technology make the production of ever-larger die sizes economically feasible, which results in an increase in the average length of
an interconnect wire and in the associated parasitic effects
...
4
...
State-of-the-art processes offer multiple layers of Aluminum, and at least one layer of polysilicon
...
These wires appear in the schematic diagrams of electronic
circuits as simple lines with no apparent impact on the circuit performance
...
All three have multiple
effects on the circuit behavior
...
An increase in propagation delay, or, equivalently, a drop in performance
...
An impact on the energy dissipation and the power distribution
...
An introduction of extra noise sources, which affects the reliability of the circuit
...
This conservative approach is non-constructive and even
unfeasible
...
It is hence totally useless for today's integrated circuits with their
millions of circuit nodes
...
The circuit behavior at a given circuit node is only determined
by a few dominant parameters
...
To achieve the latter, it is important that the designer has a clear insight into the parasitic wiring effects, their relative importance, and their models
...
1
...
1 Schematic and physical views of wiring of bus-network
...
ter (or transmitters) to a set of receivers and is implemented as a link of wire segments of
various lengths and geometries
...
Be aware that the reality may be far more complex
...
2a
...
This is a necessity when the length of the wire becomes significantly larger than its width
...
Analyzing the behavior of this schematic, which only models a small part of the circuit, is slow and cumbersome
...
• Inductive effects can be ignored if the resistance of the wire is substantial — this is
for instance the case for long Aluminum wires with a small cross-section — or if the
rise and fall times of the applied signals are slow
...
2b)
...
Obviously, the latter problems are the easiest to model, analyze, and optimize
...
The
goal of this chapter is to present the reader the basic techniques to estimate the values of
...
2 Wire models for the circuit of Figure 4
...
Model (a) considers most of the wire parasitics (with the
exception of interwire resistance and mutual inductance), while model (b) only considers capacitance
...
4
...
3
...
The task is complicated by the fact that the interconnect structure of contemporary integrated circuits is
three-dimensional, as was clearly demonstrated in the process cross-section of FIGURE
(CHAPTER 2)
...
Rather than getting lost
in complex equations and models, a designer typically will use an advanced extraction
tool to get precise values of the interconnect capacitances of a completed layout
...
Yet, some simple first-order models
come in handy to provide a basic understanding of the nature of interconnect capacitance
and its parameters, and of how wire capacitance will evolve with future technologies
...
3
...
Under those circumstances, the total capacitance
of the wire can be approximated as1
$$c_{int} = \frac{\varepsilon_{di}}{t_{di}} W L$$
(4
...
SiO2 is the dielectric material of
choice in integrated circuits, although some materials with lower permittivity, and hence
lower capacitance, are coming into use
...
ε is typically expressed as the product of two terms, or ε = εr ε0
...
854 × 10⁻¹²
F/m is the permittivity of free space, and εr the relative permittivity of the insulating
material
...
1 presents the relative permittivity of several dielectrics used in integrated circuits
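As a quick numerical illustration of the parallel-plate expression, the sketch below evaluates it for a wire over SiO2. The wire dimensions are hypothetical, and the relative permittivity of 3.9 for SiO2 is the commonly quoted value, stated here as an assumption.

```python
EPS0 = 8.854e-12  # permittivity of free space, F/m


def parallel_plate_cap(width_m: float, length_m: float,
                       t_di_m: float, eps_r: float = 3.9) -> float:
    """c_int = (eps_di / t_di) * W * L, with eps_di = eps_r * EPS0."""
    return eps_r * EPS0 / t_di_m * width_m * length_m


# Hypothetical wire: 1 um wide, 1 mm long, 1 um of oxide underneath.
c_int = parallel_plate_cap(width_m=1e-6, length_m=1e-3, t_di_m=1e-6)
print(f"{c_int * 1e15:.1f} fF")
```

The result scales linearly with both W and L and inversely with the dielectric thickness, which is all the model captures; fringing fields are ignored.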
...
(4
...
Table 4
...
εr
Material
Free space
1
Aerogels
~1
...
9
Glass-epoxy (PC board)
5
Silicon Nitride (Si3N4)
7
...
5
Silicon
11
...
3 Parallel-plate capacitance
model of interconnect wire
...
while scaling technology, it is desirable to keep the cross-section of the wire (W×H) as
large as possible, as will become apparent in a later section
...
As a result, we have over the
years witnessed a steady reduction in the W/H-ratio, such that it has even dropped below
unity in advanced processes
...
Under those circumstances, the parallel-plate model assumed above becomes inaccurate
...
This effect is illustrated in Figure 4
...
[Figure labels: (a) Fringing fields; cfringe; cpp; w; H]
Figure 4
...
The model decomposes the
capacitance into two contributions: a
parallel-plate capacitance, and a fringing
capacitance, modeled by a cylindrical wire
with a diameter equal to the thickness of
the wire
...
Presenting an exact model for this difficult geometry is hard
...
4b): a parallel-plate
capacitance determined by the orthogonal field between a wire of width w and the ground
plane, in parallel with the fringing capacitance modeled by a cylindrical wire with a
dimension equal to the interconnect thickness H
$$c_{wire} = c_{pp} + c_{fringe} = \frac{w \varepsilon_{di}}{t_{di}} + \frac{2\pi \varepsilon_{di}}{\log(t_{di}/H)}$$
(4
...
Numerous more accurate models (e
...
[Vdmeijs84]) have been developed over time, but
these tend to be substantially more complex, and defeat our goal of developing a conceptual understanding
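The two-term model of c_wire can be evaluated as follows. Several choices in this sketch are assumptions rather than statements from the text: the effective plate width is taken as w = W − H/2, the logarithm is taken as the natural logarithm, the dielectric is SiO2 with εr = 3.9, and the dimensions are hypothetical. Note that the expression is only meaningful for t_di > H, since the logarithm vanishes or turns negative otherwise.

```python
import math

EPS0 = 8.854e-12  # permittivity of free space, F/m


def wire_capacitance_per_length(W: float, H: float, t_di: float,
                                eps_r: float = 3.9):
    """Return (c_pp, c_fringe, c_wire) in F/m for the simplified model.

    Assumes w = W - H/2 and a natural logarithm; valid only for t_di > H.
    """
    if t_di <= H:
        raise ValueError("model requires t_di > H")
    eps_di = eps_r * EPS0
    w = W - H / 2.0
    c_pp = w * eps_di / t_di
    c_fringe = 2.0 * math.pi * eps_di / math.log(t_di / H)
    return c_pp, c_fringe, c_pp + c_fringe


# Hypothetical geometry: W = 1 um, H = 0.5 um, t_di = 1 um.
c_pp, c_fringe, c_wire = wire_capacitance_per_length(1e-6, 0.5e-6, 1e-6)
print(c_pp, c_fringe, c_wire)
```

For this geometry the fringing term is the larger of the two, which matches the qualitative point that the parallel-plate model alone underestimates the capacitance of narrow wires.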
...
5 plots the
value of the wiring capacitance as a function of (W/H)
...
For (W/H) smaller than 1
...
The fringing capacitance can
increase the overall capacitance by a factor of more than 10 for small line widths
...
In other words, the
capacitance is no longer a function of the width
...
5 Capacitance of interconnect wire as a function of (W/H), including fringing-field effects (from
[Schaper83])
...
So far, we have restricted our analysis to the case of a single rectangular conductor
placed over a ground plane
...
Today's processes offer many more layers of interconnect, which are
in addition packed quite densely
...
This is illustrated in Figure 4
...
Each wire is not only coupled
to the grounded substrate, but also to the neighboring wires on the same layer and on adjacent layers
...
The main difference is that not all of its capacitive components terminate at the
grounded substrate; a large number of them connect to other wires, which have
dynamically varying voltage levels
...
In summary, interwire capacitances become a dominant factor in multi-layer interconnect structures
...
The increasing contribution of the
interwire capacitance to the total capacitance with decreasing feature sizes is best illustrated by Figure 4
...
In this graph, which plots the capacitive components of a set of parallel wires routed above a ground plane, it is assumed that dielectric and wire thickness are
...
6 Capacitive coupling
between wires in interconnect
hierarchy
...
When W becomes smaller than 1
...
Figure 4
...
It consists of a
capacitance to ground and an inter-wire
capacitance (from [Schaper83])
...
25µm CMOS process are given in
Table 4
...
The process supports 1 layer of polysilicon and 5 layers of Aluminum
...
When placing
the wires over the thick field oxide that is used to isolate different transistors, use the “Field”
column in the table, while wires routed over the active area see a higher capacitance as seen in
the “Active” column
...
To obtain more accurate results for actual structures, complex 3-dimensional models should be used that take
the environment of the wire into account
2 Wire area and fringe capacitance values for typical 0
...
The table
rows represent the top plate of the capacitor, the columns the bottom plate
...
Field
Active
Poly
41
Al1
Al2
Al3
Al4
57
Poly
88
Al1
30
40
47
54
Al2
13
15
17
36
25
27
29
45
Al3
8
...
4
10
15
41
18
19
20
27
49
Al4
6
...
8
7
8
...
2
5
...
4
6
...
1
14
38
12
12
12
14
19
27
52
54
Table 4
...
Observe that these
numbers include both the parallel plate and fringing components
...
For instance, a ground plane placed on a neighboring
layer terminates a large fraction of the fringing field, and effectively reduces the interwire
capacitance
...
On the other hand, the thick Al5 wires display the highest interwire capacitance
...
The supply rails are an example of the latter
...
3 Interwire capacitance per unit wire length for different interconnect layers of typical 0
...
The capacitances are expressed in aF/µm, and are for minimally-spaced wires
...
Example 4
...
The length of those
wires can be substantial
...
Consider an aluminum wire 10
cm long and 1 µm wide, routed on the first Aluminum layer
...
2
...
1 × 10⁶ µm²) × 30 aF/µm² = 3 pF
Fringing capacitance: 2 × (0
...
Suppose now that a second wire is routed alongside the first one, separated by only the
minimum allowed distance
...
3, we can determine that this wire will couple to
the first with a capacitance equal to
Cinter =(0
...
5 pF
which is almost as large as the total capacitance to ground!
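The arithmetic of this example can be reproduced with a small helper. Only the area capacitance of 30 aF/µm² is legible in this copy; the fringe value of 40 aF/µm and the interwire value of 95 aF/µm used below are assumptions made for the sake of the sketch.

```python
AF_TO_PF = 1e-6  # 1 aF = 1e-6 pF


def wire_caps_pf(length_um: float, width_um: float,
                 area_af_per_um2: float, fringe_af_per_um: float,
                 inter_af_per_um: float):
    """Return (area, fringe, interwire) capacitance in pF.

    The fringe term is counted twice, once per edge of the wire.
    """
    area = length_um * width_um * area_af_per_um2
    fringe = 2.0 * length_um * fringe_af_per_um
    inter = length_um * inter_af_per_um
    return area * AF_TO_PF, fringe * AF_TO_PF, inter * AF_TO_PF


# 10 cm long, 1 um wide wire; fringe and interwire values are assumed.
c_area, c_fringe, c_inter = wire_caps_pf(
    length_um=1e5, width_um=1.0,
    area_af_per_um2=30.0, fringe_af_per_um=40.0, inter_af_per_um=95.0)
print(c_area, c_fringe, c_inter)
```

Under these assumptions the coupling to a single neighbor is comparable to the sum of the area and fringe terms, in line with the observation made in the example.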
A similar exercise shows that moving the wire to Al4 would reduce the capacitance to
ground to 3
...
65 pF area and 2
...
5 pF
...
3
...
The resistance of a rectangular conductor in the style of Figure 4
...
3)
where the constant ρ is the resistivity of the material (in Ω-m)
...
4
...
Unfortunately, it has a
large resistivity compared to materials such as Copper
...
Table 4
...
Material
ρ (Ω -m)
Silver (Ag)
1
...
Resistivity of commonly-used conductors (at 20 °C)
...
7 × 10⁻⁸
Gold (Au)
2
...
7 × 10⁻⁸
Tungsten (W)
5
...
(4
...
4)
$$R_\square = \frac{\rho}{H}$$
(4
...
This expresses that the resistance of a square conductor is independent of its absolute size, as is apparent from Eq
...
4)
...
Interconnect Resistance Design Data
Typical values of the sheet resistance of various interconnect materials are given in Table 4
...
Table 4
...
25µm CMOS process
...
05 – 0
...
Polysilicon should only be used for local interconnect
...
RC
...
A silicide is a compound material formed using silicon and a refractory metal
...
Examples of silicides are WSi2, TiSi2, PtSi2, and TaSi
...
The silicides
are most often used in a configuration called a polycide, which is a simple layered combination of polysilicon and a silicide
...
A MOSFET fabricated with a polycide gate is shown in Figure 4
...
The advantage of the silicided gate is a reduced gate resistance
...
[Figure: cross-section with layers labeled Silicide, Polysilicon, SiO2, n+, n+, and p]
Figure 4
...
A polycide-gate
Transitions between routing layers add extra resistance to a wire, called the
contact
resistance
...
It is possible to reduce the contact
resistance by making the contact holes larger
...
This effect, called current crowding, puts a
practical upper limit on the size of the contact
...
25 µm process: 5-20 Ω for metal or polysilicon to
n+, p+, and metal to polysilicon; 1-5 Ω for vias (metal-to-metal contacts)
...
2 Resistance of a Metal Wire
Consider again the aluminum wire of Example 4
...
Assuming a sheet resistance for Al1 of 0
...
075 Ω/□ × (0
...
5 kΩ
Implementing the wire in polysilicon with a sheet resistance of 175 Ω/□ raises the overall
resistance to 17
...
Silicided polysilicon with a sheet resistance of 4 Ω/□ offers a better alternative, but still translates into a wire with a 400 kΩ resistance
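The resistance figures in this example follow from multiplying the sheet resistance by the number of squares (L/W). The sketch below assumes the 10 cm by 1 µm wire of the earlier example; the polysilicon and silicide sheet resistances are the ones quoted above, while the 0.075 Ω/□ value for Al1 is reconstructed from the partially elided text and should be treated as an assumption. The helper name is hypothetical.

```python
def wire_resistance(sheet_res_ohm_per_sq, length_um, width_um):
    """Resistance of a rectangular wire: sheet resistance times L/W squares."""
    return sheet_res_ohm_per_sq * length_um / width_um


LENGTH_UM, WIDTH_UM = 1e5, 1.0   # 10 cm long, 1 um wide

sheet_resistances = {
    "Al1 (assumed 0.075 ohm/sq)": 0.075,
    "polysilicon": 175.0,
    "silicided polysilicon": 4.0,
}
for name, r_sq in sheet_resistances.items():
    r = wire_resistance(r_sq, LENGTH_UM, WIDTH_UM)
    print(f"{name}: {r / 1e3:.1f} kohm")
```

The silicided case evaluates to 400 kΩ, in agreement with the text.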
...
This is definitely the case for most semiconductor circuits
...
High-frequency currents tend to flow
primarily on the surface of a conductor with the current density falling off exponentially
with depth into the conductor
...
6)
with f the frequency of the signal and µ the permeability of the surrounding dielectric (typically equal to the permeability of free space, or µ = 4π × 10⁻⁷ H/m)
...
6 µm
...
9 for a rectangular wire
...
9 The skin-effect reduces the flow of the current to the
surface of the wire
...
7)
The increased resistance at higher frequencies may cause an extra attenuation — and
hence distortion — of the signal being transmitted over the wire
...
Below fs the whole wire is conducting current, and the resistance is equal to the (constant) low-frequency resistance of the wire
...
(4
...
8)
Example 4
...
7 × 10⁻⁸ Ω-m, embedded in a SiO2 dielectric
with a permeability of 4π × 10⁻⁷ H/m
...
(4
...
2 µm for the effect to be noticeable at 1 GHz
...
10, which plots the increase in resistance
due to skin effects for different width Aluminum conductors
...
can be observed at 1 GHz for a 20 µm wire, while the increase for a 1 µm wire is less than
1%
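A quick numerical check of the skin depth can help build intuition. The sketch below uses the standard expression δ = sqrt(ρ / (π f µ)); the aluminum resistivity of 2.7 × 10⁻⁸ Ω-m is an assumed value, and the onset criterion (skin depth equal to half the largest wire dimension) is likewise an assumption made for illustration.

```python
import math

MU0 = 4 * math.pi * 1e-7   # permeability of free space, H/m


def skin_depth(rho_ohm_m, freq_hz, mu=MU0):
    """Depth at which the current density has dropped to 1/e of its surface value."""
    return math.sqrt(rho_ohm_m / (math.pi * freq_hz * mu))


RHO_AL = 2.7e-8            # assumed resistivity of aluminum, ohm-m
delta = skin_depth(RHO_AL, 1e9)
# Assumed onset criterion: the effect matters once delta is about half of
# the largest wire dimension, i.e. max(W, H) is roughly 2 * delta.
print(f"skin depth at 1 GHz: {delta * 1e6:.2f} um")
print(f"wire dimension where the effect sets in: {2 * delta * 1e6:.2f} um")
```

Under these assumptions a wire must be several micrometers across before the skin effect is visible at 1 GHz, which is why narrow signal wires are largely unaffected.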
...
10 Skin-effect induced
increase in resistance as a function
of frequency and wire width
...
7 µm
[Sylvester97]
...
Since clocks tend to
carry the highest-frequency signals on a chip and also are fairly wide to limit resistance,
the skin effect is likely to have its first impact on these lines
...
Another major design concern is that the adoption of better
conductors such as Copper may move the onset of skin effects to lower frequencies
...
3
...
This was definitely the case in
the first decades of integrated digital circuit design
...
Consequences of on-chip inductance
include ringing and overshoot effects, reflections of signals due to impedance mismatch,
inductive coupling between lines, and switching noise due to
Ldi/dt voltage drops
...
9)
It is possible to compute the inductance of a wire directly from its geometry and its
environment
...
10)
with ε and µ respectively the permittivity and permeability of the surrounding dielectric
...
This is most often not the case
...
(4
...
Some other interesting relations, obtained from Maxwell's laws, can be pointed out
...
11)
c0 equals the speed of light (30 cm/nsec) in a vacuum
...
6
...
Table 4
...
The relative permeability µr of most dielectrics is approximately equal to 1
...
9
15
PC board (epoxy glass)
5
...
5
10
Dielectric
Example 4
...
25 micron CMOS technology and routed on top
of the field oxide
...
2, we can derive the capacitance of the wire per unit length:
c = (W×30 + 2×40) aF/µm
From Eq
...
10), we can derive the inductance per unit length of the wire, assuming
SiO2 as the dielectric and assuming a uniform dielectric (make sure to use the correct units!)
l = (3
...
854 × 10⁻¹²) × (4π × 10⁻⁷) / c
For wire widths of 0
...
W = 0
...
47 pH/µm
W = 1 µm: c = 110 aF/µm; l = 0
...
11 pH/µm
Assuming a sheet resistance of 0
...
075/W Ω/µm
It is interesting to observe that the inductive part of the wire impedance becomes equal
in value to the resistive component at a frequency of 27
...
For wires with
a smaller capacitance and resistance (such as the thicker wires located at the upper interconnect layers), this frequency can become as low as 500 MHz, especially when better interconnect materials such as Copper are being used
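The inductance numbers in this example can be reproduced from the relation lc = εµ. The sketch below is a hedged illustration: it assumes a uniform SiO2 dielectric with a relative permittivity of 3.9, uses the per-unit-length capacitance expression c = (W × 30 + 2 × 40) aF/µm given in the example, and uses hypothetical function names.

```python
import math

EPS0 = 8.854e-12           # permittivity of free space, F/m
MU0 = 4 * math.pi * 1e-7   # permeability of free space, H/m
EPS_R_SIO2 = 3.9           # assumed relative permittivity of SiO2


def cap_af_per_um(width_um):
    """Capacitance per unit length from the expression in the example."""
    return width_um * 30.0 + 2 * 40.0


def ind_ph_per_um(width_um):
    """Inductance per unit length from l = eps * mu / c (uniform dielectric)."""
    c_per_m = cap_af_per_um(width_um) * 1e-18 / 1e-6   # aF/um -> F/m
    l_per_m = EPS_R_SIO2 * EPS0 * MU0 / c_per_m        # H/m
    return l_per_m * 1e6                               # H/m -> pH/um


for w in (0.4, 1.0, 10.0):   # illustrative widths in um
    print(f"W = {w} um: c = {cap_af_per_um(w):.0f} aF/um, "
          f"l = {ind_ph_per_um(w):.2f} pH/um")
```

Note the unit conversions: 1 aF/µm equals 10⁻¹² F/m, and 1 H/m equals 10⁶ pH/µm.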
...
4
...
These
parasitic elements have an impact on the electrical behavior of the circuit and influence its
delay, power dissipation, and reliability
...
These models vary from very simple to very complex depending upon
the effects that are being studied and the required accuracy
...
4
...
1
The Ideal Wire
In schematics, wires occur as simple lines with no attached parameters or parasitics
...
A voltage change at
one end of the wire propagates immediately to its other ends, even if those are some distance away
...
While this ideal-wire model is simplistic, it has its value, especially in the early phases of
the design process when the designer wants to concentrate on the properties and the
behavior of the transistors that are being connected
...
Taking these into account would just make the analysis unnecessarily complex
...
4
...
2
The Lumped Model
The circuit parasitics of a wire are distributed along its length and are not lumped into a
single position
...
The
advantage of this approach is that the effects of the parasitic then can be described by an
ordinary differential equation
...
As long as the resistive component of the wire is small and the switching frequencies are in the low to medium range, it is meaningful to consider only the capacitive component of the wire, and to lump the distributed capacitance into a single capacitor as
shown in Figure 4
...
Observe that in this model the wire still represents an equipotential
region, and that the wire itself does not introduce any delay
...
This capacitive lumped model is simple, yet effective, and is the model of choice for the analysis of
most interconnect wires in digital integrated circuits
...
11 Distributed versus lumped capacitance model of wire
...
The driver is modeled as a voltage source and a source resistance Rdriver
...
5 Lumped capacitance model of wire
For the circuit of Figure 4
...
In Example 4
...
The operation of this simple RC network is described by the following ordinary differential
equation:
$C_{lumped} \frac{dV_{out}}{dt} + \frac{V_{out} - V_{in}}{R_{driver}} = 0$
When applying a step input (with Vin going from 0 to V), the transient response of this circuit is
known to be an exponential function, and is given by the following expression (where
τ = Rdriver × Clumped, the time constant of the network):
$V_{out}(t) = (1 - e^{-t/\tau}) V$
The time to reach the 50% point is easily computed as t = ln(2)τ = 0
...
Similarly, it takes t
= ln(9)τ = 2
...
It is worth memorizing these numbers, as they are extensively used in the rest of the text
...
t50% = 0
...
2 × 10 kΩ × 11 pF = 242 nsec
These numbers are not even acceptable for the lowest performance digital circuits
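The two timing points of the exponential response can be computed directly from the time constant. This is a small sketch using the 10 kΩ driver resistance and 11 pF wire capacitance of the example; the function name is hypothetical.

```python
import math


def lumped_rc_times(r_ohm, c_farad):
    """Return (t_50%, t_10%-90%) for a single-pole RC step response."""
    tau = r_ohm * c_farad
    t50 = math.log(2) * tau    # time to reach 50% of the final value
    t1090 = math.log(9) * tau  # time to go from 10% to 90%
    return t50, t1090


t50, t1090 = lumped_rc_times(10e3, 11e-12)
print(f"t50% = {t50 * 1e9:.0f} nsec, t(10%-90%) = {t1090 * 1e9:.0f} nsec")
```

The rounded factors 0.69 and 2.2 used in the text correspond to ln(2) and ln(9).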
...
While the lumped capacitor model is the most popular, sometimes it is also useful to
present lumped models of a wire with respect to either resistance or inductance
...
Both the resistance and
inductance of the supply wires can be interpreted as parasitic noise sources that introduce
voltage drops and bounces on the supply rails
...
4
...
The equipotential assumption, presented in the lumped-capacitor model, is no longer adequate, and a
resistive-capacitive model has to be adopted
...
This simple
model, called the lumped RC model, is pessimistic and inaccurate for long interconnect
wires, which are more adequately represented by a distributed rc-model
...
The behavior of the distributed rc-line can be adequately modeled by a simple RC network
...
Having a means to analyze such
a network effectively and to predict its first-order response would add a great asset
to the designer's tool box
...
5, we analyzed a single resistor-single capacitor network
...
Unfortunately, deriving the correct waveforms for a network with a larger number of capacitors
and resistors rapidly becomes hopelessly complex: describing its behavior requires a set of
ordinary differential equations, and the network now contains many time-constants (or
poles and zeros)
...
Consider the resistor-capacitor network of Figure 4
...
This circuit is called an RC tree and has the following properties:
• the network has a single input node (called s in Figure 4
...
Section 4
...
12
Tree-structured RC network
...
The total resistance along
this path is called the path resistance Rii
...
12 equals
R44 = R1 + R3 + R4
The definition of the path resistance can be extended to address the shared path
resistance Rik, which represents the resistance shared among the paths from the root node
s to nodes k and i:
$R_{ik} = \sum R_j \Rightarrow (R_j \in [path(s \to i) \cap path(s \to k)])$
(4
...
12, Ri4 = R1 + R3 while Ri2 = R1
...
The Elmore delay at node i is then
given by the following expression:
$\tau_{Di} = \sum_{k=1}^{N} C_k R_{ik}$
(4
...
The designer should be aware that this time-constant
represents a simple approximation of the actual delay between source node and node i
Yet
...
It
offers the designer a powerful mechanism for providing a quick estimate of the delay of a
complex network
...
6 RC delay of a tree-structured network
Using Eq
...
13), we can compute the Elmore delay for node i in the network of Figure 4
...
τDi = R1C1 + R1C2 + (R1 + R3)C3 + (R1 + R3)C4 + (R1 + R3 + Ri)Ci
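The shared-path bookkeeping behind this expression is mechanical and easy to automate. The sketch below is one possible way to do it, not the book's own procedure: the tree topology is inferred from the path resistances quoted above (R44 = R1 + R3 + R4, Ri4 = R1 + R3, Ri2 = R1), each node is labeled by the resistor that feeds it, and all names and element values are hypothetical.

```python
def elmore_delay(parent, resistance, capacitance, target, source="s"):
    """Elmore delay at `target`: sum over all nodes k of C_k * R_ik."""

    def path(node):
        # Nodes on the path from the source to `node`; each node stands for
        # the resistor that connects it to its upstream neighbour.
        nodes = set()
        while node != source:
            nodes.add(node)
            node = parent[node]
        return nodes

    target_path = path(target)
    delay = 0.0
    for k, c_k in capacitance.items():
        shared = target_path & path(k)             # resistors on both paths
        r_ik = sum(resistance[n] for n in shared)  # shared path resistance
        delay += c_k * r_ik
    return delay


# Assumed topology: s -R1- 1, 1 -R2- 2, 1 -R3- 3, 3 -R4- 4, 3 -Ri- i
parent = {"1": "s", "2": "1", "3": "1", "4": "3", "i": "3"}
# Illustrative element values only (ohms and farads).
resistance = {"1": 100.0, "2": 200.0, "3": 300.0, "4": 400.0, "i": 500.0}
capacitance = {n: 1e-12 for n in parent}

print(f"Elmore delay at node i: {elmore_delay(parent, resistance, capacitance, 'i'):.3e} s")
```

Representing every resistor by its downstream node keeps the intersection of two paths a plain set operation, which is the design choice that makes the function short.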
...
13
...
The Elmore delay of this chain network can be derived with the aid of Eq
...
13):
[Figure: RC chain with input Vin, series resistors R1 through RN, nodes 1 through N, and output VN]
Figure 4
...
$\tau_{DN} = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j = \sum_{i=1}^{N} C_i R_{ii}$
(4
...
As an example,
consider node 2 in the RC chain of Figure 4
...
Its time-constant consists of two components contributed by nodes 1 and 2
...
The equivalent time constant at node 2 equals C1R1 + C2(R1 + R2)
...
τDi = C1R1 + C2(R1 + R2) + … +Ci(R1 + R2 + … + Ri)
Example 4
...
13 can be used as an approximation of a resistive-capacitive wire
...
The resistance and capacitance of each segment are hence given by rL/N and cL/N, respectively
...
15)
with R (= rL) and C (= cL) the total lumped resistance and capacitance of the wire
...
Eq
...
15) then simplifies
to the following expression:
$\tau_{DN} = \frac{RC}{2} = \frac{rcL^2}{2}$
(4
...
(4
...
• The delay of the distributed rc-line is one half of the delay that would have been
predicted by the lumped RC model
...
(4
...
This confirms the observation made earlier
that the lumped model presents a pessimistic view on the delay of resistive wire
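The convergence from the lumped value RC to the distributed value RC/2 can be observed numerically by applying the chain formula to a wire cut into N equal segments. The sketch below is a minimal illustration with normalized R = C = 1; the function name is hypothetical.

```python
def elmore_chain_delay(total_r, total_c, n_segments):
    """Elmore delay at the end of a wire modeled as N equal RC segments."""
    r_seg = total_r / n_segments
    c_seg = total_c / n_segments
    # Capacitor i sees the resistance of the first i segments.
    return sum(c_seg * (i * r_seg) for i in range(1, n_segments + 1))


for n in (1, 2, 10, 100, 1000):
    print(f"N = {n:4d}: delay = {elmore_chain_delay(1.0, 1.0, n):.4f} x RC")
```

The sum evaluates to RC (N + 1) / (2N), so a single segment gives the lumped estimate RC and the result approaches RC/2 as N grows.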
...
The Elmore expression determines the value of only the dominant one, and presents thus a
first-order approximation
...
Besides making it
possible to analyze wires, the formula can also be used to approximate the propagation
delay of complex transistor networks
...
The evaluation of the propagation delay is then
reduced to the analysis of the resulting RC network
...
These bounds have formed the base for most computer-aided timing analyzers at the switch and functional level [Horowitz83]
...
4
...
4
The Distributed rc Line
In the previous paragraphs, we have shown that the lumped RC model is a pessimistic
model for a resistive-capacitive wire, and that a distributed rc model (Figure 4
...
As before, L represents the total length of the wire, while r and c stand
for the resistance and capacitance per unit length
...
14b
...
17)
The correct behavior of the distributed rc line is then obtained by reducing ∆L asymptotically to 0
...
(4
...
18)
where V is the voltage at a particular point in the wire, and x is the distance between this
point and the signal source
...
(4
...
These equations are difficult to use for ordinary circuit analysis
...
[Figure: (a) Distributed model built from r∆L and c∆L sections between Vin and Vout over a length L; (b) Schematic symbol for distributed RC line (r, c, L)]
Figure 4
...
which can be easily used in computer-aided analysis
...
RC
V out ( t ) = 2erfc ( ------- )
4t
= 1
...
366e
t
–2
...
19)
+ 0
...
4641 -------RC
t » RC
Figure 4
...
Observe how the step waveform “diffuses” from the start to the end of the wire, and the waveform rapidly degrades, resulting
in a considerable delay for long wires
...
Section 4
...
It hence will receive considerable attention in later chapters
...
[Plot: voltage (V) versus time (nsec) at several positions along the wire, including x = L/10 and x = L/4]
Figure 4
...
Simulated step response of resistive-capacitive wire as a function of time and place
...
7
...
69 RC
...
38
RC, with R
and C the total resistance and capacitance of the wire
...
(4
...
Table 4
...
Voltage range
Lumped RC network
Distributed RC network
0 → 50% (tp)
0
...
38 RC
0 → 63% (τ)
RC
0
...
2 RC
0
...
3 RC
1
...
8 RC delay of Aluminum Wire
Let us consider again the 10 cm long, 1 µm wide Al1 wire of Example 4
...
In Example 4
...
075 Ω /µm;
Using the entry of Table 4
...
38 RC = 0
...
075 Ω/µm) × (110 aF/µm) × (10⁵ µm)² = 31
...
The values of the capacitances are obtained from Table 4
...
0375 Ω /µm for Poly and Al5:
...
38 × (150 Ω/µm) × (88 + 2 × 54 aF/µm) × (10⁵ µm)² = 112 µsec!
Al5: tp = 0
...
0375 Ω/µm) × ( 5
...
2 nsec
Obviously, the choice of the interconnect material and layer has a dramatic impact on the delay of
the wire
...
A simple rule of thumb proves to be very useful here
...
This translates into Eq
...
20), which determines the critical length L of the interconnect
wire where RC delays become dominant
...
38rc
(4
...
• rc delays should only be considered when the rise (fall) time at the line input is
smaller than RC, the rise (fall) time of the line
...
21)
with R and C the total resistance and capacitance of the wire
...
Example 4
...
16
...
The total propagation delay of the network can be approximated by the following expression, obtained by applying the Elmore formula:
$\tau_D = R_s C_w + \frac{R_w C_w}{2} = R_s C_w + 0$
...
13 and apply the Elmore equation on the
resulting network
...
Section 4
...
16 rc-line of length L driven by source with
resistance equal to Rs
...
69 R s C w + 0
...
The delay introduced by the wire resistance becomes dominant
when (RwCw)/2 ≥ RsCw, or L ≥ 2Rs/r
...
075 Ω/µm)
...
67
cm
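The crossover condition L ≥ 2Rs/r is simple to evaluate. In the sketch below the source resistance of 1 kΩ is an assumption chosen for illustration (the value is elided in this extract), the wire resistance of 0.075 Ω/µm and capacitance of 110 aF/µm are taken from the earlier Al1 example, and the function names are hypothetical.

```python
import math


def critical_length_um(r_source_ohm, r_ohm_per_um):
    """Length beyond which the wire term RwCw/2 exceeds the driver term RsCw."""
    return 2 * r_source_ohm / r_ohm_per_um


def elmore_terms(r_source_ohm, r_ohm_per_um, c_f_per_um, length_um):
    """Return (driver term RsCw, wire term RwCw/2) of the Elmore delay."""
    r_w = r_ohm_per_um * length_um
    c_w = c_f_per_um * length_um
    return r_source_ohm * c_w, 0.5 * r_w * c_w


RS = 1e3          # assumed source resistance, ohm
R_WIRE = 0.075    # ohm/um
C_WIRE = 110e-18  # F/um

l_crit = critical_length_um(RS, R_WIRE)
driver, wire = elmore_terms(RS, R_WIRE, C_WIRE, l_crit)
print(f"critical length = {l_crit / 1e4:.2f} cm")
print(f"terms equal at that length: {math.isclose(driver, wire)}")
```

Under the assumed source resistance the two contributions balance at a length of a few centimeters; a stronger driver (smaller Rs) shortens that length.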
...
4
...
This is more precisely the case when the rise and fall times
of the signal become comparable to the time of flight of the signal waveform across the
line as determined by the speed of light
...
In this section, we first analyze the transmission line model
...
Transmission Line Model
Similar to the resistance and capacitance of an interconnect line, the inductance is distributed over the wire
...
The transmission
line has the prime property that a signal propagates over the interconnection medium as a
wave
...
(4
...
In the wave
mode, a signal propagates by alternately transferring energy from the electric to the
magnetic fields, or equivalently from the capacitive to the inductive modes
...
17 at time t
...
[Figure: lossy transmission line section with labels Vin, r, and g]
Figure 4
...
$\frac{\partial v}{\partial x} = -ri - l\frac{\partial i}{\partial t}$
(4
...
(4
...
$\frac{\partial^2 v}{\partial x^2} = rc\frac{\partial v}{\partial t} + lc\frac{\partial^2 v}{\partial t^2}$
(4
...
To understand the behavior of the transmission line, we will first assume that the
resistance of the line is small
...
This model is applicable for wires at the
printed-circuit board level
...
On the other hand,
resistance plays an important role in integrated circuits, and a more complex model, called
the lossy transmission line should be considered
...
The Lossless Transmission Line
For the lossless line, Eq
...
23) simplifies to the ideal wave equation:
$\frac{\partial^2 v}{\partial x^2} = lc\frac{\partial^2 v}{\partial t^2} = \frac{1}{\nu^2}\frac{\partial^2 v}{\partial t^2}$
(4
...
(4
...
$\nu = \frac{1}{\sqrt{lc}} = \frac{1}{\sqrt{\varepsilon\mu}} = \frac{c_0}{\sqrt{\varepsilon_r \mu_r}}$
(4
...
The propagation
delay per unit wire length (tp) of a transmission line is the inverse of the speed:
$t_p = \sqrt{lc}$
(4
...
Suppose
that a voltage step V has been applied at the input and has propagated to point x of the line
(Figure 4
...
All currents are equal to 0 at the right side of x, while the voltage over the
line equals V at the left side
...
This requires the following current:
$I = \frac{dQ}{dt} = c\frac{dx}{dt}V = c\nu V = \sqrt{\frac{c}{l}}\,V$
(4
...
18 Propagation of
voltage step along a lossless
transmission line
...
This means that the signal sees
the remainder of the line as a real impedance,
$Z_0 = \frac{V}{I} = \sqrt{\frac{l}{c}} = \frac{\sqrt{\varepsilon\mu}}{c} = \frac{1}{c\nu}$
...
28)
This impedance, called the characteristic impedance of the line, is a function of the
dielectric medium and the geometry of the conducting wire and isolator (Eq
...
28)), and
is independent of the length of the wire and the frequency
...
Typical values of the characteristic impedance of wires in semiconductor circuits
range from 10 to 200 Ω
...
10
Propagation Speeds of Signal Waveforms
The information of Table 4
...
5 nsec for a signal wave to propagate from
source-to-destination on a 20 cm wire deposited on an epoxy printed-circuit board
...
67 nsec for the
signal to reach the end of a 10 cm wire
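The time-of-flight numbers quoted here follow from the propagation speed c0 / sqrt(εr µr). The sketch below uses the 30 cm/nsec speed of light given earlier in the chapter and assumes relative permittivities of 3.9 for SiO2 and 5.0 for an epoxy-glass board, with µr = 1; the function names are hypothetical.

```python
import math

C0_CM_PER_NS = 30.0   # speed of light in vacuum, as quoted in the text


def propagation_speed(eps_r, mu_r=1.0):
    """Wave speed in cm/nsec for a medium with the given relative constants."""
    return C0_CM_PER_NS / math.sqrt(eps_r * mu_r)


def time_of_flight_ns(length_cm, eps_r, mu_r=1.0):
    return length_cm / propagation_speed(eps_r, mu_r)


# Assumed dielectric constants: epoxy-glass board 5.0, SiO2 3.9.
print(f"20 cm on epoxy board: {time_of_flight_ns(20, 5.0):.2f} nsec")
print(f"10 cm over SiO2:      {time_of_flight_ns(10, 3.9):.2f} nsec")
```

The small difference from the figures in the text comes from rounding the propagation speed to a whole number of cm/nsec before dividing.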
...
The electro-magnetic fields in complex interconnect structures tend to be
irregular, and are strongly influenced by issues such as the current return path
...
analytical solutions are typically available
...
For some simplified structures,
approximate expressions have been derived
...
(4
...
(4
...
$Z_0(\text{triplate}) \approx 94\,\Omega \sqrt{\frac{\mu_r}{\varepsilon_r}} \ln\left(\frac{2t + W}{H + W}\right)$
(4
...
475ε r + 0
...
536W + 0
...
30)
and
Termination
The behavior of the transmission line is strongly influenced by the termination of the line
...
This is expressed by the reflection coefficient ρ that determines the relationship
between the voltages and currents of the incident and reflected waveforms
...
31)
where R is the value of the termination resistance
...
$V = V_{inc}(1 + \rho)$
$I = I_{inc}(1 - \rho)$
(4
...
19
...
The termination appears as an infinite extension of the line, and no waveform is reflected
...
In case (b), the line termination is
an open circuit (R = ∞), and ρ = 1
...
(4
...
Finally, in case (c) where the line termination
is a short circuit, R = 0, and ρ = –1
...
The transient behavior of a complete transmission line can now be examined
...
20
...
An incoming wave is completely reflected without phase reversal
...
21: RS = 5 Z0, RS = Z0, and RS = 1/5 Z0
...
Section 4
...
19
Behavior of various transmission line terminations
...
20 Transmission line
with terminating impedances
...
Large source resistance—RS = 5 Z0 (Figure 4
...
The amount injected is determined by the resistive divider formed by the source
resistance and the characteristic impedance Z0
...
83 V
(4
...
67 V)
...
Approximately the same happens when
the wave reaches the source node again
...
$\rho_S = \frac{5Z_0 - Z_0}{5Z_0 + Z_0} = \frac{2}{3}$
(4
...
The overall rise time is, however, many times L/ν
...
[Figure panels: Vsource and Vdest versus time for (a) RS = 5Z0, RL = ∞; (b) RS = Z0, RL = ∞; (c) RS = Z0/5, RL = ∞]
Figure 4
...
15
When multiple reflections are present, as in the above case, keeping track of waves
on the line and total voltage levels rapidly becomes cumbersome
...
22)
...
The line voltage at a termination point equals the sum of the previous voltage, the incident, and reflected waves
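The bookkeeping of a lattice diagram can be automated by bouncing a single wave between the two ends of the line. The sketch below is an illustration under stated assumptions: a 5 V input step (consistent with the voltage fragments shown for the large-source-resistance case), RS = 5 Z0, an open-circuit load, and hypothetical function names.

```python
import math


def reflection_coefficient(r_ohm, z0_ohm):
    if math.isinf(r_ohm):
        return 1.0
    return (r_ohm - z0_ohm) / (r_ohm + z0_ohm)


def lattice_voltages(v_in, z0, r_source, r_load, round_trips):
    """Return (source voltages, destination voltages) after each wave arrival."""
    rho_s = reflection_coefficient(r_source, z0)
    rho_l = reflection_coefficient(r_load, z0)
    wave = v_in * z0 / (z0 + r_source)   # wave injected by the resistive divider
    v_source, v_dest = wave, 0.0
    source_trace, dest_trace = [v_source], []
    for _ in range(round_trips):
        reflected = rho_l * wave          # wave arrives at the destination
        v_dest += wave + reflected        # previous + incident + reflected
        dest_trace.append(v_dest)
        wave = reflected
        reflected = rho_s * wave          # wave arrives back at the source
        v_source += wave + reflected
        source_trace.append(v_source)
        wave = reflected
    return source_trace, dest_trace


src, dst = lattice_voltages(v_in=5.0, z0=1.0, r_source=5.0,
                            r_load=math.inf, round_trips=3)
print("Vsource:", [round(v, 4) for v in src])
print("Vdest:  ", [round(v, 4) for v in dst])
```

Each entry in a trace is one time-of-flight step apart from the next arrival at the opposite end, so the two lists interleave in time exactly as the two columns of a lattice diagram do.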
...
Small source resistance—RS = Z0/5 (Figure 4
...
Its value is doubled at the destination end, which causes a severe overshoot
...
The signal bounces back and forth and exhibits severe ringing
...
3
...
21b)
Half of the input signal is injected at the source
...
It is obvious that this is
[Lattice diagram: voltage values at Vsource and Vdest versus time t]
...
22 Lattice diagram for RS = 5 Z0 and RL = ∞
...
21a)
...
Matching the line impedance at the source end is called
series termination
...
In real conditions the signals are substantially smoother, as demonstrated
in the simulated response of Figure 4
...
[Plot: Vin, Vsource, and Vdest in V (V) versus t (psec)]
Figure 4
...
Problem 4
...
Also try the reverse picture— assume that the series resistance of the source equals zero,
and consider different load impedances
...
Matching the load impedance to the characteristic impedance
of the line once again results in the fastest response
...
chapter4
...
Example 4
...
One might wonder how this
influences the transmission line behavior and when the load capacitance should be taken into
account
...
From the load's point of view, the line behaves as a
resistance with value Z0
...
This is illustrated in Figure 4
...
The response shows how the output rises to its final value
with a time-constant of 100 psec (= 50 Ω × 2 pF) after a delay equal to the time-of-flight of
the line
...
After 2tflight, an unexpected
voltage dip occurs at the source node that can be explained as follows
...
This reflected wave also approaches its final
value asymptotically
...
5 V rather than the expected 2
...
This forces the transmission line temporarily to 0 V, as shown in the simulation
...
5
...
0
V
3
...
0
1
...
0
0
0
...
2
0
...
4
0
...
24 Capacitively terminated
transmission line: RS = 50 Ω, RL = ∞, CL
= 2 pF, Z0 = 50 Ω, tflight = 50 psec
...
69Z0 CL = 69 psec)
...
In general, we can say that the capacitive load should only be considered in the analysis when its value is comparable to or larger than the total capacitance of
the transmission line [Bakoglu90]
...
Section 4
...
The lossy transmission-line model should be applied
instead
...
We
therefore only discuss the effects of resistive loss on the transmission line behavior in a
qualitative fashion
...
This is demonstrated in Figure 4
...
The step input still propagates as a wave through the line
...
35)
The arrival of the wave is followed by a diffusive relaxation to the steady-state value
at point x
...
In fact, the resistive effect becomes dominant, and the line
behaves as a distributed RC line when R ( = rL, the total resistance of the line) >> 2Z0
...
At that point, the
line is more appropriately modeled as a distributed rc line
...
25
Dest
x
t
t
Step response of lossy transmission line
...
For instance, branches on wires, often
called transmission line taps, cause extra reflections and can affect both signal shape and
delay
...
For a more extensive discussion of
these effects, we would like to refer the reader to [Bakoglu90] and [Dally98]
...
Design Rules of Thumb
Once again, we have to ask ourselves the question when it is appropriate to consider
transmission line effects
...
(
This leads to the following rule of thumb, which determines when transmission line
effects should be considered:
L
t r ( t f ) < 2
...
5 -ν
(4
...
At the board level, where wires can reach a length of
up to 50 cm, we should account for the delay of the transmission line when tr < 8 nsec
...
Ignoring the inductive component of the propagation delay can easily result in overly optimistic
delay predictions
...
37)
If this is not the case, the distributed RC model is more appropriate
...
5 lc
(4
...
12
(4
...
Using the data from Example 4
...
(4
...
1 µm: c = 92 aF/µm; Z0 = 74 Ω
W = 1
...
Section 4
...
(4
...
075Ω ⁄ µm
From Eq
...
36), we find a corresponding maximum rise (or fall) time of the input signal
equal to
trmax = 2
...
For these wires, a lumped capacitance
model is more appropriate
...
For a
10 µm wide wire, we find a maximum length of 11
...
Assume now a Copper wire, implemented on level 5, with a characteristic impedance
of 200 Ω and a resistance of 0
...
The resulting maximum wire length equals 40 mm
...
Be aware however that the values for Z0, derived in this example, are only approximations
...
Example 4
...
5 SPICE Wire Models
In previous sections, we have discussed the
4
...
1
Distributed rc Lines in SPICE
Because of the importance of the distributed rc-line in today's design, most circuit simulators have built-in distributed rc-models of high accuracy
...
This model
approximates the rc-line as a network of lumped RC segments with internally generated
nodes
...
Example 4
...
N1 and
N2 represent the terminal nodes of the line, while N3 is the node the capacitances are connected to
...
U1 N1=1 N2=2 N3=0 URCMOD L=50m N=6
...
If your simulator does not support a distributed rc-model, or if the computational
complexity of these models slows down your simulation too much, you can construct a
simple yet accurate model yourself by approximating the distributed rc by a lumped RC
network with a limited number of elements
...
26 shows some of these approximations ordered along increasing precision and complexity
...
For instance, the error of the π3 model is less than
3%, which is generally sufficient
...
5
...
26
[Figure: lumped RC approximations of the distributed line, including the π2 and T3 models, with element values given as fractions of R and C]
Simulation models for distributed RC line
...
The line characteristics are defined by
the characteristic impedance Z0, while the length of the line can be defined in either of two
forms
...
Alternatively, a frequency F may be given together with NL, the
dimensionless, normalized electrical length of the transmission line, which is measured
with respect to the wavelength in the line at the frequency F
...
NL = F ⋅ TD
(4
...
Section 4
...
When necessary, loss can be
added by breaking up a long transmission line into shorter sections and adding a small
series resistance in each section to model the transmission line loss
...
First of all, the accuracy is still limited
...
For small transmission lines, this time step might be much
smaller than what is needed for transistor analysis
...
6 Perspective: A Look into the Future
Similar to the approach we followed for the MOS transistor, it is worthwhile to explore
how the wire parameters will evolve with further scaling of the technology
...
A straightforward approach is to scale all dimensions of the wire by the same factor
S as the transistors (ideal scaling)
...
It can be surmised that the length of local interconnections —
wires that connect closely grouped transistors — scales in the same way as these transistors
...
Examples of such wires are clock signals, and data and instruction buses
...
27 contains a
histogram showing the distribution of the wire lengths in an actual microprocessor design
(containing approximately 90,000 gates) [Davis98]
...
Figure 4
...
The average length of these long wires is proportional to the die size (or complexity)
of the circuit
...
In fact, the size of the
typical die (which is the square root of the die area) is increasing by 6% per year, doubling
about every decade
...
They are projected to reach 4 cm on the side by 2010!
...
In our subsequent analysis, we will therefore
consider three models: local wires (SL = S > 1), constant length wires (SL = 1), and global
wires (SL = SC < 1)
...
This leads to the scaling behavior illustrated in Table
4
...
Be aware that this is only a first-order analysis, intended to look at overall trends
...
Table 4
...
A constant delay is predicted for local wires, while the delay of the global wires goes up by 50% per year (for
S = 1
...
94)
...
This explains why wire delays are starting to play a predominant role in
today's digital integrated circuit design
...
This explains why other interconnect scaling techniques are attractive
...
The “constant resistance” model of
Table 4
...
While this approach
seemingly has a positive impact on the performance, it causes the fringing and interwire
capacitance components to come to the foreground
...
Table 4
...
Section 4
...
9
141
“Constant Resistance” Scaling of Wire Properties
Parameter   Relation   Local Wire   Constant Length   Global Wire
R           L/WH       1            S                 S/SC
CR          L²/Ht      εc/S         εcS               εcS/SC²
This scaling scenario offers a slightly more optimistic perspective, assuming of
course that εc < S
...
To keep these delays from becoming excessive, interconnect technology has to be drastically improved
...
The other option is to differentiate between local and global wires
...
To address these
conflicting demands, modern interconnect topologies combine a dense and thin wiring
grid at the lower metal layers with fat, widely spaced wires at the higher levels, as is illustrated in Figure 4
...
Even with these advances, it is obvious that interconnect will play a
dominant role in both high-performance and low-energy circuits for years to come
...
28 Interconnect hierarchy of 0
...
substrate
4
...
The main goal is to identify
the dominant parameters that set the values of the wire parasitics (being capacitance, resistance, and inductance), and to present adequate wire models that will aid us in the further
analysis and optimization of complex digital circuits
...
4
...
A number of textbooks and reprint volumes have been published
...
REFERENCES
[Antognetti88] P
...
Massobrio (eds
...
[Banzhaf92] W
...
, Prentice Hall, 1992
...
Chen, CMOS Devices and Technology for VLSI , Prentice Hall, 1990
...
Getreu, “Modeling the Bipolar Transistor,” Tektronix Inc
...
[Gray69] P
...
Searle, Electronic Principles, John Wiley and Sons, 1969
...
Gray and R
...
, John
Wiley and Sons, 1993
...
Haznedar, Digital Microelectronics, Benjamin/Cummings, 1991
...
Hodges and H
...
,
McGraw-Hill, 1988
...
Howe and S
...
[Hu92] C
...
27, no
...
241–246, March 1992
...
Hu, “Future CMOS Scaling and Reliability,” IEEE Proceedings, vol
...
5, May
1993
...
Jensen et al
...
1–61, August 1991
...
Ko, “Approaches to Scaling,” in VLSI Electronics: Microstructure Science, vol
...
1–37, Academic Press, 1989
...
Muller and T
...
, John
Wiley and Sons, 1986
...
Nagel, “SPICE2: a Computer Program to Simulate Semiconductor Circuits,” Memo
ERL-M520, Dept
...
and Computer Science, University of California at Berkeley, 1975
...
Sedra and K
...
, Holt, Rinehart and Winston,
1987
...
Sheu, D
...
Ko, and M
...
SC-22, no
...
558–565, August 1987
...
Sze, Physics of Semiconductor Devices, 2nd ed
...
[Thorpe92] T
...
[Toh88] K
...
Koh, and R
...
23
...
4, pp 950–957, August 1988
...
Section 4
...
Tsividis, Operation and Modeling of the MOS Transistor, McGraw-Hill, 1987
...
Yamaguchi et al
...
Electron
...
35, no 8, pp
...
[Weste93] N
...
Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective,
Addison-Wesley, 1993
...
9 Exercises and Design Problems
chapter5
...
1
Introduction
5
...
2
5
...
4
...
3
Evaluating the Robustness of the CMOS
Inverter: The Static Behavior
5
...
5
...
3
...
5
...
5
...
3
...
4
Switching Threshold
5
...
2
Robustness Revisited
5
...
4
Analyzing Power Consumption Using
SPICE
Performance of CMOS Inverter: The Dynamic
Behavior
5
...
1
Computing the Capacitances
5
...
...
1 Introduction
The inverter is truly the nucleus of all digital designs
...
The electrical behavior of these complex circuits can be almost completely derived by extrapolating the results obtained for
inverters
...
In this chapter, we focus on a single incarnation of the inverter gate: the
static CMOS inverter, or the CMOS inverter for short
...
We analyze the gate with respect
to the different design metrics that were outlined in Chapter 1:
• cost, expressed by the complexity and area
• integrity and robustness, expressed by the static (or steady-state) behavior
• performance, determined by the dynamic (or transient) response
• energy efficiency, set by the energy and power consumption
From this analysis arises a model of the gate that will help us to identify the parameters of the gate and to choose their values so that the resulting design meets desired specifications
...
While this chapter focuses exclusively on the CMOS inverter, we will see in the following chapter that the same methodology also applies to other gate topologies
...
2 The Static CMOS Inverter — An Intuitive Perspective
Figure 5
...
Its operation is readily
understood with the aid of the simple switch model of the MOS transistor, introduced in
Chapter 3 (Figure 3
...
This leads to the
Figure 5
...
VDD stands for the
supply voltage
...
following interpretation of the inverter
...
This yields the equivalent circuit of Figure 5
...
A
direct path exists between Vout and the ground node, resulting in a steady-state value of 0
V
...
The equivalent circuit of Figure 5
...
The gate clearly functions as an
inverter
...
2
inverter
...
This results in high noise margins
...
Gates with this property are called
ratioless
...
• In steady state, there always exists a path with finite resistance between the output
and either VDD or GND
...
Typical values of the output resistance are in the kΩ range
...
Since the
input node of the inverter only connects to transistor gates, the steady-state input
current is nearly zero
...
So, although fan-out does not have any effect on the steady-state behavior, it
degrades the transient response
...
...
The absence of
current flow (ignoring leakage currents) means that the gate does not consume any
static power
...
The
situation was very different in the 1970s and early 1980s
...
The lack of complementary devices (such as the NMOS and PMOS transistor) in such a technology makes
the realization of inverters with zero static power non-trivial
...
The nature and the form of the voltage-transfer characteristic (VTC) can be graphically deduced by superimposing the current characteristics of the NMOS and the PMOS
devices
...
It requires that the I-V curves of the NMOS and PMOS devices are transformed onto a common coordinate set
...
The PMOS I-V relations can be translated into this variable space by the following relations (the subscripts n and p denote the NMOS and PMOS devices, respectively):
$$I_{DSp} = -I_{DSn}$$
$$V_{GSn} = V_{in}; \quad V_{GSp} = V_{in} - V_{DD}$$
(5
...
This procedure is outlined in Figure 5
...
...
3 Transforming PMOS I-V characteristic to a common coordinate set
(assuming VDD = 2
...
Figure 5
...
5 V)
...
The resulting load lines are plotted in Figure 5
...
For a dc operating point to be
valid, the currents through the NMOS and PMOS devices must be equal
...
A
number of those points (for Vin = 0, 0
...
5, 2, and 2
...
As
can be observed, all operating points are located either at the high or low output levels
...
This results from
the high gain during the switching transient, when both NMOS and PMOS are simultaneously on, and in saturation
...
All these observations translate into the VTC of Figure
5
...
Figure 5
...
4 (VDD = 2
...
For each operation region, the modes of the transistors are annotated: off, res(istive), or sat(urated)
...
Figure 5
...
Switch model of the inverter: (a) low-to-high transition (Vin = 0, CL charged through Rp); (b) high-to-low transition (Vin = VDD, CL discharged through Rn)
This response is dominated mainly by the output capacitance of the gate, CL, which is composed of the drain diffusion capacitances of the NMOS and PMOS transistors, the capacitance of the connecting wires, and the input capacitance of the fan-out gates
...
6)
...
6a)
...
In Example
4
...
Hence, a fast gate is built either by keeping the output capacitance
small or by decreasing the on-resistance of the transistor
...
Similar considerations are valid for the high-to-low
transition (Figure 5
...
The reader should be aware that the on-resistance of the NMOS and PMOS transistors is not constant, but is a
nonlinear function of the voltage across the transistor
...
An in-depth discussion of how to analyze and optimize the
performance of the static CMOS inverter is offered in Section 5
...
5
...
It remains to determine the precise values of VM, VIH, and VIL, as well
as the noise margins
...
3
...
Its value can be
obtained graphically from the intersection of the VTC with the line given by Vin = Vout
(see Figure 5
...
In this region, both PMOS and NMOS are always saturated, since VDS = VGS
...
sistors
...
We furthermore ignore the channel-length modulation effects
...
2)
Solving for VM yields
$$V_M = \frac{\left(V_{Tn} + \frac{V_{DSATn}}{2}\right) + r\left(V_{DD} + V_{Tp} + \frac{V_{DSATp}}{2}\right)}{1 + r} \quad \text{with} \quad r = \frac{k_p V_{DSATp}}{k_n V_{DSATn}} = \frac{\upsilon_{satp} W_p}{\upsilon_{satn} W_n}$$
(5
...
For large values
of VDD (compared to threshold and saturation voltages), Eq
...
3) can be simplified:
$$V_M \approx \frac{r V_{DD}}{1 + r}$$
(5
...
(5
...
It is generally considered to be
desirable for VM to be located around the middle of the available voltage swing (or at
VDD/2), since this results in comparable values for the low and high noise margins
...
To move VM upwards, a larger value of r is
required, which means making the PMOS wider
...
From Eq
...
2), we can derive the required ratio of PMOS versus NMOS transistor
sizes such that the switching threshold is set to a desired value VM
...
$$\frac{(W/L)_p}{(W/L)_n} = \frac{k'_n V_{DSATn}\left(V_M - V_{Tn} - V_{DSATn}/2\right)}{k'_p V_{DSATp}\left(V_{DD} - V_M + V_{Tp} + V_{DSATp}/2\right)}$$
(5
...
1 Inverter switching threshold for long-channel devices, or low supply-voltages
...
When the PMOS and NMOS are long-channel devices, or when the supply voltage is low, velocity saturation does not occur (VM-VT < VDSAT)
...
(5
...
Derive
...
6)
Design Technique — Maximizing the noise margins
When designing static CMOS circuits, it is advisable to balance the driving strengths of the
transistors by making the PMOS section wider than the NMOS section, if one wants to maximize the noise margins and obtain symmetrical characteristics
...
(5
...
Example 5
...
25µm CMOS process, is located in the middle
between the supply rails
...
7, and
assume a supply voltage of 2
...
The minimum size device has a width/length ratio of 1
...
With the aid of Eq
...
5), we find
–6
( W ⁄ L)p
115 × 10 ×
-------------------= ------------------------- 0
...
25 – 0
...
63 ⁄ 2 ) = 3
...
0
( 1
...
4 – 1
...
7 plots the values of switching threshold as a function of the PMOS/NMOS
ratio, as obtained by circuit simulation
...
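The sizing relation and the switching-threshold expression for velocity-saturated devices can be checked numerically. The following is a minimal sketch, not the author's procedure: the device parameters are illustrative assumptions for a generic process at VDD = 2.5 V (PMOS values carry their sign), and the function names are hypothetical.

```python
# Illustrative device parameters (assumed for this sketch, PMOS values signed).
VDD = 2.5
KN, VTN, VDSATN = 115e-6, 0.43, 0.63    # NMOS: k'n (A/V^2), VTn (V), VDSATn (V)
KP, VTP, VDSATP = -30e-6, -0.40, -1.0   # PMOS: k'p (A/V^2), VTp (V), VDSATp (V)


def size_ratio_for_vm(vm):
    """Return (W/L)p / (W/L)n that places the switching threshold at vm."""
    num = KN * VDSATN * (vm - VTN - VDSATN / 2)
    den = KP * VDSATP * (VDD - vm + VTP + VDSATP / 2)
    return num / den


def switching_threshold(ratio):
    """Return VM for a given (W/L)p / (W/L)n ratio."""
    r = (KP * VDSATP * ratio) / (KN * VDSATN)   # relative PMOS/NMOS drive strength
    return ((VTN + VDSATN / 2) + r * (VDD + VTP + VDSATP / 2)) / (1 + r)


if __name__ == "__main__":
    print(f"ratio for VM = VDD/2: {size_ratio_for_vm(VDD / 2):.2f}")
    for ratio in (3.0, 2.5, 2.0):
        print(f"ratio {ratio}: VM = {switching_threshold(ratio):.2f} V")
```

Sweeping the ratio in this way shows how weakly VM depends on it: under these assumed parameters, halving the PMOS width moves the threshold by only about a tenth of a volt.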
4 for a 1
...
(5
...
An analysis of the curve of Figure 5
...
VM is relatively insensitive to variations in the device ratio
...
g
...
5) do not disturb the transfer characteristic that much
...
For the above example, setting the ratio to 3, 2
...
22 V, 1
...
13 V, respectively
...
...
7 Simulated inverter switching
threshold versus PMOS/NMOS ratio (0
...
5 V)
...
8 Changing the inverter threshold can improve the circuit reliability
...
The effect of changing the Wp/Wn ratio is to shift the transient region of the VTC
...
This property can be very useful, as asymmetrical transfer characteristics are actually desirable in some designs
...
8
...
Passing this signal
through a symmetrical inverter would lead to erroneous values (Figure 5
...
This
can be addressed by raising the threshold of the inverter, which results in a correct
response (Figure 5
...
Further in the text, we will see other circuit instances where
inverters with asymmetrical switching thresholds are desirable
...
5/0
...
To move the threshold to 1
...
Observe that
Figure 5
...
5
...
2
Noise Margins
By definition, VIH and VIL are the operational points of the inverter where $\frac{dV_{out}}{dV_{in}} = -1$
...
lytical expressions for VIH and VIL, these tend to be unwieldy and provide little insight into
which parameters are instrumental in setting the noise margins
...
9
...
The crossover with the VOH and the
VOL lines is used to define VIH and VIL points
...
...
9 A piece-wise linear
approximation of the VTC simplifies the
derivation of VIL and VIH
...
This approach yields the following expressions for the width of the transition region VIH - VIL, VIH, VIL, and the noise margins NMH and NML
...
7)
NM L = V IL
These expressions make it increasingly clear that a high gain in the transition region is
very desirable
...
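The piece-wise linear construction can be turned into a few lines of arithmetic. This is a minimal sketch under stated assumptions: the transition region is modeled as a straight line of slope g through the point (VM, VM), with VOH = VDD and VOL = 0; the function name and the numeric values in the usage lines are illustrative, not taken from the text.

```python
def noise_margins(vdd, vm, g):
    """Noise margins from a piece-wise linear VTC.

    The transition segment is Vout = vm + g * (Vin - vm); VIL and VIH are
    the inputs where that line reaches VOH = vdd and VOL = 0.
    """
    if g >= 0:
        raise ValueError("an inverter has a negative gain in the transition region")
    vil = vm + (vdd - vm) / g     # crossover with the VOH line
    vih = vm - vm / g             # crossover with the VOL line
    return {
        "VIL": vil,
        "VIH": vih,
        "width": vih - vil,       # equals -vdd / g
        "NML": vil,               # VIL - VOL with VOL = 0
        "NMH": vdd - vih,         # VOH - VIH with VOH = vdd
    }


if __name__ == "__main__":
    # Assumed example values: VDD = 2.5 V, VM = 1.25 V, midpoint gain of -27.5.
    for name, value in noise_margins(2.5, 1.25, -27.5).items():
        print(f"{name} = {value:.3f} V")
```

Because the width of the transition region is -VDD/g in this model, doubling the magnitude of the gain halves the region where the output is undefined.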
It remains to determine the midpoint gain of the static CMOS inverter
...
It is apparent from Figure
5
...
The channel-length modulation factor hence cannot be ignored in this analysis; doing so
would lead to an infinite gain
...
8), valid around the switching threshold, with respect to Vin
...
8)
$$k_p V_{DSATp}\left(V_{in} - V_{DD} - V_{Tp} - \frac{V_{DSATp}}{2}\right)\left(1 + \lambda_p V_{out} - \lambda_p V_{DD}\right) = 0$$
Differentiating and solving for dVout/dVin yields
$$\frac{dV_{out}}{dV_{in}} = -\frac{k_n V_{DSATn}\left(1 + \lambda_n V_{out}\right) + k_p V_{DSATp}\left(1 + \lambda_p V_{out} - \lambda_p V_{DD}\right)}{\lambda_n k_n V_{DSATn}\left(V_{in} - V_{Tn} - V_{DSATn}/2\right) + \lambda_p k_p V_{DSATp}\left(V_{in} - V_{DD} - V_{Tp} - V_{DSATp}/2\right)}$$
(5
...
$$g = -\frac{1}{I_D(V_M)}\,\frac{k_n V_{DSATn} + k_p V_{DSATp}}{\lambda_n - \lambda_p}$$
(5
...
The gain is almost
purely determined by technology parameters, especially the channel length modulation
...
Example 5
...
25µm CMOS technology designed with a PMOS/NMOS
ratio of 3
...
375 µm, L = 0
...
5)
...
25 V),
I D(V M) = 1
...
63 × ( 1
...
43 – 0
...
06 × 1
...
5
g = – ---------------------- × 115 × 10 × 0
...
5 × 3
...
0 = –27
...
5
...
06 + 0
...
2 V, VIH = 1
...
2
...
10 plots the simulated VTC of the inverter, as well as its derivative, the gain
...
The actual values of VIL and VIH are 1
...
45 V,
respectively, which leads to noise margins of 1
...
05 V
...
(5
...
As observed in Figure 5
...
This reduced gain would yield values for VIL and VIH of 1
...
33 V, respectively
...
The obtained expressions are however perfectly useful as first-order estimations as
well as means of identifying the relevant parameters and their impact
...
Low values of 2
...
3 kΩ were observed, respectively
...
SIDELINE: Surprisingly (or not so surprisingly), the static CMOS inverter can also be
used as an analog amplifier, as it has a fairly high gain in its transition region
...
10b
...
Yet, this observation
can be used to demonstrate one of the major differences between analog and digital
design
...
Figure 5
...
5 V)
...
25 µm CMOS, VDD
device in the regions of extreme nonlinearity, resulting in well-defined and well-separated
high and low signals
...
2 Inverter noise margins for long-channel devices
Derive expressions for the gain and noise margins assuming that PMOS and NMOS are
long-channel devices (or that the supply voltage is low), so that velocity saturation does
not occur
...
3
...
Fortunately, the dc-characteristics
of the static CMOS inverter turn out to be rather insensitive to these variations, and the
gate remains functional over a wide range of operating conditions
...
7, which shows that variations in the device sizes have only a minor
impact on the switching threshold of the inverter
...
Two corner-cases are plotted in
Figure 5
...
Comparing the resulting curves with the nominal response shows that
the variations mostly cause a shift in the switching threshold, but that the operation of the
Curve labels: "Good PMOS, Bad NMOS" and "Good NMOS, Bad PMOS"
...
11 Impact of device variations on static CMOS
inverter VTC
...
The opposite
is true for the “bad” transistor
...
This robust behavior that ensures functionality of the gate
over a wide range of conditions has contributed in a big way to the popularity of the static
CMOS gate
...
At the same time, device threshold voltages are virtually kept constant
...
Do inverters keep on working
when the voltages are scaled and are there potential limits to the supply scaling?
A first hint on what might happen was offered in Eq
...
10), which indicates that the
gain of the inverter in the transition region actually increases with a reduction of the supply voltage! Note that for a fixed transistor ratio r, VM is approximately proportional to VDD
...
12a)
...
5 V —
which is just 100 mV above the threshold of the transistors — the width of the transition
region measures only 10% of the supply voltage (for a maximum gain of 35), while it widens to 17% for 2
...
So, given this improvement in dc characteristics, why do we not
choose to operate all our digital circuits at these low supply voltages? Three important
arguments come to mind:
• In the following sections, we will learn that reducing the supply voltage indiscriminately has a positive impact on the energy dissipation, but is absolutely detrimental
to the performance of the gate
...
• Scaling the supply voltage means reducing the signal swing
...
(a) Reducing VDD improves the gain; (b) but it deteriorates for very low supply voltages
...
12 VTC of CMOS inverter as a function of supply voltage (0
...
To provide an insight into the question on potential limits to the voltage scaling, we
have plotted in Figure 5
...
Amazingly enough, we still obtain an inverter characteristic,
even though the supply voltage is not large enough to turn the transistors on! The explanation can be found in the sub-threshold operation of the transistors
...
The very low value of the switching currents ensures a
very slow operation but this might be acceptable for some applications (such as watches,
for example)
...
VOL and VOH are no longer at the supply rails and the transition-region gain approaches
1
...
To achieve sufficient gain for
use in a digital circuit, the supply must be at least a couple of times φT = kT/q (= 25 mV at room temperature), the thermal voltage introduced in Chapter 3
[Swanson72]
...
$$V_{DDmin} > 2 \ldots 4\,\frac{kT}{q}$$
(5
...
(5
...
It suggests that the only way to
get CMOS inverters to operate below 100 mV is to reduce the ambient temperature, or in
other words to cool the circuit
...
3 Minimum supply voltage of CMOS inverter
Once the supply voltage drops below the threshold voltage, the transistors operate in the subthreshold region, and display an exponential current-voltage relationship (as expressed in
Eq
...
40))
...
(assume symmetrical NMOS and PMOS transistors, and a maximum gain at VM = VDD/2)
...
$$g = -\frac{1}{n}\left(e^{V_{DD}/2\phi_T} - 1\right)$$
(5
...
5 and φT = 25 mV)
...
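The subthreshold gain expression is easy to evaluate numerically. The sketch below is only an illustration of that formula: the default slope factor n = 1.5 and thermal voltage of 25 mV are assumed values, and the function names are hypothetical.

```python
import math


def subthreshold_gain(vdd, n=1.5, phi_t=0.025):
    """Midpoint gain g = -(1/n) * (exp(VDD / (2 * phi_T)) - 1)."""
    return -(math.exp(vdd / (2 * phi_t)) - 1) / n


def min_vdd_for_gain(target, n=1.5, phi_t=0.025):
    """Supply voltage at which |g| equals target, by inverting the expression."""
    return 2 * phi_t * math.log(1 + n * target)


if __name__ == "__main__":
    for vdd in (0.05, 0.1, 0.15, 0.2):
        print(f"VDD = {vdd * 1e3:.0f} mV: g = {subthreshold_gain(vdd):.1f}")
    print(f"|g| = 1 at VDD = {min_vdd_for_gain(1.0) * 1e3:.1f} mV")
```

Under these assumptions the gain magnitude reaches 1 at a supply of roughly two thermal voltages, consistent with the bound of a few kT/q stated above.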
4 Performance of CMOS Inverter: The Dynamic Behavior
The qualitative analysis presented earlier concluded that the propagation delay of the
CMOS inverter is determined by the time it takes to charge and discharge the load capacitor CL through the PMOS and NMOS transistors, respectively
...
It is hence worthwhile to first study the major components of the load
capacitance before embarking on an in-depth analysis of the propagation delay of the
gate
...
5
...
1
Computing the Capacitances
Manual analysis of MOS circuits where each capacitor is considered individually is virtually impossible and is exacerbated by the many nonlinear capacitances in the MOS transistor model
...
Be aware that this is
a considerable simplification of the actual situation, even in the case of a simple inverter
...
13 Parasitic capacitances influencing the transient behavior of the cascaded inverter pair
...
...
13 shows the schematic of a cascaded inverter pair
...
It is initially assumed that the
input Vin is driven by an ideal voltage source with zero rise and fall times
...
Gate-Drain Capacitance Cgd12
M1 and M2 are either in cut-off or in the saturation mode during the first half (up to 50%
point) of the output transient
...
The channel capacitance of the MOS
transistors does not play a role here, as it is located either completely between gate and
bulk (cut-off) or gate and source (saturation) (see Chapter 3)
...
This is accomplished by taking the so-called Miller
effect into account
...
14)
...
To present an identical load to the output node, the capacitance-to-ground must have a value that is twice as
large as the floating capacitance
...
For an in-depth
discussion of the Miller effect, please refer to textbooks such as Sedra and Smith
([Sedra87], p
...
1
Figure 5
...
Diffusion Capacitances Cdb1 and Cdb2
The capacitance between drain and bulk is due to the reverse-biased
pn-junction
...
We argued in Chapter 3 that the best approach towards simplifying the analysis is to
replace the nonlinear capacitor by a linear one with the same change in charge for the voltage range of interest
...
1
The Miller effect discussed in this context is a simplified version of the general analog case
...
chapter5
...
13)
with Cj0 the junction capacitance per unit area under zero-bias conditions
...
(3
...
14)
with φ0 the built-in junction potential and m the grading coefficient of the junction
...
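The linearization factor can be computed directly. The sketch below assumes the usual junction model Cj = Cj0 / (1 - V/φ0)^m and obtains Keq by requiring the same change in charge over the voltage swing; the function name is hypothetical, and the voltages are junction voltages (negative under reverse bias).

```python
def k_eq(v_high, v_low, phi0, m):
    """Multiplier that turns Cj0 into an equivalent linear capacitance.

    Derived by integrating Cj0 / (1 - V/phi0)**m from v_low to v_high and
    dividing by the swing (v_high - v_low). Requires m != 1.
    """
    if v_high == v_low:
        raise ValueError("voltage swing must be non-zero")
    swing = v_high - v_low
    return (-(phi0 ** m) / (swing * (1 - m))
            * ((phi0 - v_high) ** (1 - m) - (phi0 - v_low) ** (1 - m)))


if __name__ == "__main__":
    # Assumed example: reverse bias moving between -2.5 V and -1.25 V.
    print(f"bottom plate (m = 0.5):  Keq = {k_eq(-2.5, -1.25, 0.9, 0.5):.2f}")
    print(f"sidewall     (m = 0.44): Keq = {k_eq(-2.5, -1.25, 0.9, 0.44):.2f}")
    # A smaller reverse bias gives a larger equivalent capacitance.
    print(f"bottom plate, -1.25 V to 0 V: Keq = {k_eq(-1.25, 0.0, 0.9, 0.5):.2f}")
```

The result is independent of which endpoint is called high or low, since swapping them flips the sign of both factors.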
Example 5
...
5 V CMOS Inverter
Consider the inverter of Figure 5
...
25 CMOS technology
...
5
...
13)
...
For the CMOS
inverter, this is the time-instance where Vout reaches 1
...
5 V
...
5 V, 1
...
25 V} for the low-to-high
transition
...
5 V
...
5 V
over the drain junction or Vhigh = −2
...
At the 50% point, Vout = 1
...
25 V
...
(5
...
5, φ0 = 0
...
57,
Sidewall: Keqsw (m = 0
...
9) = 0
...
25 V, respectively,
resulting in higher values for Keq,
Bottom plate: Keq (m = 0
...
9) = 0
...
44, φ0 = 0
...
81
The PMOS transistor displays a reverse behavior, as its substrate is connected to 2
...
Hence, for the high-to-low transition (Vlow = 0, Vhigh = −1
...
48, φ0 = 0
...
79,
Sidewall: Keqsw (m = 0
...
9) = 0
...
25 V, Vhigh = −2
...
48, φ0 = 0
...
59,
Sidewall: Keqsw (m = 0
...
9) = 0
...
The result of the linearization is a minor distortion of the voltage waveforms
...
Wiring Capacitance Cw
The capacitance due to the wiring depends upon the length and width of the connecting
wires, and is a function of the distance of the fanout from the driving gate and the number
of fanout gates
...
Gate Capacitance of Fanout Cg3 and Cg4
We assume that the fanout capacitance equals the total gate capacitance of the loading
gates M3 and M4
...
15)
This expression simplifies the actual situation in two ways:
• It assumes that all components of the gate capacitance are connected between Vout
and GND (or VDD), and ignores the Miller effect on the gate-drain capacitances
...
• A second approximation is that the channel capacitance of the connecting gate is
constant over the interval of interest
...
The total channel capacitance is a function of the operation mode of the
device, and varies from approximately 2/3 WLCox (saturation) to the full WLCox (linear and cut-off)
...
Ignoring the capacitance variation
results in a pessimistic estimation with an error of approximately 10%, which is
acceptable for a first order analysis
...
4 Capacitances of a 0
...
25 µm CMOS technology
...
15
...
5 V
...
This data is summarized
in Table 5
...
As an example, we will derive the drain area and perimeter for the NMOS transistor
...
This results in a
total area of 19 λ², or 0
...
125 µm)
...
125 = 1
...
Notice that the gate side of the drain perimeter is not
included, as this is not considered a part of the side-wall
...
7 µm²; PD = 5 + 9 + 5 = 19 λ, or 2
...
...
25 µm = 2λ
Layout labels: In, Out, GND, Metal 1, Polysilicon, NMOS (3λ/2λ)
Figure 5
...
Table 5
...
W/L
AD (µm2)
PD (µm)
AS (µm2)
PS (µm)
NMOS
0
...
25
PMOS
1
...
25
0
...
875 (15λ)
0
...
875 (15λ)
0
...
375 (19λ)
0
...
375 (19λ)
2
2
This physical information can be combined with the approximations derived above to
come up with an estimation of CL
...
5, and repeated here for convenience:
Overlap capacitance: CGD0(NMOS) = 0
...
27 fF/µm
Bottom junction capacitance: CJ(NMOS) = 2 fF/µm²; CJ(PMOS) = 1
...
28 fF/µm; CJSW(PMOS) = 0
...
A layout extraction program will typically deliver precise values for this parasitic capacitance
...
With the aid of the interconnect parameters of Table 4
...
Due to the short length of the wire, this contribution is negligible compared to the other parasitics
...
12 fF
Bringing all the components together results in Table 5
...
We use the values of Keq
derived in Example 5
...
Notice that the load
capacitance is almost evenly split between its two major components: the intrinsic capacitance, composed of diffusion and overlap capacitances, and the extrinsic load capacitance,
contributed by wire and connecting gate
...
2
Components of CL (for high-to-low and low-to-high transitions)
...
23
0
...
61
0
...
66
0
...
5
1
...
76
0
...
28
2
...
12
0
...
4
...
1
6
...
This results in the expression of Eq
...
16)
...
16)
with i the (dis)charging current, v the voltage over the capacitor, and v1 and v2 the initial
and final voltages
...
Instead, we fall back on the simplified switch model of the
inverter introduced in Figure 5
...
The voltage-dependencies of the on-resistance and the
load capacitor are addressed by replacing both by a constant linear element with a value
averaged over the interval of interest
...
for the load capacitance
...
8, and is repeated here for convenience
...
17)
$$\text{with} \quad I_{DSAT} = k'\frac{W}{L}\left((V_{DD} - V_T)V_{DSAT} - \frac{V_{DSAT}^2}{2}\right)$$
Deriving the propagation delay of the resulting circuit is now straightforward, and is
nothing more than the analysis of a first-order linear
RC-network, identical to the exercise
of Example 4
...
There, we learned that the propagation delay of such a network for a voltage step at the input is proportional to the time-constant of the network, formed by pulldown resistor and load capacitance
...
69 R eqn C L
(5
...
69R eqp C L
(5
...
This analysis assumes that the equivalent load-capacitance is identical for both the high-to-low and low-to-high transitions
...
The overall propagation delay of the inverter is
defined as the average of the two values, or
$$t_p = \frac{t_{pHL} + t_{pLH}}{2} = 0.69\,C_L\left(\frac{R_{eqn} + R_{eqp}}{2}\right)$$
...
20)
Very often, it is desirable for a gate to have identical propagation delays for both rising
and falling inputs
...
Remember that this condition is identical to the
requirement for a symmetrical VTC
...
5 Propagation Delay of a 0
...
15, we make use of Eq
...
18) and Eq
...
19)
...
4, while
the equivalent on-resistances of the transistors for the generic 0
...
3
...
5 V, the normalized on-resistances of NMOS
and PMOS transistors equal 13 kΩ and 31 kΩ, respectively
...
5 for the NMOS, and 4
...
We
assume that the difference between drawn and effective dimensions is small enough to be
ignorable
...
Figure 5
...
15
...
...
69 × ------------- × 6
...
5 31k Ω
t pLH = 0
...
0fF = 29 psec
4
...
5 psec
-----------------2
The accuracy of this analysis is checked by performing a SPICE transient simulation
on the circuit schematic, extracted from the layout of Figure 5
...
The computed transient
response of the circuit is plotted in Figure 5
...
9 psec and 31
...
The manual results are good
considering the many simplifications made during their derivation
...
These are caused by the gate-drain capacitances
of the inverter transistors, which couple the steep voltage step at the input node directly to the
output before the transistors can even start to react to the changes at the input
...
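The first-order delay estimate amounts to a single multiplication per transition. The sketch below is an illustration only: the unit on-resistances of 13 kΩ and 31 kΩ come from the text, while the W/L ratios (1.5 and 4.5) and load capacitances (6.1 fF and 6.0 fF) are assumed values chosen for this example.

```python
def propagation_delay(r_eq, c_load):
    """First-order RC estimate: tp = 0.69 * Req * CL (seconds)."""
    return 0.69 * r_eq * c_load


# On-resistance of a unit (W/L = 1) device, in ohms.
R_N_UNIT, R_P_UNIT = 13e3, 31e3
# Assumed sizing and loads for this sketch.
WL_N, WL_P = 1.5, 4.5
C_HL, C_LH = 6.1e-15, 6.0e-15    # farads

t_phl = propagation_delay(R_N_UNIT / WL_N, C_HL)   # discharge through the NMOS
t_plh = propagation_delay(R_P_UNIT / WL_P, C_LH)   # charge through the PMOS
t_p = (t_phl + t_plh) / 2

if __name__ == "__main__":
    print(f"tpHL = {t_phl * 1e12:.1f} ps")
    print(f"tpLH = {t_plh * 1e12:.1f} ps")
    print(f"tp   = {t_p * 1e12:.1f} ps")
```

Dividing the unit resistance by W/L reflects that a wider device conducts proportionally more current; the estimate ignores the dependence of Req and CL on the voltage during the transition.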
WARNING: This example might give the impression that manual analysis always leads
to close approximations of the actual response
...
Large
deviations can often be observed between first- and higher-order models
...
A detailed simulation is indispensable when quantitative data is
required
...
chapter5
...
To provide an answer to this question, it is necessary
to make the parameters governing the delay explicit by expanding Req in the delay equation
...
(5
...
(5
...
69 -- ----------------= 0
...
21)
4 I DSATn
( W ⁄ L ) n k′ n V DSATn ( V DD – V Tn – V DSATn ⁄ 2 )
In the majority of designs, the supply voltage is chosen high enough so that VDD >> VTn + VDSATn/2
...
(5
...
Observe that this is a first-order approximation, and that increasing
the supply voltage yields an observable, albeit small, improvement in performance due to
a non-zero channel-length modulation factor
...
52 -------------------------------------------( W ⁄ L ) n k′ n V DSATn
(5
...
17, which plots the propagation delay of the
inverter as a function of the supply voltage
...
27, which charts the equivalent on-resistance
of the MOS transistor as a function of VDD
...
17 Propagation delay of CMOS inverter as a function of supply voltage (normalized with respect to the delay at 2
...
The dots indicate the delay values
predicted by Eq
...
21)
...
Hence, the deviation at
low supply voltages
...
This operation region should clearly be avoided if achieving high performance is a
primary design goal
...
...
Remember that three major factors contribute to the load capacitance: the
internal diffusion capacitance of the gate itself, the interconnect capacitance, and the fanout
...
Good
design practice requires keeping the drain diffusion areas as small as possible
...
This is the most powerful and effective performance optimization tool in the hands of the designer
...
Increasing the transistor size also raises the diffusion
capacitance and hence CL
...
e
...
This effect is called “self-loading”
...
• Increase VDD
...
17, the delay of a gate can be modulated by
modifying the supply voltage
...
However, increasing the supply voltage above a certain level yields only very minimal improvement and hence
should be avoided
...
Example 5
...
5
...
An insight in the potential improvement can be obtained by partitioning the load
capacitance into an intrinsic (diffusion and Miller) and an extrinsic (wiring and fanout) component, or
$$C_L = C_{int} + C_{ext} = C_{int}(1 + \alpha)$$
(5
...
Widening both NMOS and
PMOS of the driving inverter by a factor S reduces their equivalent resistance by an identical factor, but also raises the intrinsic capacitance of the gate by approximately the same ratio
...
69 ( S + α )C int --------------------------- = 1 + --- t p0
2S
S
(5
...
Making S infinitely large yields the maximum obtainable performance gain, equal to 1/(1+α). Yet, any siz
...
For the example in question, we find from Table 5
...
05 (Cint = 3
...
15 fF)
...
05
...
...
18 Increasing inverter performance by
sizing the NMOS and PMOS transistor with an
identical factor S for a fixed fanout (inverter of
Figure 5
...
mance improvement of 1
...
3 psec)
...
18, we observe that
the bulk of the improvement is already obtained for S = 5, and that sizing factors larger than
10 barely yield any extra gain
...
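The diminishing return of sizing follows directly from tp = tp0 (1 + α/S). A minimal sketch, assuming α = 1.05 and a delay normalized to tp0 = 1 (both values are assumptions for illustration, and the function names are hypothetical):

```python
ALPHA = 1.05   # assumed ratio of extrinsic to intrinsic capacitance


def sized_delay(s, t_p0=1.0, alpha=ALPHA):
    """Delay of an inverter widened by a factor s, driving a fixed external load."""
    return t_p0 * (1 + alpha / s)


def speedup(s, alpha=ALPHA):
    """Delay of the unsized gate (s = 1) divided by the delay at sizing factor s."""
    return sized_delay(1, alpha=alpha) / sized_delay(s, alpha=alpha)


if __name__ == "__main__":
    for s in (1, 2, 5, 10, 100):
        print(f"S = {s:>3}: speedup = {speedup(s):.2f}")
    print(f"upper bound (S -> infinity): {1 + ALPHA:.2f}")
```

The speedup saturates at 1 + α because widening the driver scales its own intrinsic capacitance by the same factor that it reduces the resistance, so only the extrinsic part of the delay shrinks.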
4 Propagation Delay as a Function of (dis)charge Current
So far, we have expressed the propagation delay as a function of the equivalent resistance of
the transistors
...
Derive an expression of the propagation delay using this alternative approach
...
4
...
s,
Impact of Fanout
Eq
...
23) states that the load capacitance of the inverter can be divided into an intrinsic
and an extrinsic component
...
Assuming that each fanout gate presents an identical load, and that the wiring capacitance is proportional to the fanout,
we can rewrite the delay equation as a function of the fanout N
...
25)
The linear relationship between fanout and wiring capacitance has been confirmed by a number of heuristic studies [REF]
...
...
Large fanout factors should hence be avoided if performance is an issue
...
NMOS/PMOS Ratio
So far, we have consistently widened the PMOS transistor so that its resistance matches
that of the pull-down NMOS device
...
5 between
PMOS and NMOS width
...
However, this does not imply that this ratio also yields the minimum overall propagation delay
...
When two contradictory effects are present, there must exist
a transistor ratio that optimizes the propagation delay of the inverter
...
Consider
two identical, cascaded CMOS inverters
...
26)
where Cdp1 and Cdn1 are the equivalent drain diffusion capacitances of PMOS and NMOS
transistors of the first inverter, while Cgp2 and Cgn2 are the gate capacitances of the second
gate
...
When the PMOS devices are made β times larger than the NMOS ones (β = (W/L)p / (W/L)n), all transistor capacitances will scale in approximately the same way, or Cdp1 ≈ β Cdn1, and Cgp2 ≈ β Cgn2
...
(5
...
27)
An expression for the propagation delay can be derived, based on Eq
...
20)
...
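$$t_p = \frac{0.69}{2}\left((1 + \beta)(C_{dn1} + C_{gn2}) + C_W\right)\left(R_{eqn} + \frac{R_{eqp}}{\beta}\right) = 0.345\left((1 + \beta)(C_{dn1} + C_{gn2}) + C_W\right)R_{eqn}\left(1 + \frac{r}{\beta}\right)$$
...
28)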
r (= Reqp/Reqn) represents the resistance ratio of identically-sized PMOS and NMOS transistors
...
29)
chapter5
...
If the wiring
capacitance dominates, larger values of β should be used
...
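The optimum ratio can be cross-checked numerically. The sketch below assumes the delay model tp = 0.345 ((1 + β)(Cdn1 + Cgn2) + CW) Reqn (1 + r/β); setting its derivative with respect to β to zero gives β = sqrt(r (1 + CW / (Cdn1 + Cgn2))). The component values and function names are illustrative assumptions.

```python
import math


def pair_delay(beta, r, c_dev, c_wire, r_eqn=1.0):
    """Delay of the first inverter; c_dev stands for Cdn1 + Cgn2."""
    return 0.345 * ((1 + beta) * c_dev + c_wire) * r_eqn * (1 + r / beta)


def beta_opt(r, c_dev, c_wire):
    """Closed-form minimizer of pair_delay with respect to beta."""
    return math.sqrt(r * (1 + c_wire / c_dev))


def beta_opt_numeric(r, c_dev, c_wire, lo=0.5, hi=10.0, steps=9500):
    """Brute-force grid search, used only to cross-check the closed form."""
    betas = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return min(betas, key=lambda b: pair_delay(b, r, c_dev, c_wire))


if __name__ == "__main__":
    r = 31 / 13    # resistance ratio of identically sized PMOS and NMOS
    for c_wire in (0.0, 1.0, 4.0):   # wire capacitance in units of c_dev
        print(f"CW/Cdev = {c_wire}: closed form {beta_opt(r, 1.0, c_wire):.2f}, "
              f"grid search {beta_opt_numeric(r, 1.0, c_wire):.2f}")
```

With negligible wiring capacitance the optimum is sqrt(r), which is smaller than the ratio r that equalizes rise and fall delays; adding wire capacitance pushes the optimum upward.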
Example 5
...
From the values of the equivalent resistances
(Table 3
...
4 (= 31 kΩ / 13 kΩ) would yield a symmetrical transient response
...
(5
...
6
...
19, which plots the simulated propagation delay as a function of the transistor ratio β
...
The optimum point occurs around β = 1
...
Observe also that the rising and falling delays are identical at the
predicted point of β equal to 2
...
Plotted curves: tpLH, tpHL, and tp (in sec) versus β
Figure 5
...
β
...
Only one of the devices is assumed
to be on during the (dis)charging process
...
This affects the
total current available for (dis)charging and impacts the propagation delay
...
20
plots the propagation delay of a minimum-size inverter as a function of the input signal
slope— as obtained from SPICE
...
While it is possible to derive an analytical expression describing the relationship
between input signal slope and propagation delay, the result tends to be complex and of
limited value
...
If the latter would be infinitely strong, its output slope would be
zero, and the performance of the gate under examination would be unaffected
...
20 tp as a function of the
input signal slope (10-90% rise or
fall time) for minimum-size
inverter with fan-out of a single
gate
...
This leads to a revised expression for the propagation delay of an
inverter i in a chain of inverters [Hedenstierna87]:
$$t_p^i = t_{step}^i + \eta\, t_{step}^{i-1}$$
(5
...
(5
...
e
...
The fraction η is an empirical constant
...
Example 5
...
All inverters in this example are assumed to be identical,
and to have an intrinsic propagation delay tp0
...
(5
...
(5
...
$$t_p^i = t_{p0}(1 + \alpha N) + \eta\, t_{p0}(1 + \alpha M) = t_{p0}\left(1 + \eta + \alpha(N + \eta M)\right)$$
(5
...
Typical values for the
parameters α and η are around 1 and 0
...
Experiments have demonstrated that
the model of Eq
...
31) forms a good approximation of the actual dependencies, although
some important deviations can be observed for small values of N and M
...
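The fan-out model of this example is easy to evaluate in code. The following is a minimal sketch, assuming the step-response delay of each stage is tp0(1 + α · fan-out); the function name is hypothetical, and the default η = 0.25 is an assumed placeholder rather than a value taken from the text.

```python
def delay_with_input_slope(t_p0, n_fanout, m_fanout, alpha=1.0, eta=0.25):
    """Delay of inverter i when the previous stage has a finite output slope.

    t_p0     : intrinsic delay of the (identical) inverters
    n_fanout : fan-out N of inverter i
    m_fanout : fan-out M of inverter i-1 (sets the input slope of inverter i)
    alpha    : fan-out sensitivity of the step-response delay
    eta      : empirical fraction of the previous stage's delay that is added
    """
    t_step_i = t_p0 * (1.0 + alpha * n_fanout)      # step response of stage i
    t_step_prev = t_p0 * (1.0 + alpha * m_fanout)   # step response of stage i-1
    return t_step_i + eta * t_step_prev

# Example: identical stages, each driving four gates (values are illustrative).
print(delay_with_input_slope(t_p0=30e-12, n_fanout=4, m_fanout=4))
```

The same number follows from the closed form tp0(1 + η + α(N + ηM)), which makes it clear that the fan-out of the preceding stage matters, although with a smaller weight than the gate's own fan-out.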
…M
i-1
…N
Figure 5
...
M and N denote the fanout factors of
inverter i-1 and i, respectively
...
This proves to be true not only for performance, but also for power consumption considerations as will be discussed later
...
Problem 5
...
Explain your answer
...
When gates get farther apart, the wire capacitance and resistance can no longer be ignored, and may even
dominate the transient response
...
The analysis detailed in Example 4
...
Consider the circuit of Figure 5
...
The driver is represented by a single resistance Rdr, which is the
average between Reqn and Reqp
...
(rw,cw,L)
Vout
Cint
Vout
Cfan
Figure 5
...
The propagation delay of the circuit can be obtained by applying the Elmore delay
expression
...
...
69R dr C int + ( 0
...
38 R w )C w + 0
...
69R dr ( C int + C fan ) + 0
...
38r w c w L
2
(5
...
38 factor accounts for the fact that the wire represents a distributed delay
...
The delay
expression contains a component that is linear in the wire length, as well as a quadratic
one
...
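The linear and quadratic contributions can be separated with a short script. This is a sketch under stated assumptions: it uses an Elmore-style expression with the 0.69 (lumped) and 0.38 (distributed) factors, and it assumes the fan-out capacitance sees the driver and the full wire resistance in series. All names and the numerical values in the example (driver resistance, capacitances, wire resistance per µm) are hypothetical.

```python
def driver_wire_delay(length_um, r_dr, c_int, c_fan, r_w, c_w):
    """Delay of a driver, a wire of the given length, and a fan-out load.

    r_dr  : driver resistance (ohm)
    c_int : intrinsic driver capacitance (F)
    c_fan : fan-out capacitance at the far end (F)
    r_w   : wire resistance per micrometer (ohm/um)
    c_w   : wire capacitance per micrometer (F/um)
    """
    big_r_w = r_w * length_um   # total wire resistance
    big_c_w = c_w * length_um   # total wire capacitance
    return (0.69 * r_dr * c_int
            + (0.69 * r_dr + 0.38 * big_r_w) * big_c_w
            + 0.69 * (r_dr + big_r_w) * c_fan)

def delay_terms(length_um, r_dr, c_int, c_fan, r_w, c_w):
    """Same delay split into constant, linear, and quadratic parts in length."""
    constant = 0.69 * r_dr * (c_int + c_fan)
    linear = 0.69 * (r_dr * c_w + r_w * c_fan) * length_um
    quadratic = 0.38 * r_w * c_w * length_um ** 2
    return constant, linear, quadratic

# Illustrative numbers only.
print(delay_terms(100.0, r_dr=8e3, c_int=3e-15, c_fan=3e-15, r_w=0.2, c_w=92e-18))
```

For a weak (high-resistance) driver the linear term dominates over a wide range of lengths; the quadratic term only takes over for long wires or very strong drivers.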
Example 5
...
22, and assume the device parameters of Example 5
...
5(13/1
...
5) = 7
...
The wire is implemented in metal1
and has a width of 0
...
This yields the following parameters: cw = 92 aF/µm, and rw = 0
...
4)
...
(5
...
Solving the following quadratic equation yields a single useful
solution
...
6 × 10
– 18 2
L + 0
...
29 × 10
– 12
or
L = 65 µm
Observe that the extra delay is solely due to the linear factor in the equation, and more specifically due to the extra capacitance introduced by the wire
...
This is
due to the high resistance of the (minimum-size) driver transistors
...
Analyze, for instance, the same problem with the
driver transistors 100 times wider, as is typical in high-speed, large fan-out drivers
...
5 Power, Energy, and Energy-Delay
So far, we have seen that the static CMOS inverter with its almost ideal VTC— symmetrical shape, full logic swing, and high noise margins— offers a superior robustness, which
simplifies the design process considerably and opens the door for design automation
...
It is this combination of robustness and low
static power that has made static CMOS the technology of choice of most contemporary
digital designs
...
chapter5
...
5
...
Part of this
energy is dissipated in the PMOS device, while the remainder is stored on the load capacitor
...
A precise measure for this energy consumption can be derived
...
We assume, initially, that the input waveform has zero rise and fall times, or, in other
words, that the NMOS and PMOS devices are never on simultaneously
...
23 is valid
...
23 Equivalent circuit
the instantaneous power over the period of interest
...
The corresponding waveforms of vout(t) and iVDD(t)
are pictured in Figure 5
...
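$$E_{VDD} = \int_0^\infty i_{VDD}(t)\,V_{DD}\,dt = V_{DD}\int_0^\infty C_L \frac{dv_{out}}{dt}\,dt = C_L V_{DD}\int_0^{V_{DD}} dv_{out} = C_L V_{DD}^2$$
$$E_C = \int_0^\infty i_{VDD}(t)\,v_{out}\,dt = \int_0^\infty C_L \frac{dv_{out}}{dt}\,v_{out}\,dt = C_L\int_0^{V_{DD}} v_{out}\,dv_{out} = \frac{C_L V_{DD}^2}{2}$$
(5
...
34)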
These results can also be derived by observing that during the low-to-high transition, CL is loaded with a charge CLVDD
...
The energy stored on the capacitor equals CL VDD^2/2
...
The other half has been dissipated by the PMOS transistor
...
Once again, there is no dependence on the size of the device
...
In order to compute the power consumption, we have to take
3
Observe that this model is a simplification of the actual circuit
...
The latter experience a charge-discharge cycle that is out of phase with the capacitances to GND,
i
...
they get charged when Vout goes low and discharged when Vout rises
...
chapter5
...
5
iVDD
t
Charge
Discharge
t
Figure 5
...
into account how often the device is switched
...
35)
f0→1 represents the frequency of energy-consuming transitions; these are the 0 → 1 transitions
for static CMOS
...
At the same time, the total capacitance on the chip (CL) increases as more and more gates are
placed on a single die
...
25 µm CMOS chip with a clock rate of
500 MHz and an average load capacitance of 15 fF/gate, assuming a fanout of 4
...
5 V supply then equals approximately 50 µW
...
In reality, not all gates in the complete IC switch at the full rate of
500 MHz
...
Example 5
...
4 is now easily computed
...
2, the value of the load capacitance was determined to equal 6 fF
...
5 V, the amount of energy needed to charge and discharge that capacitance equals
Edyn = CL VDD^2 = 37
...
For a tp of 32
...
5), we find that the dynamic power dissipation of
the circuit is
P dyn = E dyn ⁄ ( 2t p ) = 580 µW
Of course, an inverter in an actual circuit is rarely switched at this maximum rate, and
even if done so, the output does not swing from rail-to-rail
...
For a rate of 4 GHz (T = 250 psec), the dissipation reduces to 150 µW
...
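The arithmetic of this example can be reproduced with a few lines of code. The sketch below assumes a 6 fF load, a 2.5 V supply, and a propagation delay of about 32.5 ps; the supply and delay values are assumptions made for illustration, and the function names are hypothetical.

```python
def switching_energy(c_load, v_dd):
    """Energy drawn from the supply per charge/discharge cycle: CL * VDD^2."""
    return c_load * v_dd ** 2

def dynamic_power(c_load, v_dd, f_switch):
    """Dynamic power when the output makes f_switch 0->1 transitions per second."""
    return switching_energy(c_load, v_dd) * f_switch

C_LOAD = 6e-15   # load capacitance (F)
V_DD = 2.5       # supply voltage (V), assumed
T_P = 32.5e-12   # propagation delay (s), assumed

e_dyn = switching_energy(C_LOAD, V_DD)
p_max = e_dyn / (2.0 * T_P)                   # switching at the maximum rate 1/(2 tp)
p_4ghz = dynamic_power(C_LOAD, V_DD, 4e9)     # one cycle every 250 ps

print(f"Edyn          = {e_dyn * 1e15:.1f} fJ")
print(f"Pdyn (max)    = {p_max * 1e6:.0f} uW")
print(f"Pdyn (4 GHz)  = {p_4ghz * 1e6:.0f} uW")
```

Because the energy per transition depends only on CL and VDD, halving the supply voltage at a fixed switching rate reduces the dynamic power by a factor of four.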
chapter5
...
While the switching activity is easily computed for an
inverter, it turns out to be far more complex in the case of higher-order gates and circuits
...
Other factors influencing the
activity are the overall network topology and the function to be implemented
...
36)
where f now represents the maximum possible event rate of the inputs (which is often the
clock rate) and P0→1 the probability that a clock event results in a 0 → 1 (or power-consuming) event at the output of the gate
...
For our example, an
activity factor of 10% (P0→1 = 0
...
Example 5
...
Power consuming transitions Output signal
occur 2 out of 8 times, which is
equivalent to a transition proba- Figure 5
...
25 (or 25%)
...
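Counting power-consuming transitions in a sampled output waveform is straightforward. The sketch below is illustrative only: the bit sequence is hypothetical (chosen so that 2 out of 8 clock events produce a 0 → 1 transition), and the function names are not from the text.

```python
def transition_probability(initial, outputs):
    """Fraction of clock events that cause a 0 -> 1 transition at the output.

    initial : output value before the first clock event (0 or 1)
    outputs : output value after each clock event
    """
    rising = 0
    previous = initial
    for value in outputs:
        if previous == 0 and value == 1:
            rising += 1
        previous = value
    return rising / len(outputs)

def switched_power(c_load, v_dd, f_clock, p_0_to_1):
    """Dynamic power with activity taken into account: CL * VDD^2 * P0->1 * f."""
    return c_load * v_dd ** 2 * p_0_to_1 * f_clock

# Hypothetical output sequence over eight clock events.
p = transition_probability(0, [1, 1, 0, 0, 0, 1, 1, 1])
print(p)
print(switched_power(15e-15, 2.5, 500e6, p))
```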
This is one of the reasons that lower supply
voltages are becoming more and more attractive
...
For instance, reducing VDD from 2
...
25 V for our example drops the power dissipation from 5 W to 1
...
This assumes that the same clock rate can be sustained
...
17
demonstrates that this assumption is not that unrealistic as long as the supply voltage is substantially higher than the threshold voltage
...
When a lower bound on the supply voltage is set by external constraints (as often happens in real-world designs), or when the performance degradation due to lowering the supply
voltage is intolerable, the only means of reducing the dissipation is by lowering the effective
capacitance
...
A reduction in the switching activity can only be accomplished at the logic and architectural abstraction levels, and will be discussed in more detail in later Chapters
...
...
As most of the capacitance in a combinational logic circuit is due to transistor capacitances (gate and diffusion), it makes sense to keep those contributions to a
minimum when designing for low power
...
This definitely affects the performance of the circuit, but
the effect can be offset by using logic or architectural speed-up techniques
...
This is contrary to common design practices used in cell libraries, where transistors are generally made large to accommodate a range
of loading and performance requirements
...
Assume we have to minimize the energy dissipation of a circuit with a specified lower-bound on the performance
...
Yet, the latter causes the capacitance to
increase
...
To analyze the transistor-sizing-for-minimum-energy problem, let us examine the simple
case of a static CMOS inverter driving a load capacitance consisting of an intrinsic (Cint) and
an extrinsic component (Cext) (Figure 5
...
While the former represents the diffusion capacitances, the latter stands for wiring capacitance and fan-out
...
The factor S stands for the inverter sizing factor,
where S is equal to 1 for an inverter constructed of minimum-size devices
...
Figure
S
5
...
The speed of
all implementations is kept constant by appropriately adjusting the supply voltage: larger values of S normally mean lower values of the supply voltage
...
5
2
2
...
5
Scaling Factor S
α=5
4
4
...
26 Normalized energy of a MOS inverter with load capacitance CL, as a function of the inverter
size S and the ratio α between the extrinsic and intrinsic capacitance (α = Cext/Cint)
...
5 V)
...
When α = 0 (or the load capacitance is zero), the lowest energy consumption is obtained
when using minimum-size devices
...
This result should come as no surprise: transistor sizing to increase performance— and reduce the energy by lowering the supply voltage— only
makes sense as long as performance is dominated by the extrinsic capacitance
...
For example, a sizing
factor S of 3
...
The energy-reduction— with a factor
of 4 with respect to the circuit instance with minimum-size devices— requires that the supply
voltage be reduced to 1
...
Example 5
...
26 as
a function of S and α
...
An expression for the propagation delay of the gate was already derived in Eq
...
24),
and is repeated here for convenience
...
69 ( S + α )C int --------------------------- = 1 + --- t p0
2S
S
tp0 stands for the intrinsic delay of the gate at the reference voltage VDD
...
(5
...
$$t_{p0} \sim \frac{V_{DD}}{V_{DD} - V_{TE}} = \frac{1}{1 - V_{TE}/V_{DD}}$$
(5
...
Keeping the propagation delay of the scaled inverter constant with respect to the reference case means lowering the supply voltage:
$$\frac{V'_{DD}}{V_{TE}} = \frac{S(1+\alpha)}{\alpha(S-1) + (S+\alpha)(V_{TE}/V_{DD})}$$
where V′DD and VDD are the supply voltages of the scaled and reference inverters, respectively
...
5 V and VTE = 0
...
26
...
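The sizing trade-off can be sketched numerically. The code below assumes that the delay scales as tp0(1 + α/S) with tp0 proportional to VDD/(VDD − VTE), and that the switched capacitance is proportional to (S + α); the energy normalization, the function names, and the example values (2.5 V reference supply, VTE = 0.5 V) are assumptions made for illustration.

```python
def scaled_supply(s, alpha, v_dd_ref, v_te):
    """Supply voltage that keeps the delay of the sized inverter equal to the
    delay of the minimum-size (S = 1) reference. Intended for S >= 1."""
    y = v_te / v_dd_ref
    return v_te * s * (1.0 + alpha) / (alpha * (s - 1.0) + (s + alpha) * y)

def normalized_energy(s, alpha, v_dd_ref, v_te):
    """Energy per transition relative to the reference inverter, assuming the
    switched capacitance is proportional to (S + alpha)."""
    v_new = scaled_supply(s, alpha, v_dd_ref, v_te)
    return (v_new / v_dd_ref) ** 2 * (s + alpha) / (1.0 + alpha)

for alpha in (0.0, 1.0, 5.0):
    row = [round(normalized_energy(s, alpha, 2.5, 0.5), 3) for s in (1, 2, 3, 4)]
    print(f"alpha = {alpha}: {row}")
```

With α = 0 the normalized energy simply grows with S, so minimum-size devices are best; with a large extrinsic load, upsizing allows a lower supply voltage and the energy drops below that of the reference.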
The reader should further be aware that
the presented model is somewhat optimistic, as it ignores the extra energy dissipation related
to the increased gate capacitance of the driving transistors
...
...
The finite slope of the input signal causes a direct current path between VDD
and GND for a short period of time during switching, while the NMOS and the PMOS
transistors are conducting simultaneously
...
27
...
38)
as well as the average power consumption
$$P_{dp} = t_{sc} V_{DD} I_{peak} f = C_{sc} V_{DD}^2 f$$
(5
...
27
Short-circuit currents during transients
...
tsc represents the time both devices are conducting
...
(5
...
V DD – 2V T
V DD – 2V T t r ( f )
t sc = ------------------------- ≈ ------------------------- -------ts
×
V DD
V DD
0
...
40)
Ipeak is determined by the saturation current of the devices and is hence directly proportional to the sizes of the transistors
...
This relationship is best illustrated by the following simple analysis: Consider a static CMOS inverter with a 0 → 1 transition at the input
...
28a)
...
As the source-drain
voltage of the PMOS device is approximately 0 during that period, the device shuts off
without ever delivering any current
...
chapter5
...
28 Impact of load capacitance on short-circuit current
...
28b)
...
This clearly
represents the worst-case condition
...
29, which plots the short-circuit current through the NMOS transistor during a
low-to-high transition as a function of the load capacitance
...
5
x 10
CL = 20 fF
2
Isc (A)
1
...
5
Figure 5
...
0
-0
...
On the other hand,
making the output rise/fall time too large slows down the circuit and can cause short-circuit currents in the fan-out gates
...
Design Techniques
A more practical rule, which optimizes the power consumption in a global way, can be formulated ([Veendrick84]):
The power dissipation due to short-circuit currents is minimized by matching the rise/fall
times of the input and output signals
...
Making the input and output rise times of a gate identical is not the optimum solution for
that particular gate on its own, but keeps the overall short-circuit current within bounds
...
30, which plots the short-circuit energy dissipation of an inverter (normal8
W/L|P = 1
...
25 µm
7
W/L|N = 0
...
25 µm
CL = 30 fF
VDD = 3
...
5 V
2
1
VDD = 1
...
30 Power dissipation of a static CMOS
inverter as a function of the ratio between input
and output rise/fall times
...
At low values of the slope ratio, input-output coupling leads to some extra dissipation
...
When the load capacitance is too small for a given inverter size
(r > 2…3 for VDD = 5 V), the power is dominated by the short-circuit current
...
When the rise/fall times of inputs and outputs are equalized, most power dissipation is associated with the dynamic power and only a minor fraction (< 10%) is devoted to
short-circuit currents
...
(5
...
In the extreme case, when VDD < VTn + |VTp|,
short-circuit dissipation is completely eliminated, because both devices are never on
simultaneously
...
At a supply voltage of 2
...
5 V, an input/output slope ratio of 2 is
needed to cause a 10% degradation in dissipation
...
(5
...
The value of this short-circuit capacitance is a function of VDD, the transistor
sizes, and the input-output slope ratio
...
5
...
2
Static Consumption
The static (or steady-state) power dissipation of a circuit is expressed by Eq
...
41),
where Istat is the current that flows between the supply rails in the absence of switching
activity
P stat = I stat V DD
(5
...
There is, unfortunately, a leakage current flowing through the reverse-biased diode junctions of the transistors, located between the source or drain and the substrate as shown in Figure 5
...
This
contribution is, in general, very small and can be ignored
...
For a die with 1 million gates, each with a drain area of 0
...
5 V, the worst-case power consumption due to
diode leakage equals 0
...
VDD
VDD
Vout = V DD
Drain Leakage
Current
Subthreshold current
Figure 5
...
However, be aware that the junction leakage currents are caused by thermally generated carriers
...
At 85 °C (a common junction temperature limit for commercial
hardware), the leakage currents increase by a factor of 60 over their room-temperature values
...
As the temperature is a strong function of the dissipated heat and its removal mechanisms, this can only be accomplished by limiting the power dissipation of the circuit
and/or by using chip packages that support efficient heat removal
...
As discussed in Chapter 3, an MOS transistor can experience a drain-source current, even
when VGS is smaller than the threshold voltage (Figure 5
...
The closer the threshold
voltage is to zero volts, the larger the leakage current at VGS = 0 V and the larger the static
power consumption
...
Standard processes feature VT values that are never smaller than
0
...
6V and that in some cases are even substantially higher (~ 0
...
This approach is being challenged by the reduction in supply voltages that typically
goes with deep-submicron technology scaling as became apparent in Figure 3
...
We concluded earlier (Figure 5
...
One approach to address this performance issue is to scale the device
thresholds down as well
...
17 to the left, which means that
the performance penalty for lowering the supply voltage is reduced
...
32
...
The continued scaling
of the supply voltage predicted for the next generations of CMOS technologies will however force the threshold voltages ever downwards, and will make subthreshold conduction
a dominant source of power dissipation
...
2
VT = 0
...
32 Decreasing the threshold
increases the subthreshold current at VGS =
0
...
An example of the
latter is the SOI (Silicon-on-Insulator) technology whose MOS transistors have slope-factors that are close to the ideal 60 mV/decade
...
13 Impact of threshold reduction on performance and static power dissipation
Consider a minimum size NMOS transistor in the 0
...
In Chapter 3,
we derived that the slope factor S for this device equals 90 mV/decade
...
5V equals 10^-11 A (Figure 3
...
Reducing the threshold by 200 mV to 0
...
5 V, this translates into
a static power dissipation of 10^6 × 170 × 10^-11 × 1
...
6 mW
...
5 W! At that supply voltage,
the threshold reductions correspond to a performance improvement of 25% and 40%, respectively
...
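The leakage numbers in this example follow from the subthreshold slope. The sketch below assumes that the off-current grows by one decade for every S millivolts of threshold reduction, and uses a 90 mV/decade slope, a 10^-11 A starting current, one million gates, and a 1.5 V supply; the supply value and the function names are assumptions for illustration.

```python
def leakage_increase(delta_vt, slope_per_decade):
    """Factor by which the off-current grows when VT is lowered by delta_vt.

    Both arguments are in volts; slope_per_decade is the subthreshold slope S.
    """
    return 10.0 ** (delta_vt / slope_per_decade)

def static_power(n_gates, i_leak_per_gate, v_dd):
    """Total static power of n_gates leaking i_leak_per_gate each."""
    return n_gates * i_leak_per_gate * v_dd

factor = leakage_increase(0.2, 0.09)               # 200 mV reduction, 90 mV/decade
p_stat = static_power(1e6, 1e-11 * factor, 1.5)    # 1.5 V supply is an assumption

print(f"leakage grows by a factor of about {factor:.0f}")
print(f"static power is about {p_stat * 1e3:.1f} mW")
```

Because the dependence is exponential, each further 90 mV of threshold reduction multiplies the static power by another factor of ten.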
The idea that the leakage current in a static CMOS circuit has to be zero is a preconception
...
As long as the noise margins are within range, this is not a compelling issue
...
tion
...
For a
0
...
7 V VT; and 0
...
1 V VT
...
The optimal operation point
depends upon the activity of the circuit
...
Power-down (also called standby) can be accomplished by disconnecting the unit from the supply rails, or by lowering the supply voltage
...
5
...
42)
In typical CMOS circuits, the capacitive dissipation is by far the dominant factor
...
Leakage is ignorable at present, but this might change in the not too distant future
...
PDP = Pav t p
(5
...
Assuming that the gate is switched at its maximum possible rate of fmax = 1/(2tp), and
ignoring the contributions of the static and direct-path currents to the power consumption,
we find
$$PDP = C_L V_{DD}^2 f_{max} t_p = \frac{C_L V_{DD}^2}{2}$$
(5
...
Remember that earlier we had defined Eav as the average
energy per switching cycle (or per energy-consuming event)
...
Energy-Delay Product
The validity of the PDP as a quality metric for a process technology or gate topology is
questionable
...
...
Yet for a given structure, this number can be made arbitrarily low by
reducing the supply voltage
...
This comes at a
major expense in performance, as discussed earlier
...
The energy-delay product (EDP) does exactly
that
...
45)
It is worth analyzing the voltage dependence of the EDP
...
An optimum operation point should hence exist
...
(5
...
$$t_p \approx \frac{\alpha C_L V_{DD}}{V_{DD} - V_{TE}}$$
(5
...
Combining Eq
...
45) and Eq
...
46), 4
$$EDP = \frac{\alpha C_L^2 V_{DD}^3}{2(V_{DD} - V_{TE})}$$
(5
...
(5
...
$$V_{DDopt} = \frac{3}{2} V_{TE}$$
(5
...
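The location of the optimum can be checked numerically. The sketch below evaluates the energy-delay expression α CL² VDD³ / (2(VDD − VTE)) on a grid of supply voltages and picks the minimum; the grid search, the function names, and the example value VTE = 0.8 V are illustrative assumptions.

```python
def edp(v_dd, v_te, alpha=1.0, c_load=1.0):
    """Energy-delay product for the velocity-saturated delay model.

    alpha and c_load only scale the result and do not move the optimum.
    """
    return alpha * c_load ** 2 * v_dd ** 3 / (2.0 * (v_dd - v_te))

def edp_optimum_supply(v_te, steps=20000):
    """Grid search for the supply voltage that minimizes the EDP."""
    lo, hi = 1.01 * v_te, 5.0 * v_te
    candidates = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return min(candidates, key=lambda v: edp(v, v_te))

v_te = 0.8
print("numerical optimum:", edp_optimum_supply(v_te))
print("analytical 3/2 VTE:", 1.5 * v_te)
```

The analytical result follows from setting the derivative of VDD³/(VDD − VTE) to zero, which gives 3(VDD − VTE) = VDD.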
For sub-micron technologies with
thresholds in the range of 0
...
Example 5
...
25 µm CMOS inverter
From the technology parameters for our generic CMOS process presented in Chapter 3, the
value of VTE can be derived
...
43 V, VDsatn = 0
...
74 V
...
4 V, VDsatp = -1 V, VTEp = -0
...
VTE ≈ (VTEn+|VTEp|)/2 = 0
...
8 V = 1
...
The simulated graphs of Figure 5
...
The optimum supply volt4
This equation is only accurate as long as the devices remain in velocity saturation, which is probably
not the case for the lower supply voltages
...
chapter5
...
1 V
...
15
Energy-Delay (norm)
Energy-Delay
10
Energy
Delay
5
0
0
...
5
V (V)
2
Figure 5
...
25 µm CMOS technology
...
5
DD
WARNING: While the above example demonstrates that there exists a supply voltage
that minimizes the energy-delay product of a gate, this voltage does not necessarily represent the optimum voltage for a given design problem
...
Similarly, a lower-energy design is possible by operating the circuit at a lower voltage and by
obtaining the overall system performance through the use of architectural techniques such
as pipelining or concurrency
...
5
...
$$P_{av} = \frac{1}{T}\int_0^T p(t)\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt$$
(5
...
Some implementations of SPICE provide built-in functions to measure the average value
of a circuit signal
...
MEASURE TRAN I(VDD) AVG command
computes the area under a computed transient response (I(VDD)) and divides it by the
period of interest
...
(5
...
Other implementations of SPICE are, unfortunately, not as extensive
...
A small circuit can
easily be conceived that acts as an integrator and whose output signal is nothing but the
average power
...
...
34
...
The resistance R is only provided for DC-convergence reasons and should be chosen as high as possible to minimize leakage
...
The operation
of the circuit is summarized in Eq
...
50) under the assumption that the initial voltage on
the capacitor C is zero
...
50)
$$P_{av} = \frac{k}{C}\int_0^T i_{DD}\,dt$$
Equating Eq
...
49) and Eq
...
50) yields the necessary conditions for the equivalent
circuit parameters: k/C = VDD/T
...
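When the simulator only exports the supply-current waveform, the same average can be computed in post-processing. The following is a minimal sketch, assuming the waveform is available as two equal-length lists of time points and current samples; the trapezoidal integration and the function name are choices made here, not part of any SPICE implementation.

```python
def average_power(times, currents, v_dd):
    """Average power over the sampled interval: (VDD / T) * integral of iDD dt.

    times    : monotonically increasing sample times (s)
    currents : supply current at each sample time (A)
    v_dd     : supply voltage (V)
    """
    if len(times) != len(currents) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    charge = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        charge += 0.5 * (currents[k] + currents[k - 1]) * dt   # trapezoid rule
    period = times[-1] - times[0]
    return v_dd * charge / period

# Hypothetical triangular current pulse within a 250 ps period.
t = [0.0, 50e-12, 100e-12, 250e-12]
i = [0.0, 300e-6, 0.0, 0.0]
print(average_power(t, i, 2.5))
```

This mirrors what the integrator circuit does: with k/C = VDD/T, the capacitor voltage at time T equals the average power.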
VDD
+
iDD
–
Pav
Circuit
under test
C
k iDD
R
Figure 5
...
Example 5
...
4 is analyzed using the above
technique for a toggle period of 250 psec (T = 250 psec, k = 1, VDD = 2
...
The resulting power consumption is plotted in Figure 5
...
3 µW
...
MEAS AVG command yields a value of
160
...
These numbers are equivalent to an energy of 39 fJ (which is close to the 37
...
10)
...
This is due to the
injection of current into the supply, when the output briefly overshoots VDD as a result of the
capacitive coupling between input and output (as is apparent from the transient response of
Figure 5
...
chapter5
...
8
x 10
Average Power
(over one cycle)
1
...
4
power (W)
1
...
8
0
...
4
0
...
5
1
1
...
5
-10
Figure 5
...
x 10
5
...
5, we have explored the impact of the scaling of technology on some of
the important design parameters such as area, delay, and power
...
8)
...
3 Scaling scenarios for short-channel devices (S and U represent the technology and voltage
scaling parameters, respectively)
...
...
From , we can
derive that the gate delay indeed decreases
exponentially at a rate of 13%/year, or halving
every five years
...
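The link between a 13% yearly reduction and a five-year halving time is a one-line calculation. The sketch below assumes a constant fractional decrease per year; the function name is hypothetical.

```python
import math

def years_to_halve(yearly_decrease):
    """Years needed for a quantity to halve when it shrinks by a fixed
    fraction (for example 0.13 for 13%) every year."""
    return math.log(0.5) / math.log(1.0 - yearly_decrease)

print(years_to_halve(0.13))   # close to five years
print(0.87 ** 5)              # remaining fraction after five years
```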
3, since S averages
approximately 1
...
39
...
36 Scaling of the gate delay (from
is projected to be a few tens of picoseconds by
[Dally98])
...
Reducing power dissipation has only been a second-order priority until recently
...
An interesting chart is shown in Figure 5
...
Although the variation is
large— even for a fixed technology— it shows the power density to increase approximately
with S2
...
3
...
Even under
these circumstances, power dissipation-per-chip will continue to increase due to the ever-larger die sizes
...
37 Evolution of power-density in
micro- and DSP processors, as a function of
the scaling factor S ([Sakurai])
...
S
The presented scaling model has one fatal flaw however: the performance and
power predictions produce purely “intrinsic” numbers that take only device parameters
into account
...
Similarly, charging and discharging the wire capacitances may dominate the energy bud-
chapter5
...
To get a crisper perspective, one has to construct a combined model that considers
device and wire scaling models simultaneously
...
4
...
We furthermore assume that the resistance of the driver dominates the
wire resistance, which is definitely the case for short to medium-long wires
...
4 Scaling scenarios for wire capacitance
...
εc represents the impact of fringing
and interwire capacitances
...
This impact is limited to an increase with εc for short
wires (S = SL), but it becomes increasingly more pronounced for medium-range and long
wires (SL < S)
...
38
...
38 Evolution of wire delay / gate delay ratio
with respect to technology (from [Fisher98])
...
The doomsday scenario that interconnect may cause CMOS performance to saturate in the very near future hence may be exaggerated
...
g
...
chapter5
...
7
Summary
191
5
...
The
key characteristics of the gate are summarized:
• The static CMOS inverter combines a pull-up PMOS section with a pull-down
NMOS device
...
• The gate has an almost ideal voltage-transfer characteristic
...
The noise margins
of a symmetrical inverter (where PMOS and NMOS transistors have equal current-driving strength) approach VDD/2
...
• Its propagation delay is dominated by the time it takes to charge or discharge the
load capacitor CL
...
69 C L --------------------------
2
Keeping the load capacitance small is the most effective means of implementing
high-performance circuits
...
• The power dissipation is dominated by the dynamic power consumed in charging
and discharging the load capacitor
...
The dissipation is
proportional to the activity in the network. The dissipation due to the direct-path
...
The static dissipation can usually be ignored but might become a major factor in the future as a result of subthreshold currents
...
The impact is even more striking if the supply
voltage is scaled simultaneously
...
5
...
Virtually every book on digital design devotes a substantial number of pages to the
analysis of the basic inverter gate
...
Some references of particular interest that were explicitly quoted in this chapter are
given below
...
REFERENCES
[Baccarani84] G
...
Wordeman, and R
...
Electron Devices, ED-31(4):
p
...
[Brews89] J
...
, “The Submicrometer Silicon MOSFET,” in [Watts89]
...
, Computer Aided Design of Digital Integrated Circuits, Lecture Notes,
Katholieke Universiteit Leuven, Belgium
...
Dennard et al
...
256–258, 1974
...
Embabi, A
...
Elmasry, Digital BiCMOS Integrated Circuit Design,
Kluwer Academic Publishers, 1993
...
Hodges and H
...
[Jouppi93] N
...
, “A 300 MHz 115W 32b Bipolar ECL Microprocessor with On-Chip
Caches,” Proc
...
, pp
...
[Kakumu90] M
...
Kinugawa, “Power-Supply Voltage Impact on Circuit Performance for Half and Lower Submicrometer CMOS LSI,” IEEE Journal of Solid-State Circuits,
vol
...
8, pp
...
[Lohstroh81] J
...
69, pp
...
[Masaki92] A
...
18–24, November 1992
...
Schutz, “A 3
...
6 mm BiCMOS Superscalar Microprocessor,” ISSCC Digest of
Technical Papers, pp
...
[Sedra87] Sedra and Smith, MicroElectronic Circuits, Holt, Rinehart and Winston, 1987
...
, CMOS Digital Circuit Technology, Prentice Hall, 1988
...
Tang, “Scaling the Silicon Bipolar Transistor,” in [Watts89]
...
Veendrick, “Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on
the Design of Buffer Circuits,” IEEE Journal of Solid-State Circuits, vol
...
4,
pp
...
[Watts89] Watts R
...
, SubMicron Integrated Circuits, Wiley, 1989
...
9 Exercises and Design Problems
For all problems, use the device parameters provided in Chapter 3 (as well as the inside back cover),
unless otherwise mentioned
...
[M, SPICE, 3
...
2] The layout of a static CMOS inverter is given in Figure 5
...
(1 = 0
...
a
...
b
...
V
c
...
fm Page 176 Friday, January 18, 2002 9:01 AM
CHAPTER
5
THE CMOS INVERTER
Quantification of integrity, performance, and energy metrics of an inverter
Optimization of an inverter design
5
...
4
...
2
The Static CMOS Inverter — An Intuitive
Perspective
Propagation Delay: First-Order
Analysis
5
...
3
Propagation Delay from a Design
Perspective
5
...
5
Power, Energy, and Energy-Delay
5
...
1
Switching Threshold
5
...
1
Dynamic Power Consumption
5
...
2
Noise Margins
5
...
2
Static Consumption
Robustness Revisited
5
...
3
Putting It All Together
5
...
4
Analyzing Power Consumption Using
SPICE
5
...
3
5
...
4
...
6
Perspective: Technology Scaling and its
Impact on the Inverter Metrics
chapter5
...
1
5
...
Once its operation and properties are
clearly understood, designing more intricate structures such as NAND gates, adders, multipliers, and microprocessors is greatly simplified
...
The analysis of inverters can be extended to explain the behavior of more complex gates such as NAND, NOR, or XOR, which in turn form the building blocks for modules such as multipliers and processors
...
This is certainly the most popular
at present, and therefore deserves our special attention
...
While each of these parameters can be easily quantified for a given technology,
we also discuss how they are affected by scaling of the technology
...
5
...
1 shows the circuit diagram of a static CMOS inverter
...
25): the transistor is nothing more than a switch with an infinite off-resistance (for |VGS| < |VT|), and a finite on-resistance (for |VGS| > |VT|)
...
1 Static CMOS inverter
...
chapter5
...
When Vin is high and equal to VDD, the NMOS
transistor is on, while the PMOS is off
...
2a
...
On the other hand, when the input voltage is low (0 V), NMOS and PMOS transistors
are off and on, respectively
...
2b shows that a path exists
between VDD and Vout, yielding a high output voltage
...
VDD
VDD
Rp
Vout
Vout
Rn
Vin = VDD
(a) Model for high input
Vin = 0
(b) Model for low input
Figure 5
...
Switch models of CMOS
A number of other important properties of static CMOS can be derived from this switch-level view:
• The high and low output levels equal VDD and GND, respectively; in other words,
the voltage swing is equal to the supply voltage
...
• The logic levels are not dependent upon the relative device sizes, so that the transistors can be minimum size
...
This is in
contrast with ratioed logic, where logic levels are determined by the relative dimensions of the composing transistors
...
A well-designed CMOS inverter, therefore, has a low output impedance, which makes it less sensitive to noise and disturbances
...
• The input resistance of the CMOS inverter is extremely high, as the gate of an MOS
transistor is a virtually perfect insulator and draws no dc input current
...
A single inverter can theoretically drive an infinite number of
gates (or have an infinite fan-out) and still be functionally operational; however,
increasing the fan-out also increases the propagation delay, as will become clear
below
...
chapter5
...
2
The Static CMOS Inverter — An Intuitive Perspective
179
• No direct path exists between the supply and ground rails under steady-state operating conditions (that is, when the input and outputs remain constant)
...
SIDELINE: The above observation, while seemingly obvious, is of crucial importance,
and is one of the primary reasons CMOS is the digital technology of choice at present
...
All early microprocessors, such
as the Intel 4004, were implemented in a pure NMOS technology
...
The resulting static power
consumption puts a firm upper bound on the number of gates that can be integrated on a
single die; hence the forced move to CMOS in the 1980s, when scaling of the technology
allowed for higher integration densities
...
Such a graphical construction is traditionally called a load-line plot
...
We have selected the input voltage Vin, the output voltage Vout and the NMOS
drain current IDN as the variables of choice
...
1)
V GSn = V in ; V GSp = V in – V DD
V DSn = V out ; V DSp = V out – V DD
The load-line curves of the PMOS device are obtained by a mirroring around the x-axis and a horizontal shift over VDD
...
3, where the
subsequent steps to adjust the original PMOS I-V curves to the common coordinate set Vin,
Vout and IDn are illustrated
...
5
Vin = 1
...
5
Vin = VDD + VGSp
IDn = –IDp
Vout = VDD + VDSp
Figure 5
...
5 V)
...
IDn
Vin = 0
Vin = 0
...
5
Vin = 2
Vin = 1
NMOS
Vin = 1
...
5
Vin = 1
Vin = 1
...
5
Vin = 2
...
4 Load curves for NMOS and PMOS transistors of the static CMOS inverter (VDD = 2
...
The dots
represent the dc operation points for various input voltages
...
4
...
Graphically, this
means that the dc points must be located at the intersection of corresponding load lines
...
5, 1, 1
...
5 V) are marked on the graph
...
The VTC of the inverter hence exhibits a very narrow transition zone
...
In that operation region, a small change in the input voltage
results in a large output variation
...
5
...
5
[Plot residue: Vout axis, with operating regions labeled NMOS sat / PMOS res, NMOS res / PMOS sat, and NMOS res / PMOS off]
...
5 VTC of static CMOS inverter,
derived from Figure 5
...
5 V)
...
Before going into the analytical details of the operation of the CMOS inverter, a
qualitative analysis of the transient behavior of the gate is appropriate as well
...
...
6 Switch model of
dynamic behavior of static CMOS
inverter
...
Assuming
temporarily that the transistors switch instantaneously, we can get an approximate idea of
the transient response by using the simplified switch model again (Figure 5
...
Let us consider the low-to-high transition first (Figure 5
...
The gate response time is simply determined by the time it takes to charge the capacitor CL through the resistor Rp
...
5, we learned that the propagation delay of such a network is proportional to its time
constant RpCL
...
The latter is achieved by
increasing the W/L ratio of the device
...
6b), which is dominated by the RnCL time-constant
...
This complicates the exact determination of the propagation delay
...
4
...
3
Evaluating the Robustness of the CMOS Inverter: The Static Behavior
In the qualitative discussion above, the overall shape of the voltage-transfer characteristic
of the static CMOS inverter was derived, as were the values of VOH and VOL (VDD and
GND, respectively)
...
5
...
1
Switching Threshold
The switching threshold, VM, is defined as the point where Vin = Vout
...
5)
...
An analytical expression for VM is obtained by equating the currents through the tran-
chapter5
...
We solve the case where the supply voltage is high so that the devices can be
assumed to be velocity-saturated (or VDSAT < VM - VT)
...
$$k_n V_{DSATn}\left(V_M - V_{Tn} - \frac{V_{DSATn}}{2}\right) + k_p V_{DSATp}\left(V_M - V_{DD} - V_{Tp} - \frac{V_{DSATp}}{2}\right) = 0$$
(5
...
3)
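$$V_M = \frac{\left(V_{Tn} + \frac{V_{DSATn}}{2}\right) + r\left(V_{DD} + V_{Tp} + \frac{V_{DSATp}}{2}\right)}{1 + r} \quad\text{with}\quad r = \frac{k_p V_{DSATp}}{k_n V_{DSATn}} = \frac{\upsilon_{satp} W_p}{\upsilon_{satn} W_n}$$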
assuming identical oxide thicknesses for PMOS and NMOS transistors
...
(5
...
4)
Eq
...
4) states that the switching threshold is set by the ratio r, which compares the relative driving strengths of the PMOS and NMOS transistors
...
This
requires r to be approximately 1, which is equivalent to sizing the PMOS device so that
(W/L)p = (W/L)n × (VDSATn k′n)/(VDSATp k′p)
...
Increasing the strength of the NMOS, on
the other hand, moves the switching threshold closer to GND
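The effect of the drive-strength ratio on VM can be explored numerically. The sketch below simply solves the velocity-saturated current balance (the equation with the two k·VDSAT terms above) for VM. The function name and the parameter values are illustrative assumptions, not data taken from this section; PMOS quantities follow the text's sign convention and are negative.

```python
def switching_threshold(vdd, vtn, vtp, vdsatn, vdsatp, kn, kp):
    """Solve the velocity-saturated current balance for VM.

    vtp, vdsatp and kp are negative for the PMOS device, as in the text.
    """
    r = (kp * vdsatp) / (kn * vdsatn)  # relative PMOS/NMOS drive strength
    return ((vtn + vdsatn / 2) + r * (vdd + vtp + vdsatp / 2)) / (1 + r)


# Assumed, illustrative process parameters (not taken from this section).
KN_PRIME, KP_PRIME = 115e-6, -30e-6  # A/V^2
WL_N = 1.5                           # (W/L) of the NMOS device
params = dict(vdd=2.5, vtn=0.43, vtp=-0.4, vdsatn=0.63, vdsatp=-1.0)

for ratio in (1.0, 2.0, 3.5, 5.0):   # (W/L)p / (W/L)n
    vm = switching_threshold(kn=KN_PRIME * WL_N,
                             kp=KP_PRIME * WL_N * ratio, **params)
    print(f"ratio {ratio:>4}: VM = {vm:.2f} V")
```

With these assumed numbers, widening the PMOS raises VM and a ratio near 3.5 places it close to VDD/2. The result is only meaningful while both devices remain velocity-saturated at the computed operating point.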
...
(5
...
When using this
expression, please make sure that the assumption that both devices are velocity-saturated
still holds for the chosen operation point
...
5)
Problem 5
...
The above expressions were derived under the assumption that the transistors are velocity-saturated
...
Under these circumstances,
Eq
...
6) holds for VM
...
$$V_M = \frac{V_{Tn} + r\,(V_{DD} + V_{Tp})}{1 + r} \quad\text{with}\quad r = \sqrt{\frac{-k_p}{k_n}}$$
(5
...
...
The required ratio is given by
Eq
...
5)
...
1 Switching threshold of CMOS inverter
We derive the sizes of PMOS and NMOS transistors such that the switching threshold of a
CMOS inverter, implemented in our generic 0
...
We use the process parameters presented in Example 3
...
5 V
...
5
...
(5
...
63 × ( 1
...
43 – 0
...
5
------------------------- --------- -----------------------------------------------------–
1
...
25 – 0
...
0 ⁄ 2 )
( W ⁄ L )n
30 × 10
Figure 5
...
The simulated PMOS/NMOS ratio of 3
...
25 V
switching threshold confirms the value predicted by Eq
...
5)
...
7 produces some interesting observations:
1
...
This means that small
variations of the ratio (e
...
, making it 3 or 2
...
It is therefore an accepted practice in industrial designs to set the
width of the PMOS transistor to values smaller than those required for exact symmetry
...
5, and 2 yields switching
thresholds of 1
...
18 V, and 1
...
1
...
7
[Plot residue: axis ticks; horizontal axis Wp/Wn]
Figure 5
...
25 µm
CMOS, VDD = 2
...
[Figure residue: Vin and Vout versus t, with thresholds Vma and Vmb; panels (a) Response of standard inverter and (b) Response of inverter with modified threshold]
Figure 5
...
t
2
...
Increasing the width of the PMOS or the NMOS moves VM towards VDD or GND
respectively
...
This is demonstrated by the example of
Figure 5
...
The incoming signal Vin has a very noisy zero value
...
8a)
...
8b)
...
Changing the switching threshold by a considerable amount is however not easy,
especially when the ratio of supply voltage to transistor threshold is relatively small
(2
...
4 = 6 for our particular example)
...
5 V requires a
transistor ratio of 11, and further increases are prohibitively expensive
...
7 is plotted in a semi-log format
...
3
...
In the terminology of the analog circuit designer, these are the points where the gain g of the
amplifier formed by the inverter is equal to −1
...
A simpler approach is to use a piecewise linear approximation for the VTC, as
shown in Figure 5
...
The transition region is approximated by a straight line, the gain of
which equals the gain g at the switching threshold VM
...
The error introduced is small and well within the range of what is required for an initial design
[Figure residue: Vout versus Vin, with VOH, VOL, VM, VIL, and VIH marked]
...
$$V_{IH} - V_{IL} = -\frac{V_{OH} - V_{OL}}{g} = \frac{-V_{DD}}{g}$$
$$V_{IH} = V_M - \frac{V_M}{g}$$
$$V_{IL} = V_M + \frac{V_{DD} - V_M}{g}$$
$$NM_H = V_{DD} - V_{IH}$$
(5
...
In the extreme case of an infinite gain, the noise margins simplify to VOH − VM and VM − VOL for NMH and NML, respectively, and span the complete voltage swing
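The piecewise linear estimate is easy to evaluate by hand, but a few lines of code make the dependence on the gain explicit. This is a minimal sketch assuming VOH = VDD and VOL = 0 (so that NML = VIL); the function name and the gain value of −27.5 are illustrative assumptions.

```python
def noise_margins(vdd, vm, gain):
    """Piecewise-linear VTC estimate; `gain` is the (negative) gain at VM.

    Assumes VOH = VDD and VOL = 0, so NML = VIL and NMH = VDD - VIH.
    """
    v_ih = vm - vm / gain
    v_il = vm + (vdd - vm) / gain
    return {"VIL": v_il, "VIH": v_ih, "NML": v_il, "NMH": vdd - v_ih}


# Assumed example: symmetric inverter at VDD = 2.5 V with a gain of -27.5 at VM.
for name, value in noise_margins(vdd=2.5, vm=1.25, gain=-27.5).items():
    print(f"{name} = {value:.2f} V")
```

As the magnitude of the gain grows, VIL and VIH both converge to VM, which is the infinite-gain limit described above.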
...
We assume
once again that both PMOS and NMOS are velocity-saturated
...
4 that the gain is a strong function of the slopes of the currents in the saturation region
...
The gain can now be derived by differentiating the current
equation (5
...
$$k_n V_{DSATn}\left(V_{in} - V_{Tn} - \frac{V_{DSATn}}{2}\right)(1 + \lambda_n V_{out}) + \;\ldots$$
(5
...
9)
d V in
λ n k n V DSATn ( V in – V Tn – V DSATn ⁄ 2 ) + λ p k p V DSATp ( Vin – V DD – V Tp – V DSATp ⁄ 2 )
Ignoring some second-order terms, and setting Vin = VM results in the gain expression,
chapter5
...
10)
$$\approx \frac{1 + r}{(V_M - V_{Tn} - V_{DSATn}/2)(\lambda_n - \lambda_p)}$$
with ID(VM) the current flowing through the inverter for Vin = VM
...
It
can be influenced by the designer only in a minor way, through the choice of supply and
switching threshold voltages
...
2 Voltage transfer characteristic and noise margins of CMOS Inverter
Assume an inverter in the generic 0
...
4 and with the NMOS transistor minimum size (W = 0
...
25 µm, W/L =
1
...
We first compute the gain at VM (= 1
...
5 × 115 × 10
–6
× 0
...
25 – 0
...
63 ⁄ 2 ) × ( 1 + 0
...
25 ) = 59 × 10
–6
–6
A
–6
1
g = – ---------------------- 1
...
63 + 1
...
4 × 30 × 10 × 1
...
5 (Eq
...
10)
-----------------------------------------------------------------------------------------------------------------------------–6
0
...
1
59 × 10
This yields the following values for VIL, VIH, NML, NMH:
VIL = 1
...
3 V, NML = NMH = 1
...
Figure 5
...
A close
to ideal characteristic is obtained
...
03 V and 1
...
03 V and 1
...
These values are lower than
those predicted for two reasons:
• Eq
...
10) overestimates the gain
...
10b, the maximum gain (at
VM) equals only 17
...
17 V, and 1
...
• The most important deviation is due to the piecewise linear approximation of the
VTC, which is optimistic with respect to the actual noise margins
...
To conclude this example, we also extracted from simulations the output resistance of
the inverter in the low- and high-output states
...
4 kΩ and 3
...
The output resistance is a good measure of the sensitivity of the gate
with respect to noise induced at the output, and is preferably as low as possible
...
This region
is very narrow however, as is apparent in the graph of Figure 5
...
It also receives poor
marks on other amplifier properties such as supply noise rejection
...
Where the analog designer would bias the amplifier in the middle of the transient
region, so that a maximum linearity is obtained, the digital designer will operate the
[Plot residue: panels (a) and (b); horizontal axis Vin (V)]
Simulated Voltage Transfer Characteristic (a) and voltage gain (b) of CMOS inverter (0
...
Problem 5
...
5
...
3
Robustness Revisited
Device Variations
While we design a gate for nominal operation conditions and typical device parameters,
we should always be aware that the actual operating temperature might vary over a large
range, and that the device parameters after fabrication probably will deviate from the nominal values we used in our design optimization process
...
This already became
apparent in Figure 5
...
To further confirm the assumed robustness of the gate, we have re-simulated the voltage transfer characteristic by replacing the
nominal devices by their worst- or best-case incarnations
...
11: a better-than-expected NMOS combined with an inferior PMOS, and the
opposite scenario
...
[Plot residue: Vout axis; curves labeled Nominal and Good NMOS / Bad PMOS]
Figure 5
...
The “good” device has a smaller oxide
thickness (- 3nm), a smaller length (-25 nm), a higher width
(+30 nm), and a smaller threshold (-60 mV)
...
gate is by no means affected
...
Scaling the Supply Voltage
In Chapter 3, we observed that continuing technology scaling forces the supply voltages to
reduce at rates similar to the device dimensions
...
The reader probably wonders about the impact of this
trend on the integrity parameters of the CMOS inverter
...
(5
...
Plotting the (normalized) VTC for different supply voltages not only confirms this
conjecture, but even shows that the inverter is alive and well for supply voltages close to
the threshold voltage of the composing transistors (Figure 5
...
At a voltage of 0
...
5 V
...
• The dc-characteristic becomes increasingly sensitive to variations in the device
parameters such as the transistor threshold, once supply voltages and intrinsic voltages become comparable
...
While this typically
helps to reduce the internal noise in the system (such as caused by crosstalk), it
makes the design more sensitive to external noise sources that do not scale
...
[Plot residue: two panels of Vout (V); one panel carries a gain = −1 marker]
Figure 5
...
05
0
...
15
0
...
...
25 µm CMOS technology)
...
12b the voltage transfer characteristic of the same inverter for the
even-lower supply voltages of 200 mV, 100 mV, and 50 mV (while keeping the transistor
thresholds at the same level)
...
The sub-threshold
currents are sufficient to switch the gate between low and high levels, and provide enough
gain to produce acceptable VTCs
...
At around 100 mV, we start observing a major deterioration of the gate characteristic
...
The latter turns out to be a fundamental show-stopper
...
It turns out that below this same voltage, thermal noise becomes an issue as
well, potentially resulting in unreliable operation
...
11)
Eq
...
11) presents a true lower bound on supply scaling
...
Problem 5
...
(3
...
Derive an expression for the gain of the inverter under these circumstances
chapter5
...
The resulting expression demonstrates that the minimum voltage is a function of the slope
factor n of the transistor
...
12)
According to this expression, the gain drops to -1 at VDD = 48 mV (for n = 1
...
5
...
This observation suggests
that getting CL as small as possible is crucial to the realization of high-performance
CMOS circuits
...
In addition to this detailed analysis, the section also presents a summary of techniques that a designer might use to optimize the performance of the inverter
...
4
...
To make the analysis tractable, we assume that all capacitances are lumped
together into one single capacitor CL , located between Vout and GND
...
[Schematic residue: transistors M1 and M2; capacitances Cgd12, Cdb1, Cdb2, Cg4; nodes Vin, Vout, VDD]
Figure 5
...
chapter5
...
4
Performance of CMOS Inverter: The Dynamic Behavior
191
Figure 5
...
It includes all the
capacitances influencing the transient response of node Vout
...
Accounting
only for capacitances connected to the output node, CL breaks down into the following
components
...
Under these circumstances, the only contributions to Cgd12
are the overlap capacitances of both M1 and M2
...
The lumped capacitor model now requires that this floating gate-drain capacitor be
replaced by a capacitance-to-ground
...
During a low-high or high-low transition, the terminals of the gate-drain capacitor are moving in opposite directions (Figure 5
...
The voltage change over
the floating capacitor is hence twice the actual output voltage swing
...
We use the following equation for the gate-drain capacitors: Cgd = 2 CGD0W (with
CGD0 the overlap capacitance per unit width as used in the SPICE model)
...
57)
...
14 The Miller effect—A capacitor experiencing identical but opposite voltage swings at both
its terminals can be replaced by a capacitor to ground, whose value is two times the original value
...
Such a
capacitor is, unfortunately, quite nonlinear and depends heavily on the applied voltage
...
A multiplication factor Keq is introduced to relate the linearized
capacitor to the value of the junction capacitance under zero-bias conditions
...
In a digital
inverter, the large scale gain between input and output always equals -1
...
$$C_{eq} = K_{eq} C_{j0}$$
(5
...
An expression
for Keq was derived in Eq
...
11) and is repeated here for convenience
$$K_{eq} = \frac{-\phi_0^{\,m}}{(V_{high} - V_{low})(1 - m)}\left[(\phi_0 - V_{high})^{1-m} - (\phi_0 - V_{low})^{1-m}\right]$$
( V high – V low ) ( 1 – m )
(5
...
Observe that the junction voltage is defined to be negative for reverse-biased junctions
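To see how the linearization factor behaves, the Keq expression can be coded directly. This is a minimal sketch; the function name is made up for illustration, and the grading coefficient m = 0.5 and built-in potential φ0 = 0.9 V used in the calls are assumed example values. Junction voltages are negative for reverse bias, as noted above.

```python
def k_eq(m, phi0, v_high, v_low):
    """Linearization factor relating Ceq to the zero-bias capacitance Cj0.

    v_high and v_low are junction voltages (negative under reverse bias).
    """
    scale = -(phi0 ** m) / ((v_high - v_low) * (1 - m))
    return scale * ((phi0 - v_high) ** (1 - m) - (phi0 - v_low) ** (1 - m))


# Assumed example: NMOS drain junction at VDD = 2.5 V, bulk at GND.
# High-to-low output transition: reverse bias goes from 2.5 V to 1.25 V.
print(f"HL: Keq = {k_eq(0.5, 0.9, v_high=-1.25, v_low=-2.5):.2f}")
# Low-to-high output transition: reverse bias goes from 0 V to 1.25 V.
print(f"LH: Keq = {k_eq(0.5, 0.9, v_high=0.0, v_low=-1.25):.2f}")
```

A stronger reverse bias gives a smaller Keq, which reflects the drop in junction capacitance as the depletion region widens.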
...
3 Keq for a 2
...
13 designed in the generic 0
...
The
relevant capacitance parameters for this process were summarized in Table 3
...
Let us first analyze the NMOS transistor (Cdb1 in Figure 5
...
The propagation delay
is defined by the time between the 50% transitions of the input and the output
...
25 V, as the output voltage swing goes
from rail to rail or equals 2
...
We, therefore, linearize the junction capacitance over the
interval {2
...
25 V} for the high-to-low transition, and {0, 1
...
During the high-to-low transition at the output, Vout initially equals 2
...
Because the
bulk of the NMOS device is connected to GND, this translates into a reverse voltage of 2
...
5 V
...
25 V or Vlow = −1
...
Evaluating Eq
...
14) for the bottom plate and sidewall components of the diffusion capacitance yields
Bottom plate: Keq (m = 0
...
9) = 0
...
44, φ0 = 0
...
61
During the low-to-high transition, Vlow and Vhigh equal 0 V and −1
...
5, φ0 = 0
...
79,
Sidewall: Keqsw (m = 0
...
9) = 0
...
5 V
...
25 V),
Bottom plate: Keq (m = 0
...
9) = 0
...
32, φ0 = 0
...
86
and for the low-to-high transition (Vlow = −1
...
5 V)
Bottom plate: Keq (m = 0
...
9) = 0
...
32, φ0 = 0
...
7
Using this approach, the junction capacitance can be replaced by a linear component
and treated as any other device capacitance
...
The logic delays are not significantly influenced by this
simplification
...
...
As argued in Chapter 4, this component is growing in importance with the
scaling of the technology
...
Hence,
$$C_{fanout} = C_{gate}(NMOS) + C_{gate}(PMOS) = (C_{GSOn} + C_{GDOn} + W_n L_n C_{ox}) + (C_{GSOp} + C_{GDOp} + W_p L_p C_{ox})$$
(5
...
This
has a relatively minor effect on the accuracy, since we can safely assume that the
connecting gate does not switch before the 50% point is reached, and Vout2, therefore, remains constant in the interval of interest
...
This is not exactly the case as we discovered in
Chapter 3
...
A drop in overall gate capacitance also occurs just before the transistor turns on (Figure 3
...
During the first half of the transient, it may be assumed
that one of the load devices is always in linear mode, while the other transistor
evolves from the off-mode to saturation
...
Example 5
...
25 µm CMOS Inverter
A minimum-size, symmetrical CMOS inverter has been designed in the 0
...
The layout is shown in Figure 5
...
The supply voltage VDD is set to 2
...
From the
layout, we derive the transistor sizes, diffusion areas, and perimeters
...
1
...
The drain area is formed by the metal-diffusion contact, which has an area of 4 × 4 λ2,
and the rectangle between contact and gate, which has an area of 3 × 1 λ2
...
30 µm2 (as λ = 0
...
The perimeter of the drain area is rather
involved and consists of the following components (going counterclockwise): 5 + 4 + 4 + 1 +
1 = 15 λ or PD = 15 × 0
...
875 µm
...
The drain area and perimeter of the
PMOS transistor are derived similarly (the rectangular shape makes the exercise considerably
simpler): AD = 5 × 9 λ2 = 45 λ2, or 0
...
375 µm
...
[Layout residue: VDD rail; PMOS (9λ/2λ)]
0
...
15 Layout of two chained, minimum-size inverters using SCMOS Design Rules (see also
Color-plate 6)
...
1
Inverter transistor data
...
375/0
...
125/0
...
3 (19 λ )
1
...
3 (19 λ )
1
...
7 (45 λ2)
2
...
7 (45 λ2)
2
...
The capacitor parameters for our generic process were
summarized in Table 3
...
31 fF/µm; CGDO(PMOS) = 0
...
9 fF/µm2
Side-wall junction capacitance: CJSW(NMOS) = 0
...
22
fF/µm
Gate capacitance: Cox(NMOS) = Cox(PMOS) = 6 fF/µm2
Finally, we should also consider the capacitance contributed by the wire, connecting
the gates and implemented in metal 1 and polysilicon
...
...
Inspection of the layout helps us
to form a first-order estimate and yields that the metal-1 and polysilicon areas of the wire, that
are not over active diffusion, equal 42 λ2 and 72 λ2, respectively
...
2, we find the wire capacitance — observe that we ignore the
fringing capacitance in this simple exercise
...
Cwire = 42/8² µm² × 30 aF/µm² + 72/8² µm² × 88 aF/µm² = 0
...
2
...
3 for the computation of the diffusion capacitances
...
Table 5
...
Capacitor
Expression
Value (fF) (H→L)
Value (fF) (L→H)
Cgd1
2 CGD0n Wn
0
...
23
Cgd2
2 CGD0p Wp
0
...
61
Cdb1
Keqn ADn CJ + Keqswn PDn CJSW
0
...
90
Cdb2
1
...
15
(CGD0n+CGSOn) Wn + Cox Wn Ln
0
...
76
Cg4
(CGD0p+CGSOp) Wp + Cox Wp Lp
2
...
28
Cw
From Extraction
0
...
12
CL
5
...
2
Keqp ADp CJ + Keqswp PDp CJSW)
Cg3
∑
6
...
0
Propagation Delay: First-Order Analysis
One way to compute the propagation delay of the inverter is to integrate the capacitor
(dis)charge current
...
(5
...
$$t_p = \int_{v_1}^{v_2} \frac{C_L(v)}{i(v)}\,dv$$
(5
...
An exact computation of this equation is intractable, as both CL(v) and
i(v) are nonlinear functions of v
...
6 to derive a reasonable approximation of the propagation
delay adequate for manual analysis
...
The preceding section derived precisely this value
chapter5
...
An expression for the average on-resistance of the MOS transistor was already derived in Example 3
...
$$R_{eq} = \frac{1}{V_{DD}/2}\int_{V_{DD}/2}^{V_{DD}} \frac{V}{I_{DSAT}(1 + \lambda V)}\,dV \approx \frac{3}{4}\,\frac{V_{DD}}{I_{DSAT}}\left(1 - \frac{7}{9}\lambda V_{DD}\right)$$
(5
...
5
...
Hence,
t pHL = ln(2)R eqn C L = 0
...
18)
Similarly, we can obtain the propagation delay for the low-to-high transition,
t pLH = 0
...
19)
with Reqp the equivalent on-resistance of the PMOS transistor over the interval of interest
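The first-order model reduces to a handful of multiplications, sketched below. The helper names are hypothetical, and the resistance and capacitance values (13 kΩ and 31 kΩ for unit-ratio devices, W/L of 1.5 and 4.5, loads of about 6 fF) are assumed for illustration only.

```python
import math

LN2 = math.log(2)  # about 0.69, the factor used in the text


def r_eq(vdd, i_dsat, lam):
    """Approximate average on-resistance over the VDD -> VDD/2 swing."""
    return 0.75 * vdd / i_dsat * (1 - (7.0 / 9.0) * lam * vdd)


def t_delay(r_eq_ohm, c_load):
    """First-order RC delay: t = ln(2) * Req * CL."""
    return LN2 * r_eq_ohm * c_load


# Assumed values: Req of a unit-ratio device divided by the actual W/L.
t_phl = t_delay(13e3 / 1.5, 6.1e-15)  # NMOS discharges the load
t_plh = t_delay(31e3 / 4.5, 6.0e-15)  # PMOS charges the load
t_p = (t_phl + t_plh) / 2             # overall delay is the average

print(f"tpHL = {t_phl * 1e12:.1f} ps, tpLH = {t_plh * 1e12:.1f} ps, "
      f"tp = {t_p * 1e12:.1f} ps")
```

Note that the overall delay is the average of the two transition delays, not their sum.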
...
This has been shown to be approximately the case in
the example of the previous section
...
69C L --------------------------
2
2
(5
...
This condition can be achieved by making the on-resistance of the
NMOS and PMOS approximately equal
...
Example 5
...
25 µm CMOS Inverter
To derive the propagation delays of the CMOS inverter of Figure 5
...
(5
...
(5
...
The load capacitance CL was already computed in Example 5
...
25 µm CMOS process were
derived in Table 3
...
For a supply voltage of 2
...
From the layout, we determine
the (W/L) ratios of the transistors to be 1
...
5 for the PMOS
...
This leads to the following values for the delays:
[Figure: Simulated transient response of the inverter of Figure 5 ... Plot of Vin and the output versus t (sec, ×10⁻¹⁰), with tpHL and tpLH marked.]
13kΩ
t pHL = 0
...
1fF = 36 psec
1
...
69 × ------------- × 6
...
5
and
tp = (36 + 29) / 2 = 32
...
15
...
16, and determines the propagation delays to be
39
...
7 for the HL and LH transitions, respectively
...
Notice especially the
overshoots on the simulated output signals
...
These overshoots clearly have a negative impact on the performance of the gate, and explain why the
simulated delays are larger than the estimations
...
This is not necessarily the case
...
The purpose of
the manual analysis is to get a basic insight in the behavior of the circuit and to determine
the dominant parameters
...
Consider the example above a stroke of good luck
...
The obvious question a designer asks herself at this point is how she can manipulate
and/or optimize the delay of a gate
...
Combining Eq
...
18) and Eq
...
17), and assuming for the time being that the channel-length modulation factor λ is ignorable, yields the following expression for tpHL (a
similar analysis holds for tpLH)
$$t_{pHL} = 0.69\,\frac{3}{4}\,\frac{C_L V_{DD}}{I_{DSATn}} = 0.52\,\frac{C_L V_{DD}}{(W/L)_n\, k'_n\, V_{DSATn}\,(V_{DD} - V_{Tn} - V_{DSATn}/2)}$$
(5
...
Under these conditions, the delay becomes virtually independent of the supply
voltage (Eq
...
22))
...
CL
t pHL ≈ 0
...
22)
This analysis is confirmed in Figure 5
...
It comes as no surprise that this curve is virtually identical in shape to the one of Figure 3
...
While the delay is relatively insensitive to supply variations for higher values of VDD, a sharp increase can be observed starting around
[Plot residue: tp (normalized) axis ticks]
Figure 5
...
5
V)
...
(5
...
Observe that this
equation is only valid when the devices are
velocity-saturated
...
≈2VT
...
Design Techniques
From the above, we deduce that the propagation delay of a gate can be minimized in the following ways:
• Reduce CL
...
Careful layout helps to reduce the diffusion and interconnect capacitances
...
• Increase the W/L ratio of the transistors
...
Proceed however with caution
when applying this approach
...
In fact, once the intrinsic capacitance (i
...
the diffusion capacitance) starts to dominate the extrinsic load formed by wiring and fanout, increasing the
gate size no longer helps in reducing the delay, and only makes the gate larger in
area
...
In addition, wide transistors have a larger gate
capacitance, which increases the fan-out factor of the driving gate and adversely affects
its speed
...
As illustrated in Figure 5
...
This flexibility allows the designer to trade-off energy dissipation for performance, as we will see in a later section
...
Also, reliability concerns (oxide breakdown, hot-electron effects)
enforce firm upper-bounds on the supply voltage in deep sub-micron processes
...
4
Propagation Delay as a Function of (dis)charge Current
So far, we have expressed the propagation delay as a function of the equivalent resistance of
the transistors
...
Derive an expression of the propagation delay using this alternative approach
...
4
...
Most importantly, they lead to a general approach
towards transistor sizing that will prove to be extremely useful
...
This typically requires a ratio of 3 to 3
...
The motivation behind this approach is to create an inverter
with a symmetrical VTC, and to equate the high-to-low and low-to-high propagation
delays
...
If symmetry and reduced noise margins are not of prime concern, it is actually possible to speed up the inverter by reducing the width of the PMOS device!
chapter5
...
When two contradictory effects are present, there must exist
a transistor ratio that optimizes the propagation delay of the inverter
...
Consider
two identical, cascaded CMOS inverters
...
23)
where Cdp1 and Cdn1 are the equivalent drain diffusion capacitances of PMOS and NMOS
transistors of the first inverter, while Cgp2 and Cgn2 are the gate capacitances of the second
gate
...
When the PMOS devices are made β times larger than the NMOS ones (β = (W/L)p /
(W/L)n), all transistor capacitances will scale in approximately the same way, or Cdp1 ≈ β
Cdn1, and Cgp2 ≈ β Cgn2
...
(5
...
24)
An expression for the propagation delay can be derived, based on Eq
...
20)
...
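$$t_p = \frac{0.69}{2}\left((1+\beta)(C_{dn1} + C_{gn2}) + C_W\right)\left(R_{eqn} + \frac{R_{eqp}}{\beta}\right) = 0.345\left((1+\beta)(C_{dn1} + C_{gn2}) + C_W\right) R_{eqn}\left(1 + \frac{r}{\beta}\right)$$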
...
25)
r (= Reqp/Reqn) represents the resistance ratio of identically-sized PMOS and NMOS transistors
...
26)
This means that when the wiring capacitance is negligible (Cdn1+Cgn2 >> CW), βopt
equals √r, in contrast to the factor r normally used in the noncascaded case
...
The surprising result of this
analysis is that smaller device sizes (and hence smaller design area) yield a faster design at
the expense of symmetry and noise margin
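The existence of an optimum ratio can be checked numerically. The sketch below assumes the delay of the first inverter is proportional to ((1+β)(Cdn1+Cgn2) + CW)(1 + r/β), which is the form used in this analysis, and compares the closed-form optimum obtained by setting the derivative with respect to β to zero against a brute-force search. Function names and the sample values are illustrative assumptions.

```python
import math


def two_inverter_delay(beta, r, c_wire, c_dev, r_eqn=1.0):
    """Delay of the first of two cascaded inverters (arbitrary units).

    c_dev stands for Cdn1 + Cgn2; r is Reqp/Reqn for identically sized devices.
    """
    c_load = (1 + beta) * c_dev + c_wire
    return 0.345 * c_load * r_eqn * (1 + r / beta)


def beta_opt(r, c_wire, c_dev):
    """Closed-form minimizer of two_inverter_delay with respect to beta."""
    return math.sqrt(r * (1 + c_wire / c_dev))


# Assumed example: resistance ratio r = 2.4 and negligible wiring capacitance.
r, c_wire, c_dev = 2.4, 0.0, 1.0
grid = [0.5 + 0.001 * i for i in range(4501)]  # beta from 0.5 to 5.0
best = min(grid, key=lambda b: two_inverter_delay(b, r, c_wire, c_dev))
print(f"closed form: {beta_opt(r, c_wire, c_dev):.3f}, grid search: {best:.3f}")
```

With zero wiring capacitance the optimum equals √r, noticeably smaller than the ratio r that equalizes the two transition delays.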
...
6 Sizing of CMOS Inverter Loaded by an Identical Gate
Consider again our standard design example
...
3), we find that a ratio β of 2
...
Eq
...
26) now predicts that the device ratio for an optimal performance
should equal 1
...
These results are verified in Figure 5
...
The graph clearly illustrates how a changing
β trades off between tpLH and tpHL
...
9, which is somewhat higher than predicted
...
4
...
5
3
...
5
2
2
...
5
4
4
...
18 Propagation delay of CMOS inverter as a
function of the PMOS/NMOS transistor ratio β
...
The load capacitance of the
inverter can be divided into an intrinsic and an extrinsic component, or CL = Cint + Cext
...
Cext is the extrinsic load capacitance, attributable
to fanout and wiring capacitance
...
69R eq ( C int + C ext )
= 0
...
27)
tp0 = 0
...
The next question is how transistor sizing impacts the performance of the gate
...
(5
...
The intrinsic capacitance Cint consists of the
diffusion and Miller capacitances, both of which are proportional to the width of the transistors
...
The resistance of the gate relates to the reference gate as Req =
Rref/S
...
(5
...
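$$t_p = 0.69\,(R_{ref}/S)(S C_{iref})\left(1 + \frac{C_{ext}}{S C_{iref}}\right) = 0.69\,R_{ref} C_{iref}\left(1 + \frac{C_{ext}}{S C_{iref}}\right) = t_{p0}\left(1 + \frac{C_{ext}}{S C_{iref}}\right)$$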
...
28)
chapter5
...
When no load is present, an
increase in the drive of the gate is totally offset by the increased capacitance
...
Yet, any sizing factor S that is sufficiently larger than (Cext/Cint) produces similar
results at a substantial gain in silicon area
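The diminishing return from sizing is easy to tabulate. The following sketch evaluates tp = tp0 (1 + Cext/(S·Ciref)) for a few sizing factors; the function name and the capacitance values (Ciref = 3.0 fF, Cext = 3.15 fF) are assumptions chosen for illustration.

```python
def sized_delay(s, t_p0, c_ext, c_iref):
    """Delay of an inverter scaled by factor s relative to a reference gate."""
    return t_p0 * (1 + c_ext / (s * c_iref))


# Assumed example values; delay is normalized to the intrinsic delay tp0.
C_IREF, C_EXT = 3.0e-15, 3.15e-15
for s in (1, 2, 5, 10, 100):
    speedup = sized_delay(1, 1.0, C_EXT, C_IREF) / sized_delay(s, 1.0, C_EXT, C_IREF)
    print(f"S = {s:>3}: tp/tp0 = {sized_delay(s, 1.0, C_EXT, C_IREF):.3f}, "
          f"speed-up vs S = 1: {speedup:.2f}")
```

The speed-up saturates at 1 + Cext/Ciref, so most of the benefit is collected at modest sizing factors while area keeps growing with S.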
...
7 Device Sizing for Performance
Let us explore the performance improvement that can be obtained by device sizing in the
design of Example 5
...
We find from Table 5
...
05 (Cint = 3
...
15
fF)
...
05
...
This is confirmed by simulation results, which predict a maximum obtainable perfor3
...
6
3
...
2
3
2
...
6
2
...
2
2
2
4
6
8
S
10
12
14
Figure 5
...
15)
...
9 (tp0 = 19
...
From the graph of Figure 5
...
Sizing A Chain of Inverters
While sizing up an inverter reduces its delay, it also increases its input capacitance
...
Therefore, a more relevant problem is determining the optimum sizing of a gate when embedded in a real environment
...
To determine the input loading effect, the
relationship between the input gate capacitance Cg and the intrinsic output capacitance of
the inverter has to be established
...
Hence, the following relationship holds, independent of gate sizing
C int = γC g
(5
...
Rewriting Eq
...
28),
$$t_p = t_{p0}\left(1 + \frac{C_{ext}}{\gamma C_g}\right) = t_{p0}\,(1 + f/\gamma)$$
(5
...
This ratio is called the effective fanout f
...
Figure 5
...
The goal is to minimize the delay
through the inverter chain, with the input capacitance of the first inverter Cg1—typically a
minimally-sized device— and the load capacitance CL fixed
...
20 Chain of N inverters with fixed
input and output capacitance
...
31)
we can derive the total delay of the chain
...
32)
This equation has N-1 unknowns, being Cg,2, Cg,3, …, Cg,N
...
The
result is a set of constraints, Cg,j+1/Cg,j = Cg,j/Cg,j-1
...
(5
...
With Cg,1
and CL given, we can derive the sizing factor,
$$f = \sqrt[N]{C_L / C_{g,1}} = \sqrt[N]{F}$$
(5
...
(5
...
chapter5
...
Observe how the
relationship between tp and F is a very strong function of the number of stages
...
Introducing a second
stage turns it into square root, and so on
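The dependence on the number of stages can be made concrete with a short calculation. This sketch assumes every stage has the same effective fanout f = F^(1/N) and a per-stage delay of tp0 (1 + f/γ), so the chain delay is N·tp0·(1 + F^(1/N)/γ). The function names and the choice γ = 1 are illustrative assumptions.

```python
def chain_delay(big_f, n, gamma=1.0):
    """Delay of an N-stage inverter chain, normalized to tp0."""
    f = big_f ** (1.0 / n)  # equal effective fanout per stage
    return n * (1 + f / gamma)


def best_stage_count(big_f, gamma=1.0, n_max=20):
    """Integer stage count that minimizes the normalized chain delay."""
    return min(range(1, n_max + 1), key=lambda n: chain_delay(big_f, n, gamma))


for big_f in (10, 100, 1000, 10000):
    n_best = best_stage_count(big_f)
    print(f"F = {big_f:>5}: 1 stage {chain_delay(big_f, 1):8.1f}, "
          f"2 stages {chain_delay(big_f, 2):6.1f}, "
          f"best N = {n_best} -> {chain_delay(big_f, n_best):.1f}")
```

A single stage scales linearly with F and two stages with its square root, which is why buffering pays off so strongly for large loads. The search is restricted to integer N, so its result can differ slightly from an optimum computed with a fixed fanout.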
...
Choosing the Right Number of Stages in an Inverter Chain
Evaluation of Eq
...
35) reveals the trade-offs in choosing the number of stages for a
given F (= f^N)
...
If the number of stages is
too small, the effective fanout of each stage becomes large, and the second component is
dominant
...
$$\gamma + \sqrt[N]{F} - \frac{\sqrt[N]{F}\,\ln F}{N} = 0$$
or equivalently
$$f = e^{(1 + \gamma / f)}$$
(5
...
Under these simplified conditions, it is found that the optimal number of stages equals N = ln(F), and the effective
fanout of each stage is set to f = 2
...
This optimal buffer design scales consecutive
stages in an exponential fashion, and is hence called an exponential horn [Mead79]
...
(5
...
The results are plotted
in Figure 5
...
For the typical case of γ≈1, the optimum scaling factor turns out to be close
to 3
...
Figure 5
...
Choosing values of the fanout that are higher
than the optimum does not impact the delay that much, and reduces the required number
of buffer stages and the implementation area
...
The use of too many stages (f < fopt), on the other hand, has a substantial negative impact on the delay, and should be avoided
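The implicit relation f = exp(1 + γ/f) has no closed-form solution for γ > 0, but it can be solved by simple fixed-point iteration. The sketch below is one possible numerical approach, not a method prescribed by the text; the function name and tolerance are assumptions.

```python
import math


def optimal_fanout(gamma, tol=1e-12, max_iter=200):
    """Solve f = exp(1 + gamma / f) by fixed-point iteration, starting at e."""
    f = math.e
    for _ in range(max_iter):
        f_next = math.exp(1 + gamma / f)
        if abs(f_next - f) < tol:
            return f_next
        f = f_next
    return f


for gamma in (0.0, 0.5, 1.0, 2.0):
    print(f"gamma = {gamma}: f_opt = {optimal_fanout(gamma):.2f}")
```

With γ = 0 the iteration returns e immediately, matching the case where self-loading is ignored; larger γ pushes the optimum fanout upward.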
...
8 The Impact of Introducing Buffer Stages
Table 5
...
Observe the impressive
speed-up obtained with cascaded inverters when driving very large capacitive loads
...
22
...
[Plot residue: axis ticks; horizontal axis f]
(b) Normalized propagation delay (tp/tpopt)
as a function of the effective fanout f for γ=1
...
Figure 5
...
Table 5
...
F
Unbuffered
Two Stage
Inverter Chain
10
11
8
...
3
100
101
22
16
...
8
10,000
10,001
202
33
...
5 Sizing an Inverter Network
Determine the sizes of the inverters in the circuit of Figure 5
...
You may assume that CL = 64 Cg,1
...
22 Inverter network, in which each
gate has a fanout of 4 gates, distributing a single
input to 16 output signals in a tree-like fashion
...
Hints: Determine first the ratios between the devices that minimize the delay
...
52Cg,2 = 6
...
Straightforward sizing of the inverter chain, without taking the fanout into account,
would have led to a sizing factor of 4 instead of 2
...
The rise/fall time of the input signal
All the above expressions were derived under the assumption that the input signal to the
inverter abruptly changed from 0 to VDD or vice-versa
...
In reality, the input signal changes gradually
and, temporarily, PMOS and NMOS transistors conduct simultaneously
...
Figure 5
...
It can be observed that tp increases (approximately) linearly with increasing input slope, once ts > tp(ts=0)
...
4
x 10
-11
5
...
8
4
...
4
4
...
8
3
...
23 tp as a function of the
input signal slope (10-90% rise or
fall time) for minimum-size
inverter with fan-out of a single
gate
...
From a design perspective, it is more valuable to relate the impact of the
finite slope on the performance directly to its cause, which is the limited driving capability
of the preceding gate
...
The
strength of this approach is that it realizes that a gate is never designed in isolation, and
that its performance is both affected by the fanout, and the driving strength of the gate(s)
feeding into its inputs
...
...
37)
Eq. (5.37) states that the propagation delay of inverter i equals the delay of the same gate for a step input, $t_{step}^i$ (i.e., zero input slope), augmented with a fraction of the step-input delay of the preceding gate (i-1)
...
25
...
Example 5
...
22
...
(5
...
(5
...
$$t_{p,2} = t_{p0}\left(1 + \frac{4C_{g,3}}{\gamma C_{g,2}}\right) + \eta\, t_{p0}\left(1 + \frac{4C_{g,2}}{\gamma C_{g,1}}\right)$$
An analysis of the overall propagation delay in the style of Problem 5
...
47 (assuming η = 0
...
Design Challenge
It is advantageous to keep the signal rise times smaller than or equal to the gate propagation
delays
...
Keeping the rise and fall times of the signals small and of
approximately equal values is one of the major challenges in high-performance design, and is
often called ‘slope engineering’
...
6 Impact of input slope
Determine if reducing the supply voltage increases or decreases the influence of the input
signal slope on the propagation delay
...
Delay in the Presence of (Long) Interconnect Wires
The interconnect wire has played a minimal role in our analysis so far
...
Earlier delay expressions can be adjusted to accommodate these extra contributions by employing the wire modeling techniques introduced in
chapter5
...
The analysis detailed in Example 4
...
Consider the circuit of Figure 5
...
The driver is represented by a single resistance Rdr,
which is the average between Reqn and Reqp
...
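[Figure: a driver with intrinsic capacitance Cint drives a wire (rw, cw, L) that is loaded by the fanout capacitance Cfan; the output is Vout]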
Figure 5
...
The propagation delay of the circuit can be obtained by applying the Elmore delay
expression
...
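$$t_p = 0.69 R_{dr} C_{int} + (0.69 R_{dr} + 0.38 R_w) C_w + 0.69 (R_{dr} + R_w) C_{fan}$$
$$\phantom{t_p} = 0.69 R_{dr} (C_{int} + C_{fan}) + 0.69 (R_{dr} c_w + r_w C_{fan}) L + 0.38\, r_w c_w L^2 \qquad (5.38)$$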
38 factor accounts for the fact that the wire represents a distributed delay
...
The delay expression contains a component that is linear in the wire length, as well as a quadratic one
...
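The linear and quadratic contributions can be explored numerically. The sketch below assumes the usual Elmore form for a driver resistance followed by a distributed RC wire and a lumped fanout load, with Rw = rw·L and Cw = cw·L. Only the value cw = 92 aF/µm appears in the text; every other number, and both function names, are hypothetical and chosen purely for illustration.

```python
import math


def wire_delay(r_dr, c_int, c_fan, r_w, c_w, length):
    """Elmore-style delay of driver + distributed wire + fanout load.

    r_w and c_w are per-unit-length values; `length` uses the same length unit.
    """
    lumped = 0.69 * r_dr * (c_int + c_fan)                # independent of the wire
    linear = 0.69 * (r_dr * c_w + r_w * c_fan) * length   # grows with L
    quadratic = 0.38 * r_w * c_w * length ** 2            # grows with L^2
    return lumped + linear + quadratic


def length_for_extra_delay(r_dr, c_fan, r_w, c_w, extra):
    """Wire length at which the wire adds `extra` seconds to the zero-length delay."""
    a = 0.38 * r_w * c_w
    b = 0.69 * (r_dr * c_w + r_w * c_fan)
    # Positive root of a*L^2 + b*L - extra = 0
    return (-b + math.sqrt(b * b + 4.0 * a * extra)) / (2.0 * a)


if __name__ == "__main__":
    # Hypothetical driver and load values; lengths in micrometers.
    r_dr, c_int, c_fan = 10e3, 3e-15, 3e-15
    r_w, c_w = 0.2, 92e-18
    base = wire_delay(r_dr, c_int, c_fan, r_w, c_w, 0.0)
    critical = length_for_extra_delay(r_dr, c_fan, r_w, c_w, base)
    print(f"wire delay equals the intrinsic delay at L = {critical:.1f} um")
```

With a highly resistive driver the linear term dominates, so the critical length is set mainly by the capacitance the wire adds rather than by its resistance.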
Example 5
...
24, and assume the device parameters of Example 5
...
5(13/1
...
5) = 7
...
The wire is implemented in metal1
and has a width of 0
...
This yields the following parameters: cw =
92 aF/µm, and rw = 0
...
4)
...
(5
...
Solving the following quadratic equation yields a single (meaningful) solution
...
6 × 10
– 18 2
L + 0
...
29 × 10
– 12
or
L = 65 µm
Observe that the extra delay is solely due to the linear factor in the equation, and more specifically due to the extra capacitance introduced by the wire
...
This is
due to the high resistance of the (minimum-size) driver transistors
...
Analyze, for instance, the same problem with the
driver transistors 100 times wider, as is typical for high-speed, large fan-out drivers
...
Power, Energy, and Energy-Delay
So far, we have seen that the static CMOS inverter with its almost ideal VTC—symmetrical shape, full logic swing, and high noise margins—offers a superior robustness, which
simplifies the design process considerably and opens the door for design automation
...
It is this combination of robustness and low
static power that has made static CMOS the technology of choice of most contemporary
digital designs
...
5
...
1
Dynamic Power Consumption
Dynamic Dissipation due to Charging and Discharging Capacitances
Each time the capacitor CL gets charged through the PMOS transistor, its voltage rises
from 0 to VDD, and a certain amount of energy is drawn from the power supply
...
During the high-to-low transition, this capacitor is discharged, and the stored energy
is dissipated in the NMOS transistor
...
Let us first consider the low-to-high transition
...
Therefore, the equivalent circuit
of Figure 5
...
The values of the energy
EVDD, taken from the supply during the transition, as
well as the energy EC, stored on the capacitor at the
end of the transition, can be derived by integrating
Figure 5
...
during the low-to-high transition
...
26
...
39)
0
3
Observe that this model is a simplification of the actual circuit
...
The latter experience a charge-discharge cycle that is out of phase with the capacitances to GND, i.e., they get charged when Vout goes low and discharged when Vout rises
...
chapter5
...
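$$E_C = \int_0^\infty i_{VDD}(t)\, v_{out}\, dt = \int_0^\infty C_L \frac{dv_{out}}{dt}\, v_{out}\, dt = C_L \int_0^{V_{DD}} v_{out}\, dv_{out} = \frac{C_L V_{DD}^2}{2} \qquad (5.40)$$
[Figure: vout and iVDD versus t during the charge and discharge of CL]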
Figure 5
...
These results can also be derived by observing that during the low-to-high transition, CL is loaded with a charge CLVDD
...
The energy stored on the capacitor equals CLVDD2/2
...
The
other half has been dissipated by the PMOS transistor
...
Once again, there is no dependence on the size of the device
...
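The independence from the device size can be checked numerically. The sketch below replaces the PMOS transistor by a fixed linear resistor, which is a simplifying assumption and not the model used in the text, and integrates the supply energy and the energy dissipated in the resistor while CL charges from 0 to VDD. The function name and the component values are illustrative.

```python
import math


def charging_energies(r_ohm, c_load, vdd, steps=20000, time_constants=20.0):
    """Return (E_supply, E_resistor, E_capacitor) for charging c_load through r_ohm.

    Uses the analytic charging current i(t) = (VDD/R) * exp(-t/RC) and
    trapezoidal integration over `time_constants` RC time constants.
    """
    tau = r_ohm * c_load
    dt = time_constants * tau / steps
    e_supply = 0.0
    e_resistor = 0.0
    prev_i = vdd / r_ohm  # capacitor starts fully discharged
    for k in range(1, steps + 1):
        i = (vdd / r_ohm) * math.exp(-k * dt / tau)
        e_supply += 0.5 * (prev_i + i) * vdd * dt                  # integral of i * VDD
        e_resistor += 0.5 * (prev_i ** 2 + i ** 2) * r_ohm * dt    # integral of i^2 * R
        prev_i = i
    return e_supply, e_resistor, e_supply - e_resistor


if __name__ == "__main__":
    for r in (1e3, 100e3):  # a strong and a weak pull-up, both hypothetical
        print(r, charging_energies(r, 6e-15, 2.5))
```

Both resistor values give the same three energies: CL·VDD² drawn from the supply, half of it stored on the capacitor and half dissipated in the pull-up path. Only the time needed for the transition changes.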
In order to compute the power consumption, we have to take
into account how often the device is switched
...
41)
f0→1 represents the frequency of energy-consuming transitions, that is, 0 → 1 transitions
for static CMOS
...
At
the same time, the total capacitance on the chip (CL) increases as more and more gates are
placed on a single die
...
25 µm CMOS chip with a clock rate of
500 MHz and an average load capacitance of 15 fF/gate, assuming a fanout of 4
...
5 V supply then equals approximately 50 µW
...
In reality, not all gates in the complete IC switch at the full rate of
500 MHz
...
Example 5
...
4 is now easily computed
...
2, the value of the load capacitance was determined to equal 6 fF
...
5 V, the amount of energy needed to charge and discharge that capacitance equals
$E_{dyn} = C_L V_{DD}^2 = 37$
...
For a tp of 32
...
5), we find that the dynamic power dissipation of the
circuit is
$P_{dyn} = E_{dyn} / (2 t_p) = 580\ \mu W$
Of course, an inverter in an actual circuit is rarely switched at this maximum rate, and
even if done so, the output does not swing from rail-to-rail
...
For a rate of 4 GHz (T = 250 psec), the dissipation reduces to 150 µW
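The arithmetic of this example is easy to reproduce. In the sketch below, the 6 fF load and the 250 psec period are taken from the text; the supply voltage of 2.5 V and the propagation delay of 32.5 psec are assumptions, because those figures are only partly legible in this extract. The function names are illustrative.

```python
def switching_energy(c_load, vdd):
    """Energy drawn from the supply for one charge/discharge cycle of c_load."""
    return c_load * vdd ** 2


def dynamic_power(c_load, vdd, period):
    """Average dynamic power when one full cycle happens every `period` seconds."""
    return switching_energy(c_load, vdd) / period


C_L = 6e-15      # load capacitance from the text (F)
VDD = 2.5        # assumed supply voltage (V)
T_P = 32.5e-12   # assumed propagation delay (s)

e_dyn = switching_energy(C_L, VDD)
p_max = dynamic_power(C_L, VDD, 2 * T_P)     # switching at the maximum rate 1/(2 tp)
p_4ghz = dynamic_power(C_L, VDD, 250e-12)    # switching at 4 GHz

if __name__ == "__main__":
    print(f"E_dyn = {e_dyn * 1e15:.1f} fJ")
    print(f"P_dyn at maximum rate = {p_max * 1e6:.0f} uW")
    print(f"P_dyn at 4 GHz = {p_4ghz * 1e6:.0f} uW")
```

Under these assumptions the script lands close to the 580 µW and 150 µW quoted above.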
...
Computing the dissipation of a complex circuit is complicated by the f0→1 factor,
also called the switching activity
...
One concern is that the switching activity of a network is a function of the nature and the
statistics of the input signals: If the input signals remain unchanged, no switching happens, and the dynamic power consumption is zero! On the other hand, rapidly changing
signals provoke plenty of switching and hence dissipation
...
We can
accommodate this by another rewrite of the equation, or
$$P_{dyn} = C_L V_{DD}^2 f_{0 \to 1} = C_L V_{DD}^2 P_{0 \to 1} f = C_{EFF} V_{DD}^2 f \qquad (5.42)$$
...
CEFF = P0→1CL is called the effective capacitance
and represents the average capacitance switched every clock cycle
...
1) reduces the average consumption to 5 W
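A few lines of code make the role of the effective capacitance concrete. The 15 fF load and the 500 MHz clock come from the text; the supply of 2.5 V, the gate count of one million, and the switching probability of 0.1 are assumptions standing in for figures that are elided in this extract. The function names are illustrative.

```python
def effective_capacitance(p_0_to_1, c_load):
    """Average capacitance switched per clock cycle: C_EFF = P(0->1) * C_L."""
    return p_0_to_1 * c_load


def gate_dynamic_power(c_eff, vdd, f_clk):
    """Dynamic power of one gate: P_dyn = C_EFF * VDD^2 * f."""
    return c_eff * vdd ** 2 * f_clk


C_L = 15e-15        # average load per gate (F), from the text
F_CLK = 500e6       # clock rate (Hz), from the text
VDD = 2.5           # assumed supply voltage (V)
N_GATES = 1_000_000 # assumed number of gates
ACTIVITY = 0.1      # assumed probability of a 0->1 transition per cycle

per_gate_full_rate = gate_dynamic_power(effective_capacitance(1.0, C_L), VDD, F_CLK)
chip_full_rate = N_GATES * per_gate_full_rate
chip_with_activity = N_GATES * gate_dynamic_power(
    effective_capacitance(ACTIVITY, C_L), VDD, F_CLK)

if __name__ == "__main__":
    print(f"per gate, switching every cycle: {per_gate_full_rate * 1e6:.1f} uW")
    print(f"chip, switching every cycle:     {chip_full_rate:.1f} W")
    print(f"chip, activity {ACTIVITY}:            {chip_with_activity:.2f} W")
```

The per-gate figure comes out near the 50 µW mentioned earlier, and scaling by the activity factor brings the chip total close to 5 W.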
...
12 Switching activity
Consider the waveforms on the
right where the upper waveform
represents the idealized clock signal, and the bottom one shows the
signal at the output of the gate
...
25 (or 25%)
...
27 Clock and signal waveforms
Low Energy/Power Design Techniques
With the increasing complexity of the digital integrated circuits, it is anticipated that the power
problem will only worsen in future technologies
...
voltages are becoming more and more attractive
...
For instance, reducing VDD from 2
...
25 V for our example drops the power dissipation from 5 W to 1
...
This assumes that the same clock rate can be sustained
...
17
demonstrates that this assumption is not that unrealistic as long as the supply voltage is substantially higher than the threshold voltage
...
When a lower bound on the supply voltage is set by external constraints (as often happens in real-world designs), or when the performance degradation due to lowering the supply
voltage is intolerable, the only means of reducing the dissipation is by lowering the effective
capacitance
...
A reduction in the switching activity can only be accomplished at the logic and architectural abstraction levels, and will be discussed in more detail in later chapters
...
As most of the capacitance in a combinational logic circuit is due to transistor capacitances (gate and diffusion), it makes sense to keep those contributions to a
minimum when designing for low power
...
This definitely affects the performance of the circuit, but
the effect can be offset by using logic or architectural speed-up techniques
...
This is contrary to common design practices used in cell libraries, where transistors are generally made large to accommodate a range
of loading and performance requirements
...
Assume we have to minimize the energy dissipation of a circuit with a specified lower-bound on the performance
...
Yet, the latter causes the capacitance to
increase
...
Example 5
...
To take the
input loading effects into account, we
assume that the inverter itself is driven by a
Figure 5
...
28)
...
goal is to minimize the energy dissipation of
the complete circuit, while maintaining a
lower-bound on performance
...
The propagation delay of the optimized circuit should
not be larger than that of a reference circuit, chosen to have as parameters f = 1 and Vdd = Vref
...
4
...
...
43)
with F = (Cext/Cg1) the overall effective fanout of the circuit. tp0 is the intrinsic delay of the
inverter
...
(5
...
$$t_{p0} \sim \frac{V_{DD}}{V_{DD} - V_{TE}} \qquad (5.45)$$
The performance constraint now states that the propagation delay of the scaled circuit should
be equal to (or smaller than) the delay of the reference circuit (f=1, Vdd = Vref)
...
Hence,
$$\frac{t_p}{t_{pref}} = \frac{t_{p0}\left(2 + f + \frac{F}{f}\right)}{t_{p0ref}\,(3 + F)} = \left(\frac{V_{DD}}{V_{ref}}\right)\left(\frac{V_{ref} - V_{TE}}{V_{DD} - V_{TE}}\right)\left(\frac{2 + f + \frac{F}{f}}{3 + F}\right) = 1 \qquad (5.46)$$
...
(5
...
29a for different values of F
...
Increasing the
size of the inverter from the minimum initially increases the performance, and hence allows
for a lowering of the supply voltage
...
Further increases in the device sizes only increase the self-loading factor, deteriorate the performance, and require an increase in supply voltage
...
[Figure panels: (a) vdd (V) versus f; (b) normalized energy versus f; one curve per value of F]
Figure 5
...
(a) Required supply voltage as a function of the sizing factor f
for different values of the overall effective fanout F; (b) Energy of scaled circuit (normalized with respect to the reference
case) as a function of f
...
5V, VTE = 0
...
...
$$\frac{E}{E_{ref}} = \left(\frac{V_{DD}}{V_{ref}}\right)^2 \left(\frac{2 + 2f + F}{4 + F}\right) \qquad (5.47)$$
...
A graphical approach is just as effective
...
29b, from which a number of conclusions can be drawn:
• Device sizing, combined with supply voltage reduction, is a very effective approach in
reducing the energy consumption of a logic network
...
But the gain is also sizable for smaller values of F
...
• Oversizing the transistors beyond the optimal value comes at a hefty price in energy
...
• The optimal sizing factor for energy is smaller than the one for performance, especially for
large values of F
...
53, while fopt(performance) = 4
...
Increasing the device sizes only leads to a minimal supply reduction once
VDD starts approaching VTE, hence leading to very minimal energy gains
...
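The conclusions above can be reproduced with a small numerical experiment. The sketch assumes γ = 1, a delay constraint of the form (VDD/Vref)·((Vref - VTE)/(VDD - VTE))·((2 + f + F/f)/(3 + F)) = 1, and an energy ratio of (VDD/Vref)²·(2 + 2f + F)/(4 + F). The values Vref = 2.5 V and VTE = 0.5 V are assumed, since the caption values are only partly legible here, and all function names are illustrative.

```python
import math

V_REF = 2.5  # assumed reference supply (V)
V_TE = 0.5   # assumed effective threshold (V)


def required_vdd(f, F, v_ref=V_REF, v_te=V_TE):
    """Supply voltage at which the scaled circuit matches the reference delay.

    Returns infinity when no supply voltage can meet the constraint.
    """
    k = (v_ref / (v_ref - v_te)) * (3.0 + F) / (2.0 + f + F / f)
    if k <= 1.0:
        return math.inf
    # Solve VDD / (VDD - VTE) = k for VDD.
    return k * v_te / (k - 1.0)


def energy_ratio(f, F, v_ref=V_REF, v_te=V_TE):
    """Energy of the scaled circuit, normalized to the reference case (f = 1, VDD = Vref)."""
    vdd = required_vdd(f, F, v_ref, v_te)
    return (vdd / v_ref) ** 2 * (2.0 + 2.0 * f + F) / (4.0 + F)


def energy_optimal_f(F, f_min=1.0, f_max=7.0, step=0.01):
    """Grid search for the sizing factor that minimizes the normalized energy."""
    n = int(round((f_max - f_min) / step))
    candidates = [f_min + i * step for i in range(n + 1)]
    return min(candidates, key=lambda f: energy_ratio(f, F))


if __name__ == "__main__":
    F = 20
    f_energy = energy_optimal_f(F)
    print(f"F = {F}: energy-optimal f = {f_energy:.2f}, "
          f"performance-optimal f = {math.sqrt(F):.2f}")
    print(f"VDD = {required_vdd(f_energy, F):.2f} V, "
          f"E/Eref = {energy_ratio(f_energy, F):.3f}")
```

The grid search is a deliberately simple choice; a closed-form or gradient-based solution would work equally well for a curve this smooth.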
The finite slope of the input signal causes a direct current path between VDD
and GND for a short period of time during switching, while the NMOS and the PMOS
transistors are conducting simultaneously
...
30
...
48)
as well as the average power consumption
$$P_{dp} = t_{sc} V_{DD} I_{peak} f = C_{sc} V_{DD}^2 f \qquad (5.49)$$
...
tsc represents the time both devices are conducting
...
(5
...
[Figure: inverter with load CL; the short-circuit current ishort flows while the input vin is between VT and VDD - VT and reaches the peak value Ipeak]
Figure 5
...
$$t_{sc} = \frac{V_{DD} - 2V_T}{V_{DD}}\, t_s \approx \frac{V_{DD} - 2V_T}{V_{DD}} \times \frac{t_{r(f)}}{0.8} \qquad (5.50)$$
Ipeak is determined by the saturation current of the devices and is hence directly proportional to the sizes of the transistors
...
This relationship is best illustrated by the following simple analysis: Consider a static CMOS inverter with a 0 → 1 transition at the input
...
31a)
...
31 Impact of load capacitance on short-circuit current
...
As the source-drain
voltage of the PMOS device is approximately 0 during that period, the device shuts off
without ever delivering any current
...
Consider now the reverse case, where the output capacitance is very small, and the output
fall time is substantially smaller than the input rise time (Figure 5
...
The drain-source
voltage of the PMOS device equals VDD for most of the transition period, guaranteeing the
maximal short-circuit current (equal to the saturation current of the PMOS)
...
represents the worst-case condition
...
32, which plots the short-circuit current through the NMOS transistor during a
low-to-high transition as a function of the load capacitance
...
[Figure: short-circuit current Isc (A, scale ×10^-4) for several load capacitances, including CL = 20 fF and CL = 100 fF]
Figure 5
...
0
-0
...
On the other hand,
making the output rise/fall time too large slows down the circuit and can cause short-circuit currents in the fan-out gates
...
Design Techniques
A more practical rule, which optimizes the power consumption in a global way, can be formulated ([Veendrick84]):
The power dissipation due to short-circuit currents is minimized by matching the rise/fall
times of the input and output signals
...
Making the input and output rise times of a gate identical is not the optimum solution for
that particular gate on its own, but keeps the overall short-circuit current within bounds
...
33, which plots the short-circuit energy dissipation of an inverter (normalized with respect to the zero-input rise time dissipation) as a function of the ratio r between
input and output rise/fall times
...
For very large
capacitance values, all power dissipation is devoted to charging and discharging the load
capacitance
...
Observe also that the impact of short-circuit current is reduced when we lower the
supply voltage, as is apparent from Eq
...
50)
...
With threshold voltages scaling at a slower rate than the supply voltage, short-circuit power dissipation is becoming less important in deep-submicron technologies
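The supply dependence can be illustrated with a short calculation. The sketch assumes a linear input ramp, symmetric thresholds (VTn = |VTp| = VT), and the triangular-current approximation in which both devices conduct for a fraction (VDD - 2VT)/VDD of the input transition time. The numerical values and the function names are hypothetical.

```python
def short_circuit_time(vdd, v_t, t_s):
    """Time during which both transistors conduct for an input transition time t_s."""
    fraction = max(0.0, (vdd - 2.0 * v_t) / vdd)
    return fraction * t_s


def direct_path_power(vdd, v_t, t_s, i_peak, f):
    """Average direct-path power, P_dp = t_sc * VDD * I_peak * f."""
    return short_circuit_time(vdd, v_t, t_s) * vdd * i_peak * f


if __name__ == "__main__":
    V_T = 0.5      # hypothetical threshold (V)
    T_S = 100e-12  # hypothetical input transition time (s)
    for vdd in (2.5, 1.5, 1.0):
        t_sc = short_circuit_time(vdd, V_T, T_S)
        print(f"VDD = {vdd} V: both devices conduct for {t_sc * 1e12:.0f} ps")
```

Once VDD drops to 2VT the conduction window closes completely in this model. In a real device Ipeak also falls with the supply voltage, which reduces the dissipation further.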
...
...
[Figure: normalized power Pnorm versus the input/output slope ratio tsin/tsout for several supply voltages]
Figure 5
...
The power is
normalized with respect to zero input rise-time
dissipation
...
sin sout
At a supply voltage of 2
...
5 V, an input/output slope ratio of 2 is
needed to cause a 10% degradation in dissipation
...
(5
...
The value of this short-circuit capacitance is a function of VDD, the transistor
sizes, and the input-output slope ratio
...
5
...
(5
...
51)
Ideally, the static current of the CMOS inverter is equal to zero, as the PMOS and
NMOS devices are never on simultaneously in steady-state operation
...
34
...
For the device sizes under consideration, the leakage current per unit drain area typically ranges between 10-100
pA/µm2 at room temperature
...
5
µm2 and operated at a supply voltage of 2
...
125 mW, which is clearly not much of an issue
...
Their value increases with increasing junction temperature, and this occurs
in an exponential fashion
...
[Figure: sources of leakage in the inverter for Vout = VDD: drain leakage current and subthreshold current]
Figure 5
...
ues
...
As the temperature is a strong function of the dissipated heat and its removal mechanisms, this can only be accomplished by limiting the power dissipation of the circuit
and/or by using chip packages that support efficient heat removal
...
As discussed in Chapter 3, an MOS transistor can experience a drain-source current, even
when VGS is smaller than the threshold voltage (Figure 5
...
The closer the threshold
voltage is to zero volts, the larger the leakage current at VGS = 0 V and the larger the static
power consumption
...
Standard processes feature VT values that are never smaller than
0
...
6V and that in some cases are even substantially higher (~ 0
...
This approach is being challenged by the reduction in supply voltages that typically
goes with deep-submicron technology scaling as became apparent in Figure 3
...
We concluded earlier (Figure 5
...
One approach to address this performance issue is to scale the device
thresholds down as well
...
17 to the left, which means that
the performance penalty for lowering the supply voltage is reduced
...
35
...
The continued scaling
of the supply voltage predicted for the next generations of CMOS technologies however
forces the threshold voltages ever downwards, and makes subthreshold conduction a dominant source of power dissipation
...
An example of the latter is
the SOI (Silicon-on-Insulator) technology whose MOS transistors have slope-factors that
are close to the ideal 60 mV/decade
...
14 Impact of threshold reduction on performance and static power dissipation
Consider a minimum size NMOS transistor in the 0
...
In Chapter 3,
we derived that the slope factor S for this device equals 90 mV/decade
...
5V equals 10-11A (Figure 3
...
Reducing the threshold with 200 mV to 0
...
5 V, this translates into
a static power dissipation of 106 ×170×10-11×1
...
6 mW
...
...
2
VT = 0
...
35 Decreasing the threshold
increases the subthreshold current at VGS =
0
...
5 W! At that supply voltage,
the threshold reductions correspond to a performance improvement of 25% and 40%, respectively
...
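The leakage numbers in this example follow from the slope factor alone. The sketch below uses the 90 mV/decade slope and the 10^-11 A leakage quoted in the text; the supply of 1.5 V and the count of one million gates are assumptions, because those figures are only partly legible in this extract. The function names are illustrative.

```python
def leakage_multiplier(delta_vt_mv, slope_mv_per_decade):
    """Factor by which subthreshold leakage grows when VT is lowered by delta_vt_mv."""
    return 10.0 ** (delta_vt_mv / slope_mv_per_decade)


def static_power(n_gates, i_leak_per_gate, vdd):
    """Static power of n_gates identical gates, each leaking i_leak_per_gate."""
    return n_gates * i_leak_per_gate * vdd


if __name__ == "__main__":
    S = 90.0          # slope factor (mV/decade), from the text
    I_LEAK_0 = 1e-11  # leakage at the original threshold (A), from the text
    VDD = 1.5         # assumed supply voltage (V)
    N = 1_000_000     # assumed number of gates
    for delta in (200.0, 400.0):
        m = leakage_multiplier(delta, S)
        p = static_power(N, m * I_LEAK_0, VDD)
        print(f"VT lowered by {delta:.0f} mV: leakage x{m:.0f}, P_stat = {p * 1e3:.1f} mW")
```

A 200 mV reduction multiplies the leakage by roughly 170, and each further 90 mV adds another decade, which is why the second threshold reduction is so much more costly.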
The idea that the leakage current in a static CMOS circuit has to be zero is a preconception
...
As long as the noise margins are within range, this is not a compelling issue
...
This is offset by the drop in supply voltage, that is enabled by the reduced thresholds
at no cost in performance, and results in a quadratic reduction in dynamic power
...
25 µm CMOS process, the following circuit configurations obtain the same performance: 3 V supply–0
...
45 V supply–0
...
The dynamic power consumption of the latter is, however, 45 times smaller [Liu93]! Choosing the correct values of
supply and threshold voltages once again requires a trade-off
...
In the presence of a sizable static power dissipation, it is essential that non-active modules are powered down, lest static power dissipation
would become dominant
...
5
...
3
Putting It All Together
The total power consumption of the CMOS inverter is now expressed as the sum of its
three components:
$$P_{tot} = P_{dyn} + P_{dp} + P_{stat} = (C_L V_{DD}^2 + V_{DD} I_{peak} t_s) f_{0 \to 1} + V_{DD} I_{leak} \qquad (5.52)$$
...
The
direct-path consumption can be kept within bounds by careful design, and should hence
not be an issue
...
...
$$PDP = P_{av}\, t_p \qquad (5.53)$$
...
Assuming that the gate is switched at its maximum possible rate of fmax = 1/(2tp), and
ignoring the contributions of the static and direct-path currents to the power consumption,
we find
$$PDP = C_L V_{DD}^2 f_{max} t_p = \frac{C_L V_{DD}^2}{2} \qquad (5.54)$$
...
Remember that earlier we had defined Eav as the average
energy per switching cycle (or per energy-consuming event)
...
Energy-Delay Product
The validity of the PDP as a quality metric for a process technology or gate topology is
questionable
...
Yet for a given structure, this number can be made arbitrarily low by
reducing the supply voltage
...
This comes at a
major expense in performance, as discussed earlier
...
The energy-delay product (EDP) does exactly
that
...
55)
It is worth analyzing the voltage dependence of the EDP
...
An optimum operation point should hence exist
...
(5
...
$$t_p \approx \frac{\alpha C_L V_{DD}}{V_{DD} - V_{TE}} \qquad (5.56)$$
...
Combining Eq
...
55) and Eq
...
56), 4
4
This equation is only accurate as long as the devices remain in velocity saturation, which is probably
not the case for the lower supply voltages
...
$$EDP = \frac{\alpha C_L^2 V_{DD}^3}{2 (V_{DD} - V_{TE})} \qquad (5.57)$$
...
(5
...
$$V_{DDopt} = \frac{3}{2} V_{TE} \qquad (5.58)$$
...
For sub-micron technologies with
thresholds in the range of 0
...
Example 5
...
25 µm CMOS inverter
From the technology parameters for our generic CMOS process presented in Chapter 3, the
value of VTE can be derived
...
43 V, VDsatn = 0
...
74 V
...
4 V, VDsatp = -1 V, VTEp = -0
...
VTE ≈ (VTEn+|VTEp|)/2 = 0
...
8 V = 1
...
The simulated graphs of Figure 5
...
The optimum supply voltage is predicted to equal 1
...
The charts clearly illustrate the trade-off between delay and
energy
...
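The location of the optimum can be confirmed numerically. The sketch assumes the first-order model EDP = α·CL²·VDD³ / (2(VDD - VTE)) and searches a grid of supply voltages for the minimum. The value VTE = 0.8 V is taken as an assumption matching the example, α and CL are set to 1 because they do not affect the position of the minimum, and the function names are illustrative.

```python
def edp(vdd, v_te, alpha=1.0, c_load=1.0):
    """Energy-delay product under the first-order velocity-saturated delay model."""
    return alpha * c_load ** 2 * vdd ** 3 / (2.0 * (vdd - v_te))


def optimal_vdd(v_te, v_max=3.0, step=1e-4):
    """Grid search for the supply voltage that minimizes the EDP (VDD > VTE only)."""
    n = int((v_max - v_te) / step)
    candidates = (v_te + k * step for k in range(1, n + 1))
    return min(candidates, key=lambda v: edp(v, v_te))


if __name__ == "__main__":
    v_te = 0.8  # assumed effective threshold (V)
    print(f"numerical optimum: {optimal_vdd(v_te):.3f} V")
    print(f"analytical 3/2 * VTE: {1.5 * v_te:.3f} V")
```

The minimum is shallow, so operating somewhat above the optimum costs little in EDP while buying back performance. The model itself loses accuracy at low supply voltages, as the footnote above points out.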
5
1
1
...
5
Figure 5
...
25 µm CMOS technology
...
For instance, some designs require a
minimum performance, which requires a higher voltage at the expense of energy
...
obtaining the overall system performance through the use of architectural techniques such
as pipelining or concurrency
...
5
...
$$P_{av} = \frac{1}{T}\int_0^T p(t)\, dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\, dt \qquad (5.59)$$
...
Some implementations of SPICE provide built-in functions to measure the average value
of a circuit signal
...
MEASURE TRAN I(VDD) AVG command
computes the area under a computed transient response (I(VDD)) and divides it by the
period of interest
...
(5
...
Other implementations of SPICE are, unfortunately, not as extensive
...
A small circuit can
easily be conceived that acts as an integrator and whose output signal is nothing but the
average power
...
37
...
The resistance R is only provided for DC-convergence reasons and should be chosen as high as possible to minimize leakage
...
The operation
of the circuit is summarized in Eq
...
60) under the assumption that the initial voltage on
the capacitor C is zero
...
$$P_{av} = \frac{k}{C}\int_0^T i_{DD}\, dt \qquad (5.60)$$
Equating Eq
...
59) and Eq
...
60) yields the necessary conditions for the equivalent
circuit parameters: k/C = VDD/T
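When the supply current is available as sampled data, for example exported from a transient simulation, the same average can be computed outside SPICE. The sketch below applies trapezoidal integration to (time, current) samples; the waveform in the usage example is synthetic and purely illustrative, and the function name is an assumption.

```python
def average_power(times, currents, vdd):
    """Average power over the sampled interval: (VDD / T) * integral of iDD dt."""
    if len(times) != len(currents) or len(times) < 2:
        raise ValueError("need at least two matching (time, current) samples")
    charge = 0.0
    for k in range(1, len(times)):
        # Trapezoidal rule on each sample interval
        charge += 0.5 * (currents[k] + currents[k - 1]) * (times[k] - times[k - 1])
    return vdd * charge / (times[-1] - times[0])


if __name__ == "__main__":
    # Synthetic triangular current pulse inside a 250 ps window.
    t = [0.0, 50e-12, 100e-12, 250e-12]
    i = [0.0, 1e-3, 0.0, 0.0]
    print(f"P_av = {average_power(t, i, 2.5) * 1e6:.0f} uW")
```

The integrator circuit described above performs the same operation in hardware: with k/C = VDD/T, the capacitor voltage at time T equals the value this function returns.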
...
Example 5
...
4 is analyzed using the above
technique for a toggle period of 250 psec (T = 250 psec, k = 1, VDD = 2
...
The resulting power consumption is plotted in Figure 5
...
3 µW
...
MEAS AVG command yields a value of
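[Figure: equivalent circuit to measure average power in SPICE. The circuit under test draws iDD from VDD; a current source k·iDD drives the parallel combination of C and R, and the voltage across C represents Pav]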
Figure 5
...
160
...
These numbers are equivalent to an energy of 39 fJ (which is close to the 37
...
11)
...
This is due to the
injection of current into the supply, when the output briefly overshoots VDD as a result of the
capacitive coupling between input and output (as is apparent from the transient response of
Figure 5
...
[Figure: simulated waveform versus t (sec, scale ×10^-10), with the input transition Vin: 0→1 marked]
Figure 5
...
Perspective: Technology Scaling and its Impact on the Inverter
Metrics
In section 3
...
For the sake of clarity, we
repeat here some of the most important entries in the resulting scaling table (Table 3
...
Table 5
...
Parameter         Relation    Full Scaling   General Scaling   Fixed-Voltage Scaling
Area/Device       WL          1/S^2          1/S^2             1/S^2
Intrinsic Delay   RonCgate    1/S            1/S               1/S
chapter5
...
4 Scaling scenarios for short-channel devices (S and U represent the technology and voltage
scaling parameters, respectively)
...
From Figure 5
...
This
rate is on course with the prediction of Table
5
...
15 as
we had already observed in Figure 3
...
The
delay of a 2-input NAND gate with a fanout of
four has gone from tens of nanoseconds in the
1960s to a tenth of a nanosecond in the year Figure 5
...
seconds by 2010
...
Hence, statistics on dissipation-per-gate or design are only marginally available
...
40, which plots the power density measured over a large
number of designs produced between 1980 and 1995
...
This is in correspondence with the fixed-voltage scaling scenario presented in
Table 5
...
For more recent years, we expect a scenario more in line with the full-scaling
model—which predicts a constant power density—due to the accelerated supply-voltage
scaling and the increased attention to power-reducing design techniques
...
The presented scaling model has one fatal flaw however: the performance and
power predictions produce purely “intrinsic” numbers that take only device parameters
into account
...
Similarly, charging and discharging the wire capacitances may dominate the energy budget
...
The impact of the wire capacitance and its
scaling behavior is summarized in Table 5
...
We adopt the fixed-resistance model introduced in Chapter 4
...
∝ S2
Figure 5
...
S is
normalized to 1 for a 4 µm process
...
5 Scaling scenarios for wire capacitance
...
εc represents the impact of fringing
and inter-wire capacitances
...
This impact is limited to an increase with εc for short
wires (S = SL), but it becomes increasingly more pronounced for medium-range and long
wires (SL < S)
...
41
...
The doomsday scenario that interconnect may cause CMOS performance to saturate in the very near future hence may be exaggerated
...
g
...
chapter5
...
41 Evolution of wire delay / gate delay ratio
with respect to technology (from [Fisher98])
...
7
Summary
This chapter presented a rigorous and in-depth analysis of the static CMOS inverter
...
The PMOS is normally made wider than the NMOS due to its inferior current-driving capabilities
...
The logic swing is equal
to the supply voltage and is not a function of the transistor sizes
...
The steady-state response is not affected by fanout
...
To a first order, it can be approximated as
$$t_p = 0.69 \left(\frac{R_{eqn} + R_{eqp}}{2}\right) C_L$$
Transistor sizing may help to improve performance as
long as the delay is dominated by the extrinsic (or load) capacitance of fanout and
wiring
...
It is given by P0→1 CLVDD2f
...
The dissipation due to the direct-path
currents occurring during switching can be limited by careful tailoring of the signal
slopes
...
• Scaling the technology is an effective means of reducing the area, propagation delay
and power consumption of a gate
...
• The interconnect component is gradually taking a larger fraction of the delay and
performance budget
...
8
To Probe Further
The operation of the CMOS inverter has been the topic of numerous publications and textbooks
...
An extensive list of references was presented in Chapter
1
...
REFERENCES
[Dally98] W
...
Poulton, Digital Systems Engineering, Cambridge University Press, 1998
...
D
...
Nesbitt, ``The Test of Time: Clock-Cycle Estimation and Test Challenges for Future Microprocessors,'' IEEE Circuits and Devices Magazine, 14(2), pp
...
[Hedenstierna87] N
...
Jeppson, “CMOS Circuit Speed and Buffer Optimization,” IEEE Transactions on CAD, Vol CAD-6, No 2, pp
...
[Liu93] D
...
28, no
...
10-17, Jan
...
10-17
...
Mead and L
...
[Sakurai97] T
...
Kawaguchi, T
...
on Low-Power Electronics and Design,
pp
...
1997
...
Sakurai, T
...
[Sedra87] Sedra and Smith, MicroElectronic Circuits, Holt, Rinehart and Winston, 1987
...
Swanson and J
...
SC-7, No
...
146-152, April
1972
...
Veendrick, “Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on
the Design of Buffer Circuits,” IEEE Journal of Solid-State Circuits, Vol
...
4,
pp
...
chapter5
...
9
THE CMOS INVERTER
Chapter 5
Exercises and Design Problems
For all problems, use the device parameters provided in Chapter 3 (as well as the inside back cover),
unless otherwise mentioned
...
2 µm CMOS introduced in Chapter 2, design a static CMOS
inverter that meets the following requirements:
1
...
e
...
2
...
1 nsec)
...
Notice that this
capacitance is substantially larger than the internal capacitances of the gate
...
To reduce the parasitics, use
minimal lengths (L = 1
...
Verify and optimize the design
using SPICE after proposing a first design using manual computations
...
If you have a layout editor (such
as MAGIC) available, perform the physical design, extract the real circuit
parameters, and compare the simulated results with the ones obtained earlier
...
1
6
...
3
...
3
...
2
...
2
...
4
...
2
...
3
Complementary CMOS
Pass-Transistor Logic
6
...
2
Designing Logic for Reduced Supply
Voltages
6
...
3
...
5
Summary
6
...
2
Speed and Power Dissipation of
Dynamic Logic
6
...
1
DESIGNING COMBINATIONAL LOGIC GATES IN CMOS
Chapter 6
Introduction
The design considerations for a simple inverter circuit were presented in the previous
chapter
...
The focus is on combinational logic (or non-regenerative) circuits; that is, circuits that have the property that at any point in time, the output
of the circuit is related to its current input signals by some Boolean expression (assuming
that the transients through the logic gates have settled)
...
This is in contrast to another class of circuits, known as sequential or regenerative,
for which the output is not only a function of the current input data, but also of previous
values of the input signals (Figure 6
...
This is accomplished by connecting one or more
outputs intentionally back to some inputs
...
A sequential circuit includes a combinational logic portion and a module that holds the state
...
Sequential circuits are the topic of the next chapter
...
1 High level classification of logic circuits
...
As with the
inverter, the common design metrics by which a gate is evaluated are area, speed, energy
and power
...
For
instance, the switching speed of digital circuits is the primary metric in a high-performance processor, while it is energy dissipation in a battery-operated circuit
...
We will see that certain logic styles can significantly improve performance, but are more
sensitive to noise
...
6
...
The static CMOS style
is really an extension of the static CMOS inverter to multiple inputs
...
e, low sensitivity to noise), good
performance, and low power consumption with no static power dissipation
...
2
Static CMOS Design
231
properties are carried over to large fan-in logic gates implemented using a similar circuit
topology
...
Also,
the outputs of the gates assume at all times the value of the Boolean function implemented
by the circuit (ignoring, once again, the transient effects during switching periods)
...
The latter approach has the advantage
that the resulting gate is simpler and faster
...
In this section, we sequentially address the design of various static circuit flavors
including complementary CMOS, ratioed logic (pseudo-NMOS and DCVSL), and pass-transistor logic
...
6
...
1
Complementary CMOS
Concept
A static CMOS gate is a combination of two networks, called the pull-up network (PUN)
and the pull-down network (PDN) (Figure 6
...
The figure shows a generic N input logic
gate where all inputs are distributed to both the pull-up and pull-down networks
...
Similarly, the function of the PDN
is to connect the output to VSS when the output of the logic gate is meant to be 0
...
In this way, once the transients have settled, a path always exists between VDD and the output F, realizing a high output (“one”),
or, alternatively, between VSS and F for a low output (“zero”)
...
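[Figure: a generic N-input gate; the inputs In1, In2, ..., InN drive both the PUN and the PDN, which share the output F]
pull-up (PUN): make a connection from VDD to F when F(In1, In2, ..., InN) = 1
pull-down (PDN): make a connection from F to VSS when F(In1, In2, ..., InN) = 0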
2 Complementary logic gate as a combination of a PUN (pull-up network) and a
PDN (pull-down network)
...
An NMOS
switch is on when the controlling signal is high and is off when the controlling signal
is low
...
• The PDN is constructed using NMOS devices, while PMOS transistors are used in
the PUN
...
To illustrate this, consider the examples shown in Figure 6
...
In Figure 6
...
Two possible discharge scenarios are shown
...
NMOS transistors are hence the preferred devices in the PDN
...
3b, with the output initially at GND
...
This explains why PMOS transistors are preferentially used in a PUN
...
3 Simple examples
illustrate why an NMOS should be
used as a pull-down, and a PMOS
should be used as a pull-up device
...
4)
...
With all the
inputs high, the series combination conducts and the value at one end of the chain is
transferred to the other end
...
A conducting path exists between the output and input terminal if at least one of the inputs is high
...
A series connection of PMOS conducts if
both inputs are low, representing a NOR function (A
...
• Using De Morgan’s theorems ($\overline{A + B} = \overline{A} \cdot \overline{B}$ and $\overline{A \cdot B} = \overline{A} + \overline{B}$), it can be shown that
the pull-up and pull-down networks of a complementary CMOS structure are dual
networks
...
[Figure panels: (a) series combination, conducts if A · B; (b) parallel combination, conducts if A + B]
Figure 6
...
network, and vice versa
...
g
...
The
other network (i
...
, PUN) is obtained using the duality principle by walking the hierarchy, replacing series sub-nets with parallel sub-nets, and parallel sub-nets with
series sub-nets
...
• The complementary gate is naturally inverting, implementing only functions such as
NAND, NOR, and XNOR
...
• The number of transistors required to implement an N-input logic gate is 2N
...
1 Two-input NAND Gate
Figure 6
...
The PDN network consists of two
NMOS devices in series that conduct when both A and B are high
...
This means that F is 1 if A = 0 or B = 0,
which is equivalent to $F = \overline{A \cdot B}$
...
1
...
VDD
Table 6
...
5 Two-input NAND gate in complementary static CMOS style
...
2 Synthesis of complex CMOS Gate
Using complementary CMOS logic, consider the synthesis of a complex CMOS gate whose
function is $F = \overline{D + A \cdot (B + C)}$
...
6a by using the fact that NMOS devices in series implement the AND function and parallel devices implement the OR function
...
The PDN network is broken into
smaller networks (i
...
, subset of the PDN) called sub-nets that simplify the derivation of the
PUN
...
6b, the sub-nets (SN) for the pull-down network are identified. At the top
level, SN1 and SN2 are in parallel, so in the dual network they will be in series
...
On the other hand, we
need to recursively apply the duality rules to SN2
...
Finally, inside SN3, the devices are in parallel so they appear in series in the PUN
...
6c
...
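The duality procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own tooling: the nested-tuple network representation and the function names `dual`, `conducts`, and `count_transistors` are assumptions introduced here. The pull-down network is the one from this example (D in parallel with A in series with the parallel pair B, C).

```python
from itertools import product

# A network is either an input name (one transistor) or a tuple
# ("series" | "parallel", sub-network, sub-network, ...).
# Hypothetical encoding of the example's PDN: D || (A in series with (B || C)).
PDN = ("parallel", "D", ("series", "A", ("parallel", "B", "C")))


def dual(net):
    """Walk the hierarchy, swapping series and parallel sub-nets."""
    if isinstance(net, str):
        return net
    kind, *subs = net
    swapped = "parallel" if kind == "series" else "series"
    return (swapped, *[dual(s) for s in subs])


def conducts(net, inputs, on_when_high):
    """True if the switch network forms a closed path.

    on_when_high=True models NMOS switches, False models PMOS switches.
    """
    if isinstance(net, str):
        return inputs[net] if on_when_high else not inputs[net]
    kind, *subs = net
    states = [conducts(s, inputs, on_when_high) for s in subs]
    return all(states) if kind == "series" else any(states)


def count_transistors(net):
    """One transistor per leaf of the network."""
    if isinstance(net, str):
        return 1
    return sum(count_transistors(s) for s in net[1:])


PUN = dual(PDN)

# Exactly one of the two networks conducts for every input combination.
for bits in product([False, True], repeat=4):
    x = dict(zip("ABCD", bits))
    assert conducts(PDN, x, True) != conducts(PUN, x, False)
```

Because the two networks are duals, exactly one of them conducts for any input pattern, which is the mutual-exclusion property the static gate relies on. The transistor count of the two networks together is 2N for the N = 4 inputs.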
[Figure panels: (a) pull-down network; (b) deriving the pull-up network hierarchically by identifying sub-nets SN1 to SN4]
Figure 6
...
(c) complete gate
Static Properties of Complementary CMOS Gates
Complementary CMOS gates inherit all the nice properties of the basic CMOS inverter
...
The circuits also have no
static power dissipation, since the circuits are designed such that the pull-down and pull-up networks are mutually exclusive
...
Consider the static two-input NAND gate shown in Figure 6
...
Three possible input
combinations switch the output of the gate from high to low: (a) A = B = 0 → 1, (b) A = 1,
B = 0 → 1, and (c) B = 1, A = 0 → 1
...
The large variation between case (a) and the others (b & c) is explained
by the fact that in the former case both transistors in the pull-up network are on simultaneously for A=B=0, representing a strong pull-up
...
The VTC is shifted to the left as a result of the weaker PUN
...
For the NMOS devices to turn on, both gate-to-source voltages must be above VTn, with VGS2 = VA - VDS1 and VGS1 = VB
...
[Plot: voltage transfer characteristics; horizontal axis Vin, V]
Figure 6
...
NMOS devices are
0
...
25µm while the PMOS devices are sized at 0
...
25µm
...
The
threshold voltages of the two devices are given by:
$$V_{Tn2} = V_{tn0} + \gamma \left( \sqrt{|2\phi_f| + V_{int}} - \sqrt{|2\phi_f|} \right)$$
$$V_{Tn1} = V_{tn0}$$
(6
...
2)
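The body-effect expression for the upper transistor of the stack can be evaluated numerically. The sketch below is only illustrative: the function name `vtn_stacked` and the parameter values (VT0 = 0.43 V, γ = 0.4 √V, |2φf| = 0.6 V) are assumptions chosen for the example, not values taken from this passage.

```python
import math


def vtn_stacked(vt0, gamma, two_phi_f, v_int):
    """Threshold of the upper NMOS device whose source sits at v_int.

    Implements VT = VT0 + gamma * (sqrt(|2*phi_f| + Vint) - sqrt(|2*phi_f|)).
    """
    base = abs(two_phi_f)
    return vt0 + gamma * (math.sqrt(base + v_int) - math.sqrt(base))


# Assumed, illustrative process parameters (not from the text).
VT0, GAMMA, TWO_PHI_F = 0.43, 0.4, 0.6

vtn1 = vtn_stacked(VT0, GAMMA, TWO_PHI_F, 0.0)  # source grounded: no body effect
vtn2 = vtn_stacked(VT0, GAMMA, TWO_PHI_F, 0.5)  # source raised to an internal voltage
print(f"VTn1 = {vtn1:.3f} V, VTn2 = {vtn2:.3f} V")
```

With the source grounded the expression collapses to VT0, while any positive internal node voltage raises the threshold of the upper device.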
For case (b), M3 is turned off, and the gate voltage of M2 is set to VDD
...
Since the drive on M2 is large,
this resistance is small and has only a small effect on the voltage transfer characteristics
...
The overall impact
is quite small as seen from the plot
...
For the above example, a glitch on only one of the two inputs has a
larger chance of creating a false transition at the output than a glitch that occurs on
both inputs simultaneously
...
A
common practice when characterizing gates such as NAND and NOR is to connect all the
inputs together
...
The data
dependencies should be carefully modeled
...
For the purpose of delay analysis, each transistor is modeled as a resistor in series with an
ideal switch
...
The logic is transformed into an equivalent RC network that includes the effect of
internal node capacitances
...
8 shows the two-input NAND gate and its equivalent
RC switch level model
...
While complicating the analysis, the capacitance of the internal nodes can have quite an impact in
some networks such as large fan-in gates
...
[Figure panels: (a) Two-input NAND, with transistors M1 to M4; (b) RC equivalent model, with RP, RN, Cint, and CL]
Figure 6
...
A simple analysis of the model shows that—similar to the noise margins—the
propagation delay depends upon the input patterns
...
Three possible input scenarios can be identified for charging the output to
VDD
...
The delay in this case is
0
...
This is not the worst-case low-to-high transition, which occurs when only one device turns on, and is given by 0
...
For the pull-down path, the output is discharged only if both A and B are switched
high, and the delay is given by 0
...
In other words, adding
devices in series slows down the circuit, and devices must be made wider to avoid a performance penalty
...
For example, for a NAND gate to have the same pull-down delay (tphl) as a minimum-sized inverter, the NMOS devices in the NAND stack must be made twice as wide
so that the equivalent resistance of the NAND pull-down is the same as that of the inverter
...
This first-order analysis assumes that the extra capacitance introduced by widening
the transistors can be ignored
...
Example 6
...
8a
...
5µm/0
...
75µm/0
...
This sizing should result in approximately
equal worst-case rise and fall times (since the effective resistance of the pull-down is
designed to be equal to the pull-up resistance)
...
Figure 6
...
As
expected, the case where both inputs go low (A = B = 1→0) results in a smaller
delay, compared to the case where only one input is driven low
...
The reason for this involves
the internal node capacitance of the pull-down stack (i
...
, the source of M2)
...
On the other hand, for the case where A=1 and B transitions from 1→0, the pull-up PMOS device has to charge up the sum of the output and the
internal node capacitances, which slows down the transition
...
0
Input Data
Pattern
Voltage, V
A = 1, B = 1→0
1
...
0
Delay
(psec)
A = B= 0→ 1
A = B = 1→0
50
A=B=1→ 0
-1
...
0
76
A= 1→ 0, B = 1
57
Figure 6
...
The table in Figure 6
...
The first-order transistor sizing indeed provides approximately equal rise and fall delays
...
For example, when both inputs transition from 0→1, it is important to establish the
state of the internal node
...
The worst case can be ensured by pulsing the A input from 1→0→1, while input B
only makes the 0→1 transition
...
The important point to take away from this example is that estimation of delay can be
fairly complex, and requires a careful consideration of internal node capacitances and data
patterns
...
A brute-force
approach that applies all possible input patterns may not always work, as it is important
to consider the state of internal nodes
...
10
...
The worst-case
pull-down transition happens when only one of the NMOS devices turns on (i
...
, if either
A or B is high)
...
5µm/0
...
5µm/0
...
Since the pull-down path in the worst case is a
single device, the NMOS devices (M1 and M2) can have the same device widths as the
NMOS device in the inverter
...
Since the resistances add, the devices must be made two times larger compared
to the PMOS in the inverter (i
...
, M3 and M4 must have a size of 3µm/0
...
Since
PMOS devices have a lower mobility relative to NMOS devices, stacking devices in series
must be avoided as much as possible
...
[Figure: two-input NOR gate schematic with labels M1, M2, M4, RP, RN, inputs A and B, and output F]
Problem 6
...
10 Sizing of a NOR gate to
produce the same delay as an inverter with
size of NMOS: 0
...
25µm and PMOS:
1
...
25µm
...
6c such that it has
approximately the same tplh and tphl as an inverter with the following sizes: NMOS:
0
...
25µm and PMOS: 1
...
25µm
...
This is often a reasonable assumption for a first-order analysis
...
Consider a 4-input NAND gate as shown in Figure 6
...
The
internal capacitances consist of the junction capacitance of the transistors, as well as the
gate-to-source and gate-to-drain capacitances
...
The delay analysis for such a circuit involves solving
distributed RC networks, a problem we already encountered when analyzing the delay of
interconnect networks
...
The output is discharged when all inputs are driven high
...
[Figure: four-input NAND gate (M1 to M8, inputs A, B, C, D, output F) and its RC model with resistances R1 to R8, internal capacitances C1 to C3, and load CL]
Figure 6
...
Section 6
...
69 (R1 ⋅ C1 + (R1 + R2) ⋅ C2 + (R1 + R2 + R3) ⋅ C3 + (R1 + R2 + R3 + R4) ⋅ CL)
(6
...
Assuming that all NMOS
devices have an equal size, Eq
...
3) simplifies to
tpHL = 0
...
4)
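The distributed-RC estimate for the series discharge chain lends itself to a short numerical sketch. The function name `elmore_tphl` and the normalized component values below are assumptions for illustration; the 0.69 factor and the nesting of resistances follow the expression above, with R1 the device closest to ground and the last capacitance standing for CL.

```python
def elmore_tphl(resistances, capacitances):
    """First-order tpHL of a series pull-down chain.

    Computes 0.69 * sum_k (R1 + ... + Rk) * Ck, where resistances[0] is the
    transistor nearest ground and capacitances[-1] is the load CL.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per transistor in the chain")
    total = 0.0
    r_to_ground = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_ground += r          # resistance between this node and ground
        total += r_to_ground * c  # each node discharges through that resistance
    return 0.69 * total


# Equal-sized devices: the sum collapses to 0.69 * R * (C1 + 2*C2 + 3*C3 + 4*CL).
equal = elmore_tphl([2.0] * 4, [1.0, 2.0, 3.0, 4.0])

# R1 appears in every term, so lowering it helps more than lowering R4
# (capacitance changes from resizing are ignored in this sketch).
wider_m1 = elmore_tphl([0.5, 1.0, 1.0, 1.0], [1.0] * 4)
wider_m4 = elmore_tphl([1.0, 1.0, 1.0, 0.5], [1.0] * 4)
```

The last two lines preview why the device nearest ground is the most rewarding one to widen, under the stated simplification that the capacitances stay fixed.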
Example 6
...
Assume that all NMOS devices have a
W/L of 0
...
25µm, and all PMOS devices have a device size of 0
...
25µm
...
12
...
Using techniques similar to those employed for the CMOS inverter in Chapter 3, the
capacitance values can be computed from the layout
...
Using our standard design rules, the area and perimeter for various devices can be
easily computed as shown in Table 6
...
While the output makes a transition
from VDD to 0, the internal nodes only transition from VDD-VTn to GND
...
[Layout figure: VDD and GND rails, inputs A, B, C, D, and output Out]
Figure 6
...
Table 6
...
Transistor | W (µm) | AS (µm²) | AD (µm²) | PS (µm) | PD (µm)
1
0
...
3125
0
...
75
0
...
5
0
...
0625
0
...
25
3
0
...
0625
0
...
25
0
...
5
0
...
3125
0
...
75
5
0
...
296875
0
...
875
0
...
375
0
...
171875
0
...
875
7
0
...
171875
0
...
875
0
...
375
0
...
171875
1
...
875
It is assumed that the output connects to a single, minimum-size inverter
...
The various contributions are summarized in
Table 6
...
For the NMOS and PMOS junctions, we use Keq = 0
...
61, and Keq
= 0
...
86, respectively
...
Table 6
...
The table shows
the intrinsic delay of the gate without extra loading
...
Capacitor | Contributions (H→L) | Value (fF) (H→L)
C1
Cd1 + Cs2 + 2 * Cgd1 + 2 * Cgs2
(0
...
0625 * 2+ 0
...
25 * 0
...
57 * 0
...
61 * 0
...
28) +
2 * (0
...
5) + 2 * (0
...
5) = 0
...
57 * 0
...
61 * 0
...
28) +
(0
...
0625 * 2+ 0
...
25* 0
...
31 * 0
...
31 * 0
...
85fF
C3
Cd3 + Cs4 + 2 * Cgd3 + 2 * Cgs4
(0
...
0625 * 2+ 0
...
25 * 0
...
57 * 0
...
61 * 0
...
28) +
2 * (0
...
5) + 2 * (0
...
5) = 0
...
57 * 0
...
61 * 1
...
28) +
2 * Cgd5+2 * Cgd6+ 2 * Cgd7+ 2 * Cgd8 2 * (0
...
5)+ 4 * (0
...
171875* 1
...
86
= Cd4 + 4 * Cd5 + 4 * 2 * Cgd6
* 0
...
22)+ 4 * 2 * (0
...
375) = 3
...
(6
...
69 -------------- ( 0
...
85fF + 3 ⋅ 0
...
47 fF ) = 85 p s
2
The simulated delay for this particular transition was found to be 86 psec! The hand analysis
gives a fairly accurate estimate given all assumptions and linearizations made
...
This is not entirely the case, as during the transition some other contributions come into
play depending upon the operating region
...
provide a totally accurate delay prediction, but rather to give intuition into what factors influence the delay and to aid in initial transistor sizing
...
The simulated worst-case low-to-high delay time
for this gate was 106ps
...
e
...
First, the number of transistors required to
implement an N fan-in gate is 2N
...
The
second problem is that propagation delay of a complementary CMOS gate deteriorates
rapidly as a function of the fan-in
...
For an N-input NAND gate, the output capacitance increases
linearly with the fan-in since the number of PMOS devices connected to the output node
increases linearly with the fan-in
...
For the same N-input NAND gate, the effective resistance of the PDN path increases linearly with the fan-in
...
The fan-out has a large impact on the delay of complementary CMOS logic as well
...
The above observations are summarized by the following formula, which approximates the influence of fan-in and fan-out on the propagation delay of the complementary
CMOS gate
$t_p = a_1 FI + a_2 FI^2 + a_3 FO$
(6
...
At first glance, it would appear that the increase in resistance for larger fan-in can be
solved by making the devices in the transistor chain wider
...
For
the N-input NAND gate, the low-to-high delay only increases linearly since the pull-up
resistance remains unchanged and only the capacitance increases linearly
...
13 show the propagation delay for both transitions as a function of fan-in
assuming a fixed fan-out (NMOS: 0
...
5µm)
...
The
simultaneous increase in the pull-down resistance and the load capacitance results in an
approximately quadratic relationship for tpHL
...
Design Techniques for Large Fan-in
Several approaches may be used to reduce delays in large fan-in circuits
...
13 Propagation delay of
CMOS NAND gate as a function of
fan-in
...
1
...
This lowers the resistance of devices in series and lowers the time constant
...
This technique should, therefore, be used with caution
...
A more comprehensive approach towards sizing transistors in complex CMOS
gates is discussed in the next section
...
Progressive Transistor Sizing
An alternative to uniform sizing (in which each transistor is scaled up uniformly) is to use progressive transistor sizing (Figure 6
...
Referring back to Eq
...
3), we see
that the resistance of M1 (R1) appears N times in the delay equation, the resistance of M2 (R2)
appears N-1 times, etc
...
Consequently, a progressive scaling of the transistors is beneficial: M1 > M2
> M3 > MN
...
For an excellent treatment on the optimal sizing of transistors in a complex network, we refer the interested reader to [Shoji88, pp
...
[Figure: series stack M1 to MN with inputs In1 to InN, internal capacitances C1, C2, C3, load CL, and output Out; M1 > M2 > M3 > MN]
Figure 6
...
Section 6
...
15 Influence of transistor ordering on delay
...
The reader should be aware of one important pitfall of this approach
...
Very often, design-rule considerations force the designer to push the transistors apart, which causes the internal capacitance
to grow
...
Input Re-Ordering
Some signals in complex combinational logic blocks might be more critical than others
...
An input signal to a gate is called critical if it is the last signal of
all inputs to assume a stable value
...
Putting the critical-path transistors closer to the output of the gate can result in a speedup
...
15
...
Suppose
further that In2 and In3 are high and that In1 undergoes a 0→1 transition
...
In case (a), no path to GND exists until M1 is turned on, which is
unfortunately the last event to happen
...
In the second case,
C1 and C2 are already discharged when In1 changes
...
4
...
16
...
Partitioning the NOR-gate into two three-input gates results in a significant speed-up, which offsets by far the extra delay incurred by
turning the inverter into a two-input NAND gate
...
16 Logic restructuring
can reduce the gate fan-in
...
The sizing of devices should happen in its proper context
...
In Chapter 5 we found out
that an optimal fanout for a chain of inverters driving a load CL is (CL/Cin)^(1/N), where N is
the number of stages in the chain, and Cin the input capacitance of the first gate in the
chain
...
Can this result be extended to determine
the size of any combinational path for minimal delay? By extending our previous
approach to address complex logic networks, we will find out that this is indeed possible
[Sutherland99]
...
6)
$t_p = t_{p0} (p + g f / \gamma)$
(6
...
In this context, f is often called the
electrical effort
...
The more involved structure of the multiple-input gate, combined with its series devices, increases its intrinsic delay
...
Table 6
...
Table 6
...
Gate type | p
Inverter | 1
n-input NAND | n
n-input NOR | n
n-way multiplexer | 2n
XOR, NXOR | n·2^(n-1)
1
The approach introduced in this section is commonly called logical effort, and was first introduced in
[Sutherland99], which presents an extensive treatment of the topic
...
Section 6
...
In other
words, the logical effort of a logic gate tells how much worse it is at producing output current than an inverter, given that each of its inputs may contain only the same input capacitance as the inverter
...
Logical effort is a useful
parameter, because it depends only on circuit topology
...
4
...
4 Logical efforts of common logic gates, assuming a PMOS/NMOS ratio of 2
...
5 Logical effort of complex gates
Consider the gates shown in Figure 6
...
Assuming a PMOS/NMOS ratio of 2, the input
capacitance of a minimum-sized symmetrical inverter equals 3 times the gate capacitance of a
minimum-sized NMOS (called Cunit)
...
This increases the input capacitance of the 2-input NAND to 4 Cunit, or 4/3 the capacitance
of the inverter
...
Equivalently, for the same input capacitance, the NAND and NOR gate have 4/3 and 5/3 less driving
strength than the inverter
...
Hence, gNAND = 4/3, and gNOR = 5/3
...
17 Logical effort of 2-input NAND
and NOR gates
...
(6
...
Figure 6
...
The slope of the line is the logical effort of the gate; its intercept is the intrinsic delay
...
Observe also that fanout and
Figure 6
...
a similar way
...
The total delay of a path through a combinational logic block can now be expressed
as
$$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0} \sum_{j=1}^{N} \left( p_j + \frac{f_j g_j}{\gamma} \right)$$
(6
...
By finding N – 1 partial derivatives and setting them to zero,
we find that each stage should bear the same ‘effort’:
$f_1 g_1 = f_2 g_2 = \dots = f_N g_N$
(6
...
$F = f_1 f_2 \dots f_N = C_L / C_{g1}$
$G = g_1 g_2 \dots g_N$
(6
...
From here on, the
analysis proceeds along the same lines as the inverter chain
...
11)
and the minimum delay through the path is
$$D = t_{p0} \left( \sum_{j=1}^{N} p_j + \frac{N \sqrt[N]{H}}{\gamma} \right)$$
(6
...
Note that the overall intrinsic delay is a function of the types of logic gates in the path, and
is not affected by the sizing
...
6 Sizing combinational logic for minimum delay
Consider the logic network of Figure 6
...
The output of the network is loaded with a capacitance which is 5 times
larger than the input capacitance of the first gate, which is a minimum-sized inverter
...
Using the entries in Table 6
...
93
...
93; f2 = 1
...
16; f3 = 1
...
93
...
From this, we can derive the sizes of the gates (with respect to
their minimum-sized versions): a = f1/g2 = 1
...
34; c = f1f2f3/g4=2
...
These calculations do not have to be very precise
...
5 still result in circuits within 5% of minimum
delay
...
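The sizing procedure of this example can be reproduced with a short script. It is a sketch under stated assumptions: the path is taken to be an inverter, a 3-input NAND, a 2-input NOR, and an inverter (logical efforts 1, 5/3, 5/3, 1), which is consistent with the surviving numbers above but is not spelled out in the remaining text; branching is ignored; and the name `size_path` is hypothetical.

```python
def size_path(logical_efforts, path_electrical_effort):
    """Equal-effort sizing of a gate chain without branching.

    Returns (stage effort h, per-stage fanouts f_i, sizes relative to the
    minimum-sized version of each gate, with the first gate at size 1).
    """
    n = len(logical_efforts)
    path_logical_effort = 1.0
    for g in logical_efforts:
        path_logical_effort *= g                     # G = g1 * g2 * ... * gN
    path_effort = path_electrical_effort * path_logical_effort  # H = F * G
    stage_effort = path_effort ** (1.0 / n)          # h = H^(1/N)
    fanouts = [stage_effort / g for g in logical_efforts]       # f_i = h / g_i

    sizes = [1.0]
    running = 1.0
    for i in range(1, n):
        running *= fanouts[i - 1]                    # f_1 * ... * f_(i-1)
        sizes.append(logical_efforts[0] * running / logical_efforts[i])
    return stage_effort, fanouts, sizes


# Assumed gate chain: inverter, 3-input NAND, 2-input NOR, inverter; F = 5.
h, f, sizes = size_path([1.0, 5.0 / 3.0, 5.0 / 3.0, 1.0], 5.0)
print(f"h = {h:.2f}")
print("fanouts:", [round(x, 2) for x in f])
print("sizes a, b, c:", [round(x, 2) for x in sizes[1:]])
```

Since the result is insensitive to small sizing errors, rounding the computed factors to convenient values is usually acceptable.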
[Figure: logic path with gate sizes 1, a, b, c driving a load of 5]
Figure 6
...
Power Consumption in CMOS Logic Gates
The sources of power consumption in a complementary CMOS inverter were discussed in
detail in Chapter 5
...
The
power dissipation is a strong function of transistor sizing (which affects physical capacitance), input and output rise/fall times (which affect the short-circuit power), device
thresholds and temperature (which affect leakage power), and switching activity
...
Making a gate more complex
mostly affects the switching activity α0→1, which has two components: a static component
that is only a function of the topology of the logic network, and a dynamic one that results
from the timing behavior of the circuit—the latter factor is also called glitching
...
For static CMOS gates with statistically independent inputs, the static
transition probability is the probability p0 that the output will be in the zero state in one
cycle, multiplied by the probability p1 that the output will be in the one state in the next
cycle:
$\alpha_{0 \to 1} = p_0 \cdot p_1 = p_0 \cdot (1 - p_0)$
(6
...
14)
where N0 is the number of zero entries and N1 is the number of one entries in the output
column of the truth table of the function
...
5
...
Table 6
...
A | B | Out
0 | 0 | 1
0 | 1 | 0
1 | 0 | 0
1 | 1 | 0
From Table 6
...
(6
...
15)
Problem 6
...
Signal Statistics—The switching activity of a logic gate is a strong function of the input
signal statistics
...
For
example, consider once again a 2-input static NOR gate, and let pa and pb be the
probabilities that the inputs A and B are one
...
The probability that the output node equals one is given by
p1 = (1-pa) (1-pb)
(6
...
17)
Section 6
...
20 Transition activity of a two-input NOR gate as a function of the input probabilities (pA, pB)
Figure 6
...
Observe how
this graph degrades into the simple inverter case when one of the input probabilities is set
to 0
...
Problem 6
...
The results to be obtained are given in Table 6
...
Table 6
...
Gate | α0→1
AND | (1 – pApB)pApB
OR | (1 – pA)(1 – pB)[1 – (1 – pA)(1 – pB)]
XOR | [1 – (pA + pB – 2pApB)](pA + pB – 2pApB)
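The static transition probabilities discussed here are easy to check numerically. The sketch below is illustrative only; the helper names are introduced for this example. It computes α0→1 either from a truth table (uniform, independent inputs) or from the output-one probability p1 of each gate type as a function of the input probabilities.

```python
from fractions import Fraction
from itertools import product


def alpha_from_truth_table(fn, n_inputs):
    """alpha = N0 * N1 / 2^(2N) for uniformly distributed, independent inputs."""
    outputs = [fn(*bits) for bits in product((0, 1), repeat=n_inputs)]
    n1 = sum(outputs)
    n0 = len(outputs) - n1
    return Fraction(n0 * n1, len(outputs) ** 2)


def alpha_from_p1(p1):
    """alpha = p0 * p1 with p0 = 1 - p1."""
    return (1 - p1) * p1


# Probability that the output is one, for independent inputs.
def p1_and(pa, pb):
    return pa * pb


def p1_or(pa, pb):
    return 1 - (1 - pa) * (1 - pb)


def p1_xor(pa, pb):
    return pa + pb - 2 * pa * pb


def p1_nor(pa, pb):
    return (1 - pa) * (1 - pb)


half = Fraction(1, 2)
nor_uniform = alpha_from_truth_table(lambda a, b: int(not (a or b)), 2)
and_uniform = alpha_from_p1(p1_and(half, half))
xor_uniform = alpha_from_p1(p1_xor(half, half))
```

Using `Fraction` keeps the results exact, so they can be compared directly with the fractions quoted in the text; floating-point probabilities work equally well as arguments.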
Inter-signal Correlations—The evaluation of the switching activity is further
complicated by the fact that signals exhibit correlation in space and time
...
This is best illustrated with a
simple example
...
21a, and assume that the
primary inputs, A and B, are uncorrelated and uniformly distributed
...
The probability that the node Z
undergoes a power consuming transition is then determined using the AND-gate expression of Table 6
...
p0->1 = (1- pa pb) pa pb = (1-1/2 • 1/2) 1/2 • 1/2 = 3/16
(6
...
21 Example illustrating the effect of signal correlations
...
This approach, however, has two major limitations: (1) it does not deal with circuits with
feedback as found in sequential circuits; (2) it assumes that the signal probabilities at the
input of each gate are independent
...
For instance, the inputs to the AND
gate in Figure 6
...
The
approach to compute probabilities, presented previously, fails under these circumstances
...
This value for transition probability is clearly false, as logic transformations show that the network can be reduced to $Z = C \cdot B = A \cdot \overline{A} = 0$, and no transition
will ever take place
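The failure of the independence assumption can be demonstrated by exhaustive enumeration. The sketch assumes the circuit structure implied by the reduction above: node C is the inverse of A, and Z is the AND of C and B, with B either an independent input or tied to A (reconvergent fanout). The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def exact_p1(fn, n_inputs):
    """Exact probability that fn is 1 over uniform, independent primary inputs."""
    rows = list(product((0, 1), repeat=n_inputs))
    return Fraction(sum(fn(*r) for r in rows), len(rows))


def alpha(p1):
    """Static 0 -> 1 transition probability p0 * p1."""
    return (1 - p1) * p1


half = Fraction(1, 2)

# Naive gate-by-gate propagation: treats C and B as independent, p = 1/2 each.
naive = alpha(half * half)

# Exact analysis when B is an independent primary input: Z = (not A) and B.
exact_independent = alpha(exact_p1(lambda a, b: (1 - a) & b, 2))

# Exact analysis with reconvergent fanout, B tied to A: Z = (not A) and A.
exact_reconvergent = alpha(exact_p1(lambda a: (1 - a) & a, 1))
```

The naive propagation agrees with the exact result only when the gate inputs really are independent; with reconvergent fanout the exact transition probability drops to zero.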
...
This can be accomplished with the aid of conditional probabilities
...
pZ = p(Z=1) = p(B=1, C=1)
(6
...
If
B and C are independent, p(B=1,C=1) can be decomposed into p(B=1) • p(C=1), and this
yields the expression for the AND-gate, derived earlier: pZ = p(B=1) • p(C=1) = pB pC
...
21b), a conditional probability has to be employed, such as
pZ = p(C=1|B=1) • p(B=1)
(6
...
(6
...
The
extra condition is necessary as C is dependent upon B
...
Deriving those expressions in a structured way for large networks with reconvergent
fanout is complex, especially when the networks contain feedback loops
...
To be meaningful, the analysis program has to process a typical
sequence of input signals, as the power dissipation is a strong function of statistics of those
signals
...
[Figure: simulated outputs Out6, Out7, and Out8 versus time (psec); caption: 22 Glitching in a chain of NAND gates]
In reality, the finite propagation delay from one logic block to the next can cause spurious transitions, called glitches, critical races, or
dynamic hazards, to occur: a node can exhibit multiple transitions in a single clock cycle
before settling to the correct logic level
...
22, which displays
the simulated response of a chain of NAND gates for all inputs going simultaneously from
0 to 1
...
For this particular transition, all the odd bits must transition to 0 while the even bits remain at the value of 1
...
When the correct input ripples through the network, the output goes high
...
Although the glitches in this example are
only partial (i
...
, not from rail to rail), they contribute significantly to the power dissipation
...
Design Techniques to Reduce Switching Activity
The dynamic power of a logic gate can be reduced by minimizing the physical capacitance and
the switching activity
...
The switching activity, on the other hand, can be minimized at all levels of the design abstraction, and is the focus of this section
...
1
...
Consider for
instance two alternate implementations of F = A • B • C • D, as shown in Figure 6
...
[Figure: chain structure and tree structure, each with inputs A, B, C, D, intermediate nodes O1 and O2, and output F]
Figure 6
...
Ignore glitching and assume that all primary inputs (A, B, C, D) are uncorrelated and uniformly distributed (i
...
, p1 (a,b,c,d)= 0
...
Using the expressions from Table 6
...
7
...
However, as mentioned before, it is also important to consider the timing behavior to
accurately make power trade-offs
...
Table 6
...
 | O1 | O2 | F
p1 (chain) | 1/4 | 1/8 | 1/16
p0 = 1-p1 (chain) | 3/4 | 7/8 | 15/16
p0->1 (chain) | 3/16 | 7/64 | 15/256
p1 (tree) | 1/4 | 1/4 | 1/16
p0 = 1-p1 (tree) | 3/4 | 3/4 | 15/16
p0->1 (tree) | 3/16 | 3/16 | 15/256
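The table entries can be regenerated with a few lines of Python. This sketch assumes the two structures are built from two-input AND gates (chain: O1 = A·B, O2 = O1·C, F = O2·D; tree: O1 = A·B, O2 = C·D, F = O1·O2), ignores glitching, and uses helper names introduced only for this illustration.

```python
from fractions import Fraction


def alpha(p1):
    """Static 0 -> 1 transition probability p0 * p1."""
    return (1 - p1) * p1


def and_p1(pa, pb):
    """Output-one probability of a two-input AND gate with independent inputs."""
    return pa * pb


p = Fraction(1, 2)  # all primary inputs uncorrelated and uniformly distributed

# Chain structure: O1 = A.B, O2 = O1.C, F = O2.D
chain_p1 = {}
chain_p1["O1"] = and_p1(p, p)
chain_p1["O2"] = and_p1(chain_p1["O1"], p)
chain_p1["F"] = and_p1(chain_p1["O2"], p)

# Tree structure: O1 = A.B, O2 = C.D, F = O1.O2
tree_p1 = {}
tree_p1["O1"] = and_p1(p, p)
tree_p1["O2"] = and_p1(p, p)
tree_p1["F"] = and_p1(tree_p1["O1"], tree_p1["O2"])

chain_alpha = {node: alpha(v) for node, v in chain_p1.items()}
tree_alpha = {node: alpha(v) for node, v in tree_p1.items()}

# Summed activity over the three nodes of each structure.
chain_total = sum(chain_alpha.values())
tree_total = sum(tree_alpha.values())
```

Neither structure has reconvergent fanout, so treating the gate inputs as independent is exact here. Summing the per-node activities shows the chain switching less in total than the tree for these input statistics.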
2
...
24
...
Since both circuits implement identical logic functionality, it is clear that
the activity at the output node Z is equal in both cases
...
In the first circuit, this activity equals (1 − 0
...
2) (0
...
2) = 0
...
In
the second case, the probability that a 0 → 1 transition occurs equals (1 – 0
...
1) (0
...
1)
= 0
...
This is substantially lower
...
e
...
5)
...
3
...
Unfortunately, the
minimum area solution does not always result in the lowest switching activity
...
25
...
A
B
B
p(A = 1) = 0
...
2
p(C = 1) = 0
...
24 Reordering of inputs affects the circuit activity
...
If the data being transmitted were random, it would make no difference which architecture is
used
...
Suppose, for
instance, that A is always (or mostly) 1 and B is (mostly) 0
...
However,
in the time-multiplexed solution, the bus toggles between 0 and 1
...
[Figure panels: (a) parallel data transmission; (b) serial data transmission]
Figure 6
...
4
...
If all input signals of a gate change simultaneously, no glitching occurs
...
Such a mismatch in signal timing is typically the result of different path lengths with respect to the primary inputs of the network
...
26
...
The first network (a) suffers from glitching as a result of the wide disparity between
the arrival times of the input signals for a gate
...
Redesigning the network so that all arrival
times are identical can dramatically reduce the number of superfluous transitions (network b)
...
26 Glitching is influenced by matching of signal path lengths
...
technology, but requires 2N transistors to implement an N-input logic gate
...
This has opened the door for alternative logic families that are either simpler or
faster
...
2
...
The
purpose of the PUN in complementary CMOS is to provide a conditional path between
VDD and the output when the PDN is turned off
...
27a)
...
Figure 6
...
[Figure panels: (a) generic, with a load between VDD and output F above a PDN driven by In1, In2, In3; (b) pseudo-NMOS, with a PMOS load]
Figure 6
...
The clear advantage of pseudo-NMOS is the reduced number of transistors (N+1
versus 2N for complementary CMOS)
...
On the other hand, the nominal low output voltage is
Section 6
...
This results in reduced noise margins and, more importantly, static power dissipation
...
Since
the voltage swing on the output and the overall functionality of the gate depend upon the
ratio between the NMOS and PMOS sizes, the circuit is called ratioed
...
Computing the dc-transfer characteristic of the pseudo-NMOS proceeds along paths
similar to those used for its complementary CMOS counterpart
...
At
this operation point, it is reasonable to assume that the NMOS device resides in linear
mode (since the output should ideally be close to 0V), while the PMOS load is saturated
...
21)
Assuming that VOL is small relative to the gate drive (VDD-VT) and that VTn is equal
to VTp in magnitude, VOL can be approximated as:
$$V_{OL} \approx \frac{k_p (-V_{DD} - V_{Tp}) \cdot V_{DSAT}}{k_n (V_{DD} - V_{Tn})} \approx \frac{\mu_p \cdot W_p}{\mu_n \cdot W_n} \cdot V_{DSAT}$$
(6
...
Unfortunately, this has a negative impact on
the propagation delay for charging up the output node since the current provided by the
PMOS device is limited
...
The static power consumption in the low-output mode is easily derived
$$P_{low} = V_{DD} I_{low} \approx V_{DD} \cdot k_p \left( (-V_{DD} - V_{Tp}) \cdot V_{DSATp} - \frac{V_{DSATp}^2}{2} \right)$$
(6
...
7 Pseudo-NMOS Inverter
Consider a simple pseudo-NMOS inverter (where the PDN network in Figure 6
...
5µm/0
...
The effect of sizing the
PMOS device is studied in this example to demonstrate the impact on various parameters
...
5 to 0
...
Devices
with a W/L < 1 are constructed by making the length longer than the width
...
28
...
8 summarizes the nominal output voltage (VOL), static power dissipation, and
the low-to-high propagation delay
...
25V from VOL (which is not 0V for this inverter)
...
25V
...
A larger pull-up device improves performance, but increases
static power dissipation and lowers noise margins (i
...
, increases VOL)
...
[Plot: Vout (V) versus Vin (V) for several values of W/Lp]
Figure 6
...
Table 6
...
Size | VOL | Static Power Dissipation | tplh
4
0
...
273V
298µW
56ps
1
...
5
0
...
25
0
...
For a
PMOS W/L of 4, VOL is given by (30/115) (4) (0
...
66V
...
However, pseudo-NMOS still finds use in large fan-in circuits
...
29 shows the schematics of pseudo-NMOS NOR and NAND gates
...
[Figure panels: (a) NOR; (b) NAND]
Figure 6
...
Section 6
...
4
257
NAND Versus NOR in Pseudo-NMOS
Given the choice between NOR or NAND logic, which one would you prefer for implementation in pseudo-NMOS?
How to Build Even Better Loads
It is possible to create a ratioed logic style that completely eliminates static currents and
provides rail-to-rail swing
...
A differential gate requires that each input is provided in complementary
format, and produces complementary outputs in turn
...
An example of such a logic family,
called Differential Cascode Voltage Switch Logic (or DCVSL), is presented conceptually
in Figure 6
...
The pull-down networks PDN1 and PDN2 use NMOS devices and are mutually
exclusive (that is, when PDN1 conducts, PDN2 is off, and when PDN1 is off, PDN2 conducts), such that the required logic function and its inverse are simultaneously implemented
...
Turning on PDN1 causes
Out to be pulled down, although there is still a fight between M1 and PDN1
...
PDN1 must be strong enough
to bring Out below VDD-|VTp|, the point at which M2 turns on and starts charging $\overline{Out}$ to
VDD, eventually turning off M1
...
Figure 6
...
Notice that it is possible to share
transistors among the two pull-down networks, which reduces the implementation overhead
[Figure panels: (a) Basic principle, with load devices M1 and M2, pull-down networks PDN1 and PDN2, and complementary outputs; (b) XOR-XNOR gate]
Figure 6
...
The resulting circuit exhibits a rail-to-rail swing, and the static power dissipation is
eliminated: in steady state, none of the stacked pull-down networks and load devices are
simultaneously conducting
...
In addition to the problem of increased design complexity, this circuit style still
has a power-dissipation problem that is due to cross-over currents
...
Example 6
...
Notice
that as Out is pulled down to VDD-|VTp|, $\overline{Out}$ starts to charge up to VDD quickly
...
A static CMOS AND
gate (NAND followed by an inverter) has a delay of 200ps
...
[Figure: DCVSL gate schematic (M1 to M4, inputs A and B, complementary outputs) and its transient response, Voltage (V) versus Time (ns)]
Figure 6
...
M1 and M2
1µm/0
...
5µm/0
...
5µm/0
...
Design Consideration: Single-ended versus Differential
The DCVSL gate provides differential (or complementary) outputs
...
This is a distinct advantage,
as it eliminates the need for an extra inverter to produce the complementary signal
...
Finally, the approach prevents some of the time-differential problems introduced by additional inverters
...
When the complementary signal is generated
using an inverter, the inverted signal is delayed with respect to the original (Figure 6
...
This
causes timing problems, especially in very high-speed designs
...
32b)
...
Additionally, the dynamic power dissipation is high
...
[Figure panels: (a) Single-ended; (b) Differential; signals Vin, Vout1, Vout2]
Figure 6
...
6
...
3
Pass-Transistor Logic
Pass-Transistor Basics
A popular and widely-used alternative to complementary CMOS is pass-transistor logic,
which attempts to reduce the number of transistors required to implement logic by allowing the primary inputs to drive gate terminals as well as source/drain terminals
[Radhakrishnan85]
...
Figure 6
...
In this gate, if the B input is high, the top transistor is turned on and copies the input A to the output F
...
The switch driven by B seems to be
redundant at first glance
...
ensure that the gate is static, that is, that a low-imped-
33 Pass-transistor implementation of an AND gate
The promise of this approach is that fewer transistors are required to implement a given
function
...
33 requires 4 transistors (including the inverter required to invert B), while a complementary CMOS implementation would require 6 transistors
...
Unfortunately, as discussed earlier, an NMOS device is effective at passing a 0 but
is poor at pulling a node to VDD
...
In fact, the situation is worsened by the fact that the devices
experience body effect, as there exists a significant source-to-body voltage when pulling
high
...
Let the source of the NMOS pass transistor be labeled x
...
24)
260
DESIGNING COMBINATIONAL LOGIC GATES IN CMOS
Chapter 6
Example 6
...
5V, the transient response of Figure 6
...
Assume that node x was initially 0
...
[Figure: NMOS pass transistor charging node x from input In, followed by an inverter driving Out, with its transient response; axes: Voltage (V) vs. Time (ns)]
34 Transient response of charging up a node using an N device
...
node x is in a high impedance state (not driven to one of the rails using a low resistance path)
...
Notice that the output charges up quickly initially, but has a slow tail
...
Hand calculation using Eq
...
24), results in an output voltage of 1
...
WARNING:
The above example demonstrates that pass-transistor gates cannot be cascaded by connecting the output of a pass gate to the gate input of another pass transistor
...
35a, where the output of M1 (node x) drives the gate of another
MOS device
...
If node C has a rail to rail swing, node Y
only charges up to the voltage on node x - VTn2, which works out to VDD-VTn1-VTn2
...
35b on the other hand has the output of M1 (x) driving the junction of M2, and there is
only one threshold drop
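The difference between the two cascading styles can be illustrated with a short calculation. This is a minimal sketch, not a device-accurate model: it assumes fixed NMOS thresholds (body effect, which makes the real drop larger, is ignored), and the supply and threshold values are hypothetical.

```python
def swing_gate_driven(vdd: float, vtn1: float, vtn2: float) -> float:
    """Swing on Y when node x drives the GATE of M2 (two threshold drops)."""
    v_x = vdd - vtn1      # node x only reaches VDD - VTn1
    return v_x - vtn2     # M2 passes at most (its gate voltage - VTn2)


def swing_source_driven(vdd: float, vtn1: float, vtn2: float) -> float:
    """Swing on Y when node x drives the SOURCE/DRAIN of M2 and C swings rail to rail."""
    v_x = vdd - vtn1      # degraded level arriving at M2
    # M2 (gate at VDD) can pass up to VDD - VTn2; it cannot restore the level
    return min(v_x, vdd - vtn2)


VDD, VTN = 2.5, 0.5       # hypothetical values for illustration
print("gate-driven swing on Y:  ", swing_gate_driven(VDD, VTN, VTN))
print("source-driven swing on Y:", swing_source_driven(VDD, VTN, VTN))
```

With equal thresholds, the gate-driven connection loses two threshold voltages while the series connection loses only one.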
...
[Figure: (a) output of M1 (node x) drives the gate of M2: swing on Y = VDD − VTn1 − VTn2; (b) output of M1 (node x) drives the source/drain of M2: swing on Y = VDD − VTn1]
Figure 6
...
Section 6
...
10 VTC of the pass transistor AND gate
The voltage transfer curve of a pass-transistor gate shows little resemblance to complementary CMOS
...
36
...
For the case when B = VDD, the top pass
transistor is turned on, while the bottom one is turned off
...
e
...
Next consider the case when A=VDD, and B makes a transition from 0 → 1
...
Once the bottom pass transistor turns off, the output follows the input B
minus a threshold drop
...
Observe that a pure pass-transistor gate is not regenerative
...
This can be remedied by the occasional insertion of a CMOS inverter
...
[Figure: pass-transistor AND gate F = AB and its simulated voltage transfer curves for three cases: B = VDD with A = 0→VDD; A = VDD with B = 0→VDD; A = B = 0→VDD; axes: Vout (V) vs. Vin (V)]
Figure 6
...
33
...
For the pass transistor circuit in Figure 6
...
The output node charges from 0V
to VDD-VTn (assuming that node x was initially at 0V) and the energy drawn from the
power supply for charging the output of a pass transistor is given by:
$$E_{0 \to 1} = \int_0^T P(t)\,dt = V_{DD} \int_0^T i_{supply}(t)\,dt = V_{DD} \int_0^{V_{DD} - V_{Tn}} C_L\,dV_{out} = C_L \cdot V_{DD} \cdot (V_{DD} - V_{Tn})$$
(6
...
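As a quick numerical illustration of the energy expression above, the following sketch compares the energy drawn from the supply for the reduced swing with that of a rail-to-rail transition. The load capacitance, supply voltage, and threshold are hypothetical values, and the threshold is treated as constant.

```python
def energy_pass_transistor(c_load: float, vdd: float, vtn: float) -> float:
    """Energy drawn from the supply when the output charges from 0 V to VDD - VTn."""
    return c_load * vdd * (vdd - vtn)


def energy_full_swing(c_load: float, vdd: float) -> float:
    """Energy drawn from the supply for a rail-to-rail 0 -> VDD transition."""
    return c_load * vdd * vdd


C_L = 10e-15          # 10 fF, hypothetical load
VDD, VTN = 2.5, 0.5   # hypothetical supply and (constant) threshold

e_pt = energy_pass_transistor(C_L, VDD, VTN)
e_full = energy_full_swing(C_L, VDD)
print(f"reduced swing: {e_pt * 1e15:.2f} fJ, full swing: {e_full * 1e15:.2f} fJ")
print(f"ratio: {e_pt / e_full:.2f}")
```

The ratio of the two energies is simply (VDD − VTn)/VDD, so the saving scales with the size of the threshold drop.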
[Figure: (a) basic concept: a pass-transistor network and an inverse pass-transistor network, both fed by complementary inputs A and B, producing complementary outputs F; (b) example pass-transistor networks: AND/NAND, OR/NOR, XOR/NXOR]
Figure 6
...
Differential Pass Transistor Logic
For high performance design, a differential pass-transistor logic family, called CPL or
DPL, is commonly used
...
A number of CPL gates
(AND/NAND, OR/NOR, and XOR/NXOR) are shown in Figure 6
...
These gates possess a number of interesting properties:
• Since the circuits are differential, complementary data inputs and outputs are always
available
...
Furthermore,
the availability of both polarities of every signal eliminates the need for extra inverters, as is often the case in static CMOS or pseudo-NMOS
...
This is
advantageous for the noise resilience
...
In effect, all gates use exactly the same topology
...
This makes the design of a library of gates very simple
...
Section 6
...
11 Four-input NAND in CPL
Consider the implementation of a four-input AND/NAND gate using CPL
...
38)
...
This is substantially higher than previously discussed gates
...
One should, however, be aware of the fact that the structure
simultaneously implements the AND and the NAND functions, which might reduce the transistor count of the overall circuit
...
38 Layout and schematics of four-input NAND-gate using CPL (the final inverter stage is
omitted)
...
In summary, CPL is a conceptually simple and modular logic style
...
The availability of a simple
XOR, as well as the ease of implementing some specific gate structures, makes it attractive
for structures such as adders and multipliers
...
When considering CPL,
the designer should not ignore the implicit routing overhead of the complementary signals,
which is apparent in the layout of Figure 6
...
Robust and Efficient Pass-Transistor Design
Unfortunately, differential pass-transistor logic, like single-ended pass-transistor logic,
suffers from static power dissipation and reduced noise margins, since the high input to
the signal-restoring inverter only charges up to VDD-VTn
...
Solution 1: Level Restoration
...
39)
...
Assume that node X is at 0V
(Out is at VDD and Mr is turned off) with B = VDD and A = 0
...
This is, however, enough to switch the
output of the inverter low, turning on the feedback device Mr and pulling node X all the
way to VDD
...
Furthermore, no
static current path can exist through the level restorer and the pass-transistor, since the
restorer is only active when A is high
...
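[Figure: level-restoring circuit: PMOS level restorer Mr tied to VDD, pass transistor Mn with inputs A and B, inverter M1/M2, internal node X, output Out]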
Figure 6
...
While this solution is appealing in terms of eliminating static power dissipation, it
adds complexity since the circuit is now ratioed
...
The pass-transistor network attempts to pull down node X
while the level restorer now pulls X to VDD
...
Some careful
transistor sizing is necessary to make the circuit function correctly
...
When Rr is
made too small, it is impossible to bring the voltage at node X below the switching threshold of the inverter
...
This sizing problem can be reformulated as follows: the resistance
of Mn and Mr must be such that the voltage at node X drops below the threshold of the
inverter, VM = f(R1, R2)
...
[Figure: level-restorer circuit with A = 0; devices Mr, Mn, M1, M2; input B; nodes X and Out]
Figure 6
...
Example 6
...
One way to simplify the circuit for manual analysis is to open the feedback loop
and to ground the gate of the restoring transistor when determining the switching point (this is
a reasonable assumption, as the feedback only becomes effective once the inverter starts to
Section 6
...
Hence, Mr and Mn form a “pseudo-NMOS-like” configuration, with Mr the load transistor and Mn acting as a pull-down device to GND
...
5µm/0
...
5µm/0
...
Therefore, node X must be pulled below VDD/2 in order to switch the inverter and shut off Mr
...
42, which shows the transient response as the size of the
level restorer is varied while keeping the size of Mn fixed (0
...
25µm)
...
5µm/0
...
The detailed derivation of the sizing requirement will be presented in the sequential design chapter
...
[Figure: simulated transient response for several level-restorer sizes (W/Lr); axis: Voltage (V)]
41 Transient response of the
circuit in Figure 6
...
A level restorer
that is too large can result in incorrect
evaluation
...
Adding the restoring device increases the capacitance at the internal node X, slowing down the gate
...
On the other hand, the level restorer reduces the fall time, since the PMOS transistor, once
turned on, speeds the pull-up action
...
5
Device Sizing in Pass Transistors
For the circuit shown in Figure 6
...
5µm/0
...
Determine
the maximum W/L size for the level restorer transistor for correct functionality
...
42
...
Inputs are fed to both the gate and source/drain terminals as in the case of
conventional pass transistor networks
...
42 shows a simple XOR/XNOR gate of
three variables A, B and C
...
[Figure: (a) general concept: NMOS pass-transistor network with complementary inputs to gate and source/drain terminals, and complementary outputs restored by cross-coupled devices M1/M2; (b) XOR/XNOR gate of inputs A, B, and C]
Figure 6
...
Solution 2: Multiple-Threshold Transistors
...
Using zero threshold devices for the NMOS pass-transistors eliminates most of the threshold drop, and passes a signal close to VDD
...
All devices other than the pass transistors (i
...
, the inverters) are implemented using
standard high-threshold devices
...
[Figure: pass-transistor circuit using zero (or low)-threshold transistors]
Figure 6
...
The use of zero-threshold transistors can be dangerous due to the subthreshold currents that can flow through the pass-transistors, even if VGS is slightly below VT
...
43, which points out a potential sneak dc-current path
...
Section 6
...
44 CMOS transmission gate
...
The most widely-used solution to deal with the
voltage-drop problem is the use of transmission gates
...
The ideal approach is to use an
NMOS to pull-down and a PMOS to pull-up
...
44a)
...
The
transmission gate acts as a bidirectional switch controlled by the gate signal C
...
In short,
A = B
if
C=1
(6
...
Figure 6
...
Consider the case of charging node B to VDD for the transmission gate circuit in Figure 6
...
Node A is driven to VDD and the transmission gate is enabled (C = 1 and C̄ = 0)
...
However, since the PMOS device is present and turned on
(VGSp = -VDD), charging continues all the way up to VDD
...
45b shows the opposite
case, that is, discharging node B to 0
...
The
PMOS transistor by itself can only pull down node B to VTp at which point it turns off
...
Though the transmission gate requires two transistors and more control
signals, it enables rail-to-rail swing
...
45 Transmission gates
enable rail-to-rail switching
Transmission gates can be used to build some complex gates very efficiently
...
46 shows an example of a simple inverting two-input multiplexer
...
27)
A complementary implementation of the gate requires eight transistors instead of six
...
46 Transmission gate multiplexer and its layout
...
47
...
To understand the operation of
this circuit, we have to analyze the B = 0 and B = 1 cases separately
...
In
the opposite case, M1 and M2 are disabled, and the transmission gate is operational, or F =
A·B̄
...
Notice that, regardless of the
values of A and B, node F always has a connection to either VDD or GND and is hence a
low-impedance node
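The case analysis on B can be checked exhaustively with a few lines of code. This is a behavioral sketch only; it assumes, as is usual for this circuit, that for B = 1 the transistors M1 and M2 act as an inverter on A, and that for B = 0 the transmission gate simply passes A. The function name is hypothetical.

```python
def transmission_gate_xor(a: int, b: int) -> int:
    """Behavioral model of the transmission-gate XOR, split on the value of B."""
    if b == 1:
        # Assumed: M1/M2 are enabled and act as an inverter, so F = NOT A
        return 1 - a
    # B = 0: M1/M2 disabled, the transmission gate passes A, so F = A
    return a


for a in (0, 1):
    for b in (0, 1):
        f = transmission_gate_xor(a, b)
        print(f"A={a} B={b} -> F={f}")
        assert f == (a ^ b)   # matches the XOR truth table in every case
```

In both branches the output is actively driven, which is the behavioral counterpart of F always being a low-impedance node.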
...
Other examples where transmission-gate logic is effectively used are fast adder circuits and registers
...
47 Transmission gate XOR
...
Performance of Pass-Transistor and Transmission Gate Logic
The pass-transistor and the transmission gate are, unfortunately, not ideal switches, and
have a series resistance associated with them
...
48, which involves charging a node from 0 V to VDD
...
The effective resistance of the switch is modeled as a parallel
connection of the resistances Rn and Rp of the NMOS and PMOS devices, defined as (VDD
– Vout)/In and (VDD – Vout)/Ip, respectively
...
During the low-to-high transition, the pass-transistors traverse through a number of operation modes
...
28)
The resistance goes up for increasing values of Vout, and approaches infinity when Vout
reaches VDD-VTn, that is, when the device shuts off
...
When Vout is small, the PMOS is saturated, but it enters the linear
mode of operation for Vout approaching VDD, giving the following approximated resistance:
$$R_p = \frac{V_{DD} - V_{out}}{I_p} = \frac{V_{DD} - V_{out}}{k_p \left[ (-V_{DD} - V_{Tp})(V_{out} - V_{DD}) - \frac{(V_{out} - V_{DD})^2}{2} \right]} \approx \frac{1}{k_p (V_{DD} - |V_{Tp}|)}$$
(6
...
48
...
The same is
true in other design instances (for instance, when discharging CL)
...
Problem 6
...
48)
...
Figure 6
...
Such a configuration often
occurs in circuits such as adders or deep multiplexors
...
To analyze the propagation delay of this
[Figure: simulated resistance curves Rp and Rn || Rp of the transmission gate during the low-to-high transition]
48 Simulated equivalent resistance of transmission gate for low-to-high transition
(for (W/L)n = (W/L)p = 0
...
25µm)
...
This produces the network of Figure 6
...
The delay of a network of n transmission gates in sequence can be estimated using the
Elmore approximation (see Chapter 4):
$$t_p(V_n) = 0.69 \sum_{k=0}^{n} C R_{eq} k = 0.69\, C R_{eq} \frac{n(n+1)}{2}$$
(6
...
[Figure: (a) a chain of transmission gates with nodes V1 through Vn and a capacitance C at each node; (b) equivalent RC network with a resistance Req per stage]
Figure 6
...
Section 6
...
13 Delay through 16 transmission gates
Consider 16 cascaded minimum-sized transmission gates, each with an average resistance of
8 kΩ
...
Since the gate inputs are assumed to be fixed, there is no Miller multiplication
...
6 fF for the low-to-high transition
...
69 ⋅ CR -------------------- = 0
...
6fF ) ( 8KΩ ) -------------------------- ≈ 2
...
31)
The transient response for this particular example is shown in Figure 6
...
The simulated delay is 2
...
It is remarkable that a simple RC model predicts the delay so accurately
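The hand calculation for a chain of switches can be sketched as follows. The code evaluates the Elmore sum directly and compares it with the closed form 0.69·C·Req·n(n+1)/2. The 8 kΩ resistance and the 16 stages come from the example; the node capacitance of 3.6 fF is an assumption made here for illustration, since the value in the text is truncated.

```python
def elmore_chain_delay(n: int, r_eq: float, c: float) -> float:
    """0.69 times the Elmore time constant of an n-stage RC ladder.

    Node k sees k resistors of value r_eq between it and the source,
    so its capacitance c contributes k * r_eq * c to the time constant.
    """
    return 0.69 * sum(k * r_eq * c for k in range(1, n + 1))


def elmore_chain_delay_closed_form(n: int, r_eq: float, c: float) -> float:
    """Closed form of the same sum: 0.69 * C * Req * n(n+1)/2."""
    return 0.69 * c * r_eq * n * (n + 1) / 2


N_STAGES = 16      # from the example
R_EQ = 8e3         # ohms, from the example
C_NODE = 3.6e-15   # farads, ASSUMED for illustration

t_sum = elmore_chain_delay(N_STAGES, R_EQ, C_NODE)
t_closed = elmore_chain_delay_closed_form(N_STAGES, R_EQ, C_NODE)
print(f"sum: {t_sum * 1e9:.2f} ns, closed form: {t_closed * 1e9:.2f} ns")
```

Because the delay grows with n(n+1)/2, doubling the chain length roughly quadruples the delay, which is what motivates breaking long chains.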
...
[Figure: simulated transient response of the transmission-gate chain (Out16 shown); axes: Voltage (V) vs. Time (ns)]
Figure 6
...
The most common approach for dealing with the long delay is to break the chain
by inserting buffers every m switches (Figure 6
...
Assuming a propagation delay tbuf for
each buffer, the overall propagation delay of the transmission-gate/buffer network is then
computed as follows,
$$t_p = 0.69 \left[ \frac{n}{m}\, C R_{eq} \frac{m(m+1)}{2} \right] + \left( \frac{n}{m} - 1 \right) t_{buf}$$
(6
...
The optimal number of switches
mopt between buffers can be found by setting the derivative
t pbuf
m opt = 1
...
33)
Obviously, the number of switches per segment grows with increasing values of tbuf
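The trade-off can be explored numerically. The sketch below assumes the buffered-chain delay has the form t_p = 0.69·(n/m)·C·Req·m(m+1)/2 + (n/m − 1)·t_buf. Setting its derivative with respect to m to zero gives m = sqrt(2·t_buf / (0.69·C·Req)), roughly 1.7·sqrt(t_buf / (C·Req)). The capacitance and buffer delay used here are hypothetical values.

```python
import math


def buffered_chain_delay(n: int, m: float, r_eq: float, c: float, t_buf: float) -> float:
    """Delay of n switches with a buffer after every m switches (continuous approximation)."""
    segment = 0.69 * c * r_eq * m * (m + 1) / 2   # Elmore delay of one m-switch segment
    return (n / m) * segment + (n / m - 1) * t_buf


def optimal_switches_per_segment(r_eq: float, c: float, t_buf: float) -> float:
    """Stationary point of buffered_chain_delay with respect to m."""
    return math.sqrt(2 * t_buf / (0.69 * c * r_eq))


N = 16             # switches in the chain
R_EQ = 8e3         # ohms
C_NODE = 3.6e-15   # farads, ASSUMED
T_BUF = 100e-12    # seconds, HYPOTHETICAL buffer delay

m_opt = optimal_switches_per_segment(R_EQ, C_NODE, T_BUF)
best_m = min(range(1, N + 1),
             key=lambda m: buffered_chain_delay(N, m, R_EQ, C_NODE, T_BUF))
print(f"continuous optimum m = {m_opt:.2f}, best integer m = {best_m}")
```

A slower buffer raises t_buf and therefore pushes the optimum toward longer segments, in line with the observation above.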
...
[Figure: transmission-gate chain broken by a buffer every m switches; each segment is modeled by Req and C elements]
Figure 6
...
Example 6
...
The buffers shown in Figure 6
...
In some cases, it might be necessary to add an extra inverter to produce the correct polarity
...
5µm/0
...
5µm /0
...
(6
...
The simulated delay when placing an inverter every two transmission gates equals 154 psec, for every three transmission
gates is 154 psec, and for four transmission gates is 164 psec
...
CAUTION: Although many of the circuit styles discussed in the previous sections sound
very exciting, and might be superior to static CMOS in many respects, none of them has
the robustness and ease of design of complementary CMOS
...
For designs that have no extreme area, complexity, or speed constraints, complementary CMOS is the recommended design style
...
3
Dynamic CMOS Design
It was noted earlier that static CMOS logic with a fan-in of N requires 2N devices
...
The
pseudo-NMOS logic style requires only N + 1 transistors to implement an N input logic
gate, but unfortunately it has static power dissipation
...
With the addition of a clock input, it uses a sequence of precharge
and conditional evaluation phases
...
3
...
52a
...
The opera-
Section 6
...
[Figure: (a) n-type network: precharge PMOS Mp and evaluation NMOS Me clocked by CLK, pull-down network (PDN) with inputs In1 to In3, output Out with load CL; (b) example with inputs A, B, and C]
Figure 6
...
Precharge
When CLK = 0, the output node Out is precharged to VDD by the PMOS transistor Mp
...
The evaluation FET eliminates any static power that would be consumed during the
precharge period (that is, static current would flow between the supplies if both the pull-down and the precharge device were turned on simultaneously)
...
The output is conditionally discharged based on the input values and the pull-down
topology
...
If the PDN is turned off, the
precharged value remains stored on the output capacitance CL, which is a combination of
junction capacitances, the wiring capacitance, and the input capacitance of the fan-out
gates
...
Consequently, once Out is discharged, it cannot be charged again until
the next precharge operation
...
Notice that the output can be in the high-impedance state during
the evaluation period if the pull-down network is turned off
...
As an example, consider the circuit shown in Figure 6
...
During the precharge
phase (CLK=0), the output is precharged to VDD regardless of the input values since the
evaluation device is turned off
...
Otherwise, the output remains at
the precharged state of VDD
...
34)
A number of important properties can be derived for the dynamic logic gate:
• The logic function is implemented by the NMOS pull-down network
...
• The number of transistors (for complex gates) is substantially lower than in the
static case: N + 2 versus 2N
...
The sizing of the PMOS precharge device is not important for realizing proper functionality of the gate
...
There is, however, a trade-off with power dissipation, since a
larger precharge device directly increases clock-power dissipation
...
Ideally, no static current path ever exists between
VDD and GND
...
• The logic gates have faster switching speeds
...
The first (obvious) reason is due to the reduced load capacitance attributed to the
lower number of transistors per gate and the single-transistor load per fan-in
...
The low and high output levels VOL and VOH are easily identified as GND and VDD
and are not dependent upon the transistor sizes
...
Noise margins and switching thresholds have been
defined as static quantities that are not a function of time
...
Pure static analysis, therefore,
does not apply
...
Therefore, it is reasonable to set the switching threshold (VM) as well
as VIH and VIL of the gate equal to VTn
...
Design Consideration
It is also possible to implement dynamic logic using a complementary approach, where the output node is connected by a pre-discharge NMOS transistor to GND, and the evaluation PUN
is implemented in PMOS
...
During evaluation, the output is conditionally charged to VDD
...
Section 6
...
3
...
Fewer devices to implement a given logic function implies that the overall load
capacitance is much smaller
...
After the precharge phase, the output is high
...
As a result, tpLH = 0! The high-to-low transition, on
the other hand, requires the discharging of the output capacitance through the pull-down
network
...
The presence of the evaluation transistor slows the gate somewhat, as
it presents an extra series resistance
...
The above analysis is somewhat unfair, because it ignores the influence of the precharge time on the switching speed of the gate
...
During this time, the
logic in the gate cannot be utilized
...
For
instance, the precharge of the arithmetic unit in a microprocessor can coincide with the
instruction decode
...
Example 6
...
53 shows the design of a four-input NAND example designed using the dynamic-circuit style
...
As we had discussed above, we will assume
that the switching threshold of the gate equals the threshold of the NMOS pull-down transistor
...
9
...
It is assumed that all inputs
are set high as the clock transitions high
...
The resulting transient response is plotted in Figure 6
...
9
...
Making the PMOS too large should be
avoided, however, as it both slows down the gate, and increases the capacitive load on the
clock line
...
Table 6
...
Transistors
VOH
VOL
VM
NMH
NML
tpHL
tpLH
tpre
6
2
...
5-VTN
VTN
110 psec
0 nsec
83psec
As mentioned earlier, the static parameters are time-dependent
...
Figure 6
...
[Figure: dynamic gate schematic (inputs In1 to In3 shown, output Out) and its transient response; axes: Voltage vs. Time (ns)]
Figure 6
...
CLK
input transitions—to 0
...
5V and 0
...
Above, we have defined the
switching threshold of the dynamic gate as the device threshold
...
The noise voltage needed to corrupt the signal has to be larger if the
evaluation time is short
...
When evaluating the power dissipation of a dynamic gate, it would appear that
dynamic logic presents a significant advantage
...
First, the
physical capacitance is lower since dynamic logic uses fewer transistors to implement a
given function
...
Second, dynamic logic gates by construction can at most have one transition per clock cycle
...
Finally, dynamic gates
do not exhibit short circuit power since the pull-up path is not turned on when the gate is
evaluating
...
[Figure: simulated waveforms CLK, VG, and Vout for two amplitudes of the input glitch VG; axis: Voltage (V)]
54 Effect of an input glitch on the
output
...
A larger glitch is acceptable
if the evaluation phase is smaller
...
Section 6
...
Earlier, the transition probability for a static
gate was shown to be p0 p1 = p0 (1-p0)
...
For an n-tree dynamic gate, the output makes a 0 → 1 transition during the precharge
phase only if the output was discharged during the preceding evaluate phase
...
35)
where p0 is the probability that the output is zero
...
For uniformly distributed inputs, the transition probability for an N-input gate is:
$$\alpha_{0 \to 1} = \frac{N_0}{2^N}$$
(6
...
Example 6
...
An n-tree dynamic implementation is shown in Figure 6
...
For equi-probable inputs, there is then a 75% probability that the output node of the
dynamic gate will discharge immediately after the precharge phase, implying that the activity
2
for such a gate equals 0
...
e PNOR= 0
...
The corresponding activity is a lot
smaller, 3/16, for a static implementation
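The activity figures quoted for the NOR gate can be reproduced by enumerating the truth table. The sketch assumes independent, equiprobable inputs. For the dynamic gate, the activity is the fraction of input patterns that give a zero output (taken here to be N0 / 2^N, with N0 the number of zero entries in the output column). For the static gate it is p0·(1 − p0). The helper names are hypothetical.

```python
from itertools import product


def output_column(fn, n_inputs: int) -> list:
    """Evaluate fn on every one of the 2^N input patterns."""
    return [fn(*bits) for bits in product((0, 1), repeat=n_inputs)]


def dynamic_activity(fn, n_inputs: int) -> float:
    """Activity of an n-tree dynamic gate: probability that the output is 0."""
    outs = output_column(fn, n_inputs)
    return outs.count(0) / len(outs)


def static_activity(fn, n_inputs: int) -> float:
    """Activity of a static gate: p0 * (1 - p0)."""
    p0 = dynamic_activity(fn, n_inputs)   # same p0, different formula
    return p0 * (1 - p0)


def nor2(a: int, b: int) -> int:
    return int(not (a or b))


print("dynamic NOR2 activity:", dynamic_activity(nor2, 2))
print("static NOR2 activity: ", static_activity(nor2, 2))
```

Since p0·(1 − p0) is never larger than p0, the dynamic activity is at least as high as the static one for any function under these assumptions.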
...
Though these examples illustrate that the switching activity of dynamic
logic is generally higher, it should be noted that dynamic logic has lower physical capacitance
...
[Figure: two gate schematics with inputs A and B and load CL, one of them clocked by CLK]
Figure 6
...
Problem 6
...
Assume that the inputs are independent and pA=1 = 0
...
3, pC=1 =
0
...
4
...
3
...
However, there are several important considerations that must be taken into account if one
wants dynamic circuits to function properly
...
Some of these
issues are highlighted in this section
...
If the pull-down network is off, the output should ideally remain at the precharged state of VDD during the evaluation phase
...
Figure
6
...
[Figure: (a) leakage sources (1) to (4) in a dynamic gate with precharge device Mp, pull-down device M1 (A = 0), evaluation device Me, and load CL; (b) effect on the CLK and Vout waveforms during precharge and evaluate]
Figure 6
...
Sources 1 and 2 are the reverse-biased diode and sub-threshold leakage of the
NMOS pull-down device M1, respectively
...
Charge
leakage causes a degradation in the high level (Figure 6
...
Dynamic circuits therefore
require a minimal clock rate, which is typically on the order of a few kHz
...
Note that the PMOS precharge device also contributes some leakage current
Section 6
...
To
some extent, the leakage current of the PMOS counteracts the leakage of the pull-down
path
...
Example 6
...
5µm/0
...
Assume that the input is
low during the evaluation period
...
However, as seen from Figure 6
...
Once the output drops
below the switching threshold of the fan-out logic gate, the output is interpreted as a low voltage
...
This is due to the leakage current
provided by the PMOS pull-up
...
[Figure: simulated output voltage under leakage; axes: Voltage (V) vs. time (ms)]
Figure 6
...
The
output settles to an intermediate voltage
determined by a resistive divider of the pull-down and pull-up devices
...
Leakage is caused by the high impedance state of the output node during the evaluate mode, when the pull down path is turned off
...
This is often done
by adding a bleeder transistor as shown in Figure 6
...
The only function of the
bleeder—a pseudo-NMOS-like pull-up device—is to compensate for the charge lost due
to the pull-down leakage paths
...
This allows the (strong) pull-down devices to
[Figure: two dynamic-gate schematics, (a) and (b), each with a bleeder transistor Mbl, precharge device Mp, inputs A and B (devices Ma, Mb), and evaluation device Me]
Figure 6
...
lower the Out node substantially below the switching threshold of the inverter
...
58b)
...
Consider the
circuit of Figure 6
...
During the precharge phase, the output node is precharged to VDD
...
Assume further that input B remains at 0 during evaluation, while input A makes
a 0 → 1 transition, turning transistor Ma on
...
This causes a drop in the output voltage, which cannot be
recovered due to the dynamic nature of the circuit
...
59 Charge sharing in dynamic networks
...
Under the above assumptions, the following initial conditions are valid: Vout(t = 0) = VDD and VX(t = 0) = 0
...
∆Vout < VTn — In this case, the final value of VX equals VDD – VTn(VX)
...
37)
$$= -\frac{C_a}{C_L} \left[ V_{DD} - V_{Tn}(V_X) \right]$$
2
...
38)
Which of the above scenarios is valid is determined by the capacitance ratio
...
(6
...
$$\frac{C_a}{C_L} = \frac{V_{Tn}}{V_{DD} - V_{Tn}}$$
(6
...
The output of the dynamic
gate might be connected to a static inverter, in which case the low level of Vout would
cause static power consumption
...
Example 6
...
60,
which implements a 3-input EXOR function y = A ⊕ B ⊕ C
...
For simplicity, ignore the
load inverter, and assume that all inputs are low during the precharge operation and that all
isolated internal nodes (Va, Vb, Vc, and Vd) are initially at 0V
...
The worst-case change in output is obtained by exposing the
maximum amount of internal capacitance to the output node during the evaluation period
...
The voltage change can then be obtained by equating the
initial charge with the final charge as done with equation Eq
...
38), yielding a worst-case
change of 30/(30+50) * 2
...
94V
...
5- 0
...
56V
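The charge-sharing arithmetic can be sketched in code. The function below follows the two scenarios distinguished earlier: if the drop stays below VTn, the internal node only charges to VDD − VTn; otherwise the two nodes equalize and the drop is VDD·Ca/(Ca + CL), which is the form consistent with the 30/(30+50) factor in the example. The threshold is treated as a constant (no body effect), and the 2.5 V supply and 0.5 V threshold are assumptions made for illustration.

```python
def charge_sharing_delta_v(vdd: float, vtn: float, c_a: float, c_l: float) -> float:
    """Change in Vout (negative) after charge redistribution onto internal capacitance c_a."""
    # Scenario 1: internal node X stops charging at VDD - VTn
    dv = -(c_a / c_l) * (vdd - vtn)
    if abs(dv) < vtn:
        return dv
    # Scenario 2: Vout and VX end at the same voltage (full charge equalization)
    return -vdd * c_a / (c_a + c_l)


VDD, VTN = 2.5, 0.5   # ASSUMED supply and (constant) threshold

# 30 fF of exposed internal capacitance against a 50 fF output node
worst_case = charge_sharing_delta_v(VDD, VTN, 30e-15, 50e-15)
print(f"worst-case change: {worst_case:.2f} V, output after sharing: {VDD + worst_case:.2f} V")

# A much smaller internal capacitance stays in scenario 1
small = charge_sharing_delta_v(VDD, VTN, 5e-15, 50e-15)
print(f"small-capacitance change: {small:.2f} V")
```

The boundary between the two scenarios is the capacitance ratio Ca/CL = VTn/(VDD − VTn), which equals 0.25 for the assumed values.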
...
61
...
This solution obviously comes at the cost of increased area and capacitance
...
A wire routed over a dynamic node may couple capacitively and destroy the state
of the floating node
...
Consider the circuit shown in Figure 6
...
A transition in the input In of
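[Figure: dynamic gate clocked by CLK with output node y (Cy = 50 fF) and internal node a (Ca = 15 fF), driven by A and its complement]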
VDD = 2
...
60 Example illustrating the
charge sharing effect in dynamic logic
...
61 Dealing with charge-sharing by precharging
internal nodes
...
the static gate may cause the output of the gate (Out2) to go low
...
A simulation of this effect is
shown in Figure 6
...
This further causes the output of the static NAND gate not to drop all the way
down to 0V, and a small amount of static power is dissipated
...
When
designing and laying out dynamic circuits, special care is needed to minimize capacitive
coupling
...
62 Example demonstrating
the effect of backgate coupling
...
The coupling capacitance consists of the gate-to-drain capacitance of the precharge
device, and includes both the overlap and the channel capacitances
...
Subsequently, the fast
rising and falling edges of the clock couple onto the signal node, as is quite apparent in the
simulation of Figure 6
...
The danger of clock feedthrough is that it may cause the (normally reverse-biased)
junction diodes of the precharge transistor to become forward-biased
...
[Figure: simulated waveforms showing output overshoot due to clock feedthrough, with CLK; axis: Voltage (V)]
63 Clock feedthrough effect
...
CMOS latchup might be another result
of this injection
...
All the above considerations demonstrate that the design of dynamic circuits is
rather tricky and requires extreme care
...
6
...
4
Cascading Dynamic Gates
Besides the signal integrity issues, there is one major catch that complicates the design of
dynamic circuits: straightforward cascading of dynamic gates to create more complex
structures does not work
...
64a
...
e
...
Assume that the primary input In makes a
0 → 1 transition (Figure 6
...
On the rising edge of the clock, output Out1 starts to discharge
...
64 Cascade of dynamic n−type blocks
...
However, there is a finite propagation
delay for the input to discharge Out1 to GND
...
As long as Out1 exceeds the switching threshold of the second gate, which
approximately equals VTn, a conducting path exists between Out2 and GND, and precious
charge is lost at Out2
...
This leaves Out2 at an intermediate voltage
level
...
The charge loss leads to reduced
noise margins and potential malfunctioning
...
This may cause inadvertent discharge in the
beginning of the evaluation cycle
...
When doing so, all transistors in the pull-down network are turned off after
precharge, and no inadvertent discharging of the storage capacitors can occur during evaluation
...
Transistors are only
turned on when needed, and at most once per cycle
...
The two most important ones are discussed below
...
A Domino logic module [Krambeck82] consists of an n-type dynamic logic
block followed by a static inverter (Figure 6
...
During precharge, the output of the n-type dynamic gate is charged up to VDD, and the output of the inverter is set to 0
...
If one assumes that all the inputs of a Domino
gate are outputs of other Domino gates3, then it is ensured that all inputs are set to 0 at the
end of the precharge phase, and that the only transitions during evaluation are 0 → 1 transitions
...
The introduction of the static inverter has the
additional advantage that the fan-out of the gate is driven by a static inverter with a low-impedance output, which increases noise immunity
...
Consider now the operation of a chain of Domino gates
...
During evaluation, the output of the first Domino block either stays at 0 or
makes a 0 → 1 transition, affecting the second gate
...
Domino CMOS has the following properties:
• Since each dynamic gate has a static inverter, only non-inverting logic can be implemented
...
2
3
This ignores the impact of charge distribution and leakage effects, discussed earlier
...
Section 6
...
65 DOMINO CMOS logic
...
The inverter can be sized to match the fan-out, which is already much smaller
than in the complementary static CMOS case, as only a single gate capacitance has
to be accounted for per fan-out gate
...
However, eliminating the evaluation device extends the precharge cycle: the precharge now has to ripple through the logic network as well
...
66, where the evaluation devices have been eliminated
...
On the falling edge of the clock, the precharge operation is started
...
The input to the second gate is initially high, and it takes two gate delays before In2 is driven low
...
Similarly, the third gate has to wait till the second gate precharges before it can
start precharging, etc
...
Another important negative is the extra power dissipation when both pull-up
and pull-down devices are on
...
Dealing with the Non-inverting Property of Domino Logic
...
This requirement has
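[Figure: cascade of Domino gates without evaluation devices (inputs In1 to Inn, outputs Out1 to Outn, precharge devices Mp clocked by CLK), annotated with the 0->1 and 1->0 transitions that ripple through the chain]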
Figure 6
...
The
circuit also exhibits static power dissipation
...
There are several ways to deal with the
non-inverting logic requirement
...
67 shows one approach to the problem—reorganizing the logic using simple boolean transforms such as De Morgan’s Law
...
[Figure: logic network with inputs A to H and outputs X and Y, (a) before logic transformation and (b) after logic transformation into Domino AND, Domino OR, and Domino AND-OR gates]
Figure 6
...
A general but expensive approach to solving the problem is the use of differential
logic
...
Figure 6
...
Note that all inputs
come from other differential Domino gates, and are low during the precharge phase, while
making a conditional 0→1 transition during evaluation
...
This comes at the expense of increased
power dissipation, since a transition is guaranteed every single clock cycle regardless of
the input values: either O or Ō must make a 0→1 transition
...
Notice that this circuit is not ratioed, even in the presence of the PMOS pull-up
devices! Due to its high performance, this differential approach is very popular, and is
used in several commercial microprocessors
...
Several optimizations can be performed on
Domino logic gates
...
68 Simple dual rail (differential)
Domino logic gate
...
the transistors in the static inverter
...
The critical path during evaluation goes through the pull-down path of the
dynamic gate, and the PMOS pull-up transistor of the static inverter
...
This can be accomplished by using a small
(minimum) sized NMOS and a large PMOS device
...
The only disadvantage of using a large beta ratio is a reduction in noise margin
...
Numerous variations of Domino logic have been proposed [Bernstein98]
...
The basic concept is illustrated in Figure 6
...
It exploits the fact that certain outputs are subsets of other outputs to generate a number of logical functions in a single gate
...
Since O2 equals B · O3, it can reuse the logic for O3
...
69 Multiple output Domino
[Figure: multiple-output Domino gate clocked by CLK with inputs A and B; O2 = B(C+D) = B · O3, O3 = C+D]
internal nodes have to be precharged to VDD to produce the correct results
...
However, the number of evaluation transistors is drastically reduced as
they are amortized over multiple outputs
...
Compound Domino (Figure 6
...
Instead of each dynamic
gate driving a static inverter, it is possible to combine the outputs of multiple dynamic
gates with the aid of a complex static CMOS gate, as shown in Figure 6
...
The outputs of
three dynamic structures, implementing O1 = $\overline{ABC}$, O2 = $\overline{DEF}$, and O3 = $\overline{GH}$, are combined using a single complex CMOS static gate that implements O = $\overline{(O1 + O2) \cdot O3}$
...
Compound Domino is a useful tool for constructing complex dynamic logic gates
...
For example, a large fan-in Domino AND can be implemented as parallel
dynamic NAND structures with lower fan-in that are combined using a static NOR gate
...
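The compound Domino structure described above can be checked with a short truth-table sketch. It assumes that the three dynamic nodes produce the complemented products (the overbars are an assumption, since dynamic n-tree nodes discharge when their pull-down network conducts) and that the static gate inverts (O1 + O2)·O3. The function names are hypothetical.

```python
from itertools import product

def compound_domino(a, b, c, d, e, f, g, h):
    # Dynamic nodes (assumed complemented): each goes low when its n-tree conducts.
    o1 = not (a and b and c)
    o2 = not (d and e and f)
    o3 = not (g and h)
    # Complex static CMOS gate, itself inverting.
    return not ((o1 or o2) and o3)

def direct_function(a, b, c, d, e, f, g, h):
    # Result expected from De Morgan's law applied to the structure above.
    return (a and b and c and d and e and f) or (g and h)

def wide_and(a, b, c, d, e, f):
    # Large fan-in AND built from two 3-input dynamic NANDs and a static NOR.
    nand1 = not (a and b and c)
    nand2 = not (d and e and f)
    return not (nand1 or nand2)

vectors8 = list(product([False, True], repeat=8))
vectors6 = list(product([False, True], repeat=6))
compound_matches = all(compound_domino(*v) == direct_function(*v) for v in vectors8)
and_matches = all(wide_and(*v) == all(v) for v in vectors6)
```

Under these assumptions the exhaustive check confirms that the overall function equals ABCDEF + GH, and that two low fan-in NANDs followed by a NOR behave as one six-input AND.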
70 Compound Domino logic uses
complex static gates at the output of the
dynamic gates
...
Care must be taken to ensure that the dynamic nodes are not affected by the
coupling between the output of the static gates and the output of dynamic nodes
...
np-CMOS provides an alternative approach to cascading dynamic logic by using two
flavors (n-tree and p-tree) of dynamic logic
...
71)
([Goncalvez83, Friedman84, Lee86])
...
The output conditionally makes a 0 → 1 transition during evaluation depending on its inputs
...
71 The np-CMOS logic
circuit style
...
If the n-tree gates are controlled by CLK, and p-tree gates are
controlled using $\overline{CLK}$, n-tree gates can directly drive p-tree gates, and vice-versa
...
During the precharge phase (CLK = 0), the output of the n-tree gate, Out1, is charged
...
Since the n-tree
gate connects PMOS pull-up devices, the PUN of the p-tree is turned off at that time
...
This ensures that no accidental discharge of Out2
can occur
...
A disadvantage of the np-CMOS logic style is that
the p-tree blocks are slower than the n-tree modules, due to the lower current drive of the
PMOS transistors in the logic network
...
6
...
4
...
Each of the circuit styles has its advantages and disadvantages
...
No single style optimizes all these measures at the
same time
...
The static approach has the advantage of being robust in the presence of noise
...
This ease-of-design does not come for free: for complex gates with a large fan-in, complementary CMOS becomes expensive in terms of area and performance
...
Pseudo-NMOS is simple and fast at the expense
of a reduced noise margin and static power dissipation
...
Dynamic logic, on the other hand, makes it possible to implement fast and small
complex gates
...
Parasitic effects such as charge sharing make the
design process a precarious job
...
The current trend is towards an increased use of complementary static CMOS
...
These tools emphasize optimization at the logic rather than the circuit level and put
a premium on robustness
...
6
...
2
Designing Logic for Reduced Supply Voltages
In Chapter 3, we projected that the supply voltage for CMOS processes will continue to
drop over the coming decade, and may go as low as 0
...
To maintain performance under those conditions, it is essential that the device thresholds scale as well
...
72a shows a plot of the (VT, VDD) ratio required to maintain a given performance
level (assuming that other device characteristics remain identical)
...
Reducing the threshold voltage increases the
subthreshold leakage current exponentially, as we derived in Eq
...
40) (repeated here for
the sake of clarity)
...
[Figure panels: (a) VDD/VT for fixed performance, plotted against VT (V), with curves for tpd = 645 ps and tpd = 420 ps; (b) Leakage as a function of VT, showing ID (A) on a logarithmic axis versus VGS (V)]
...
72 Voltage Scaling (VDD/VT on delay and leakage)
$$I_{leakage} = I_S \, 10^{\frac{V_{GS} - V_{Th}}{S}} \left( 1 - 10^{-\frac{n V_{DS}}{S}} \right)$$
(6
...
The subthreshold leakage of an inverter is the current
of the NMOS for Vin = 0V and Vout = VDD (or the PMOS current for Vin = VDD and Vout =
0)
...
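The exponential dependence of leakage on the threshold voltage can be illustrated with a minimal sketch of the leakage expression above. The numeric values (a slope factor S of 90 mV/decade, n = 1.5, and a unit prefactor) are illustrative assumptions, not values taken from the text.

```python
def subthreshold_leakage(i_s, v_gs, v_th, v_ds, slope, n=1.5):
    """Leakage current following the expression above.

    slope is the subthreshold slope factor S in volts per decade.
    """
    gate_term = 10 ** ((v_gs - v_th) / slope)
    drain_term = 1 - 10 ** (-n * v_ds / slope)
    return i_s * gate_term * drain_term

# Off-state inverter NMOS: VGS = 0 V, VDS = VDD (2.5 V assumed here).
S = 0.09  # assumed slope factor, V/decade
leak_high_vt = subthreshold_leakage(1.0, 0.0, 0.39, 2.5, S)
leak_low_vt = subthreshold_leakage(1.0, 0.0, 0.30, 2.5, S)
ratio = leak_low_vt / leak_high_vt
```

With these assumed numbers, lowering the threshold by one slope factor S (90 mV) raises the leakage by one decade, which is the trade-off the surrounding discussion describes.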
72b
...
For example, the processor in a cellular phone remains in idle mode for a majority of the time
...
This is only possible if leakage is low, that is, if the devices have a high threshold voltage
...
To satisfy the conflicting
requirements of high performance during active periods and low leakage during standby,
several process modifications or leakage-control techniques have been introduced in
CMOS processes
...
18 µm CMOS support
devices with different thresholds—typically a device with low threshold for high performance circuits, and a transistor with high threshold for leakage control
...
To use this approach for the control of individual devices requires a dual-well process (see Figure 2
...
Clever circuit design can also help to reduce the leakage current, which is a function
of the circuit topology and the value of the inputs applied to the gate
...
In an inverter with In = 0, the sub-threshold
...
In more
complex CMOS gates, the leakage current depends upon the input vector
...
Under these conditions, the intermediate node X settles to
$$V_X \approx V_{th} \ln(1 + n)$$
(6
...
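As a rough numerical illustration of the expression for VX, the sketch below assumes that Vth denotes the thermal voltage kT/q and that n is the subthreshold slope factor; both readings, the temperature, and the value n = 1.5 are assumptions made for this example.

```python
import math

BOLTZMANN = 1.380649e-23          # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C

def thermal_voltage(temp_kelvin=300.0):
    """kT/q in volts (assumed meaning of Vth in the expression above)."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE

def stack_node_voltage(n, temp_kelvin=300.0):
    """Intermediate node voltage VX ~ Vth * ln(1 + n) of an off transistor stack."""
    return thermal_voltage(temp_kelvin) * math.log(1.0 + n)

v_x = stack_node_voltage(1.5)  # a few tens of millivolts at room temperature
```

The result is small compared with the supply, but because it appears as a negative gate-source and body-source bias on the upper device, even a few tens of millivolts reduce the leakage.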
Clearly, the sub-threshold leakage under this condition is slightly smaller
than that of the inverter
...
Figure 6
...
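[Figure labels: two-input NAND gate with VDD, PMOS devices P1 and P2, NMOS devices N1 and N2, inputs A and B, output G, and intermediate node voltage VX]
A  B  VX            ISUB
0  0  Vth ln (1+n)  INSUB (VGS = VBS = -VX)
0  1  0             INSUB (VGS = VBS = 0)
1  0  Vdd-VT        INSUB (VGS = VBS = 0)
1  1  0             2 IPSUB (VSG = VSB = 0)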
Figure 6
...
In short-channel MOS transistors, the sub-threshold leakage current depends not
only on the gate drive (VGS) and the body bias (VBS), but also on the drain voltage
(VDS)
...
Typical values for DIBL range
from 20 to 150 mV change in VT per volt change in VDS
...
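To give a feel for what a DIBL coefficient in this range means for leakage, the following sketch converts a drain-voltage change into a threshold shift and then into a leakage multiplier. It assumes a simple linear DIBL model and an exponential leakage dependence with slope factor S; the 90 mV/decade value and the function name are illustrative assumptions.

```python
def dibl_leakage_factor(dibl_mv_per_v, delta_vds_v, slope_mv_per_decade=90.0):
    """Multiplicative change in sub-threshold leakage caused by DIBL.

    dibl_mv_per_v: threshold reduction in mV per volt of VDS increase.
    delta_vds_v: change in drain-source voltage in volts (negative lowers leakage).
    """
    delta_vt_mv = dibl_mv_per_v * delta_vds_v  # threshold is lowered by this amount
    return 10 ** (delta_vt_mv / slope_mv_per_decade)

increase = dibl_leakage_factor(90.0, 1.0)    # 1 V more drain bias
decrease = dibl_leakage_factor(90.0, -1.0)   # 1 V less drain bias, as in a stack
```

Under these assumptions a device with 90 mV/V of DIBL leaks ten times more for each extra volt of VDS, which is why reducing the drain-source voltage of the top device in a stack is effective.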
74 illustrates the impact on
the sub-threshold leakage as a result of
• a decrease in gate drive—point A to B
• an increase in body bias—point A to C
• an increase in drain voltage—point A to D
...
The intermediate voltage reduces the drain-source voltage of the
top-most device, and hence reduces its leakage
...
73, when both M1 and M2 are off
...
75, we see that VX settles to approximately 100 mV in steady state
...
In summary, the sub-threshold leakage in complex stacked circuits can be significantly lower than in individual devices
...
74 Dependence of subthreshold leakage current on terminal
voltages for a typical 0
...
75 Load line indicating the
steady state solution for the intermediate
node voltage in a transistor stack
...
Exploiting this effect requires a careful selection of the input
signals to every gate during standby or sleep mode
...
8 Computing VX
Eq
...
41) represents the intermediate node voltage for a two-input NAND with less than 10%
error, when A = B = 0
...
(6
...
5
...
5)
...
5
Summary
In this chapter, we have extensively analyzed the behavior and performance of combinational CMOS digital circuits with regard to area, speed, and power
...
• Static complementary CMOS combines dual pull-down and pull-up networks, only
one of which is enabled at any time
...
Techniques to
deal with fan-in include transistor sizing, input reordering, and partitioning
...
Extra buffering is needed for large fanouts
...
This results in a substantial reduction in gate complexity at the expense
of static power consumption and an asymmetrical response
...
The most popular approaches in
this class are the pseudo-NMOS techniques and differential DCVSL, which requires
complementary signals
...
This
results in very simple implementations for some logic functions
...
NMOS-only pass-transistor logic produces even
simpler structures, but might suffer from static power consumption and reduced
noise margins
...
• The operation of dynamic logic is based on the storage of charge on a capacitive
node and the conditional discharging of that node as a function of the inputs
...
Dynamic logic trades off noise margin for performance
...
Cascading dynamic gates can cause problems, and should be addressed carefully
...
This activity is a function of the input statistics, the network
topology, and the logic style
...
• Threshold voltage scaling is required for low-voltage operation
...
6
To Probe Further
The topic of (C)MOS logic styles is treated extensively in the literature
...
Some of the most comprehensive treatments can be found
in [Weste93] and [Chandrakasan01]
...
The topic of power minimization is relatively new
...
Innovations in the MOS logic area are typically published in the proceedings of the
ISSCC Conference and the VLSI circuits symposium, as well as the IEEE Journal of Solid
State Circuits (especially the November issue)
...
Bernstein et al
...
[Chandrakasan95] A
...
Brodersen, Low Power Digital CMOS Design, Kluwer
Academic Publishers, 1995
...
Chandrakasan, W
...
Fox, ed
...
[Friedman84] V
...
Liu, “Dynamic Logic CMOS Circuits,” IEEE Journal of Solid
State Circuits, vol
...
2, pp
...
[Goncalvez83] N
...
De Man, “NORA: A Racefree Dynamic CMOS Technique for
Pipelined Logic Structures,” IEEE Journal of Solid State Circuits, vol
...
3, pp
...
[Heller84] L
...
, “Cascade Voltage Switch Logic: A Differential CMOS Logic Family,”
Proc
...
16–17, February 1984
...
Krambeck et al
...
SC-17, no
...
614–619, June 1982
...
M
...
Szeto, “Zipper CMOS,” IEEE Circuits and Systems Magazine, pp
...
[Parameswar96] A
...
Hara, and T
...
SC-31, no
...
805-809, June 1996
...
Rabaey and M
...
, Low Power Design Methodologies, Kluwer, 1995
...
Radhakrishnan, S
...
Maki, “Formal Design Procedures for
Pass-Transistor Switching Circuits,” IEEE Journal of Solid State Circuits, vol
...
2,
pp
...
[Shoji88] M
...
[Shoji96] M
...
[Sutherland99] I
...
Sproull, and D
...
[Weste93] N
...
Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective,
Addison-Wesley, 1993
...
Yano et al
...
8 ns CMOS 16 × 16 b Multiplier Using Complementary Pass-Transistor Logic,” IEEE Journal of Solid State Circuits, vol
...
388–395, April
1990
...
Ye, S
...
De, “A new technique for standby leakage reduction in high-performance circuits,” Symposium on VLSI Circuits, pp 40-41, 1998
...
...
1
Introduction
7
...
3
Classification of Memory Elements
7
...
5
...
5
...
5
...
4
...
6 Pulse Registers
7
...
2 SR Flip-Flops
6
...
2
The C2MOS Latch
7
...
3 Multiplexer Based Latches
7
...
2
7
...
4 Master-Slave Based Edge Triggered
Register
NORA-CMOS— A Logic Style for
Pipelined Structures
7
...
3
True Single-Phase Clocked Register
(TSPCR)
7
...
5 Non-ideal clock signals
7
...
6 Low-Voltage Static Latches
7
...
7 Sense-Amplifier Based Registers
7
...
Section
7
...
1 Latch- vs
...
8
...
9
Non-Bistable Sequential Circuits
7
...
1 The Schmitt Trigger
7
...
2 Monostable Sequential Circuits
7
...
3 Astable Circuits
7
...
11 Summary
7
...
13 Exercises and Design Problems
271
...
1
DESIGNING SEQUENTIAL LOGIC CIRCUITS
Chapter 7
Introduction
Combinational logic circuits that were described earlier have the property that the output
of a logic block is only a function of the current input values, assuming that enough time
has elapsed for the logic gates to settle
...
In
these circuits, the output not only depends upon the current values of the inputs, but also
upon preceding input values
...
Figure 7
...
The system depicted
here belongs to the class of synchronous sequential systems, in which all registers are
under control of a single global clock
...
The Next State is determined based on the Current State and
the current Inputs and is fed to the inputs of registers
...
The register then ignores changes in the input signals until the
next rising edge
...
)
Inputs
Outputs
COMBINATIONAL
LOGIC
Current State
Registers
Q
Next State
D
CLK
Figure 7
...
This chapter discusses the CMOS implementation of the most important sequential
building blocks
...
Before embarking on a detailed discussion on the various design options, a revision of the
design metrics, and a classification of the sequential elements is necessary
...
2
Timing Metrics for Sequential Circuits
There are three important timing parameters associated with a register as illustrated in Figure 7
...
The set-up time (tsu) is the time that the data inputs (D input) must be valid before
the clock transition (that is, the 0 to 1 transition for a positive edge-triggered register)
...
[Figure labels: register with D, Q, and CLK terminals; timing diagram marking tsu, thold, and tc-q relative to CLK, with D and Q shown as DATA STABLE]
Figure 7
...
Assuming that the set-up and hold times are met, the data at the D input is copied to the Q output
after a worst-case propagation delay (with reference to the clock edge) denoted by tc-q
...
Assume that the worst-case propagation
delay of the logic equals tplogic, while its minimum delay (also called the contamination
delay) is tcd
...
1)
The hold time of the register imposes an extra constraint for proper operation,
$$t_{cd,register} + t_{cd,logic} \geq t_{hold}$$
(7
...
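The two timing requirements of a synchronous system can be sketched as simple checks. The hold-time test follows the inequality above. The clock-period relation is written here in its commonly used form, T ≥ tc-q + tplogic + tsu, which is an assumption because the corresponding equation is not reproduced in this extract. Function names and the picosecond values are hypothetical.

```python
def min_clock_period(t_cq, t_plogic, t_su):
    """Smallest clock period that lets data launched at one edge be captured
    at the next (assumed form: T >= tc-q + tplogic + tsu)."""
    return t_cq + t_plogic + t_su

def hold_constraint_met(t_cd_register, t_cd_logic, t_hold):
    """Hold check from the text: the fastest path must not overwrite the
    register input before the hold time has elapsed."""
    return t_cd_register + t_cd_logic >= t_hold

# Illustrative values in picoseconds.
period_ps = min_clock_period(t_cq=150, t_plogic=1200, t_su=100)
hold_ok = hold_constraint_met(t_cd_register=80, t_cd_logic=60, t_hold=50)
hold_fails = hold_constraint_met(t_cd_register=20, t_cd_logic=0, t_hold=50)
```

Note that the hold check does not involve the clock period at all, so a hold violation cannot be repaired by slowing the clock.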
As seen from Eq
...
1), it is important to minimize the values of the timing parameters associated with the register, as these directly affect the rate at which a sequential circuit can be clocked
...
For example, the DEC Alpha EV6 microprocessor
[Gieseke97] has a maximum logic depth of 12 gates, and the register overhead stands for
approximately 15% of the clock period
...
(7
...
7
...
Memory that is embedded into logic is foreground memory, and is most often organized as
individual registers or register banks
...
Background memory, discussed later in this book, achieves
...
In this chapter, we focus on foreground memories
...
Static memories preserve the state as long as the
power is turned on
...
Static memories are most useful when the register won't
be updated for extended periods of time. An example of such is configuration data, loaded
...
This condition also holds for most processors that use conditional clocking (i
...
, gated clocks) where the clock is turned off for unused modules
...
Memories based on positive feedback fall under
the class of elements called multivibrator circuits
...
Dynamic memories store state for a short period of time, on the order of milliseconds
...
As with dynamic logic discussed earlier, the capacitors
have to be refreshed periodically to compensate for charge leakage
...
They
are most useful in datapath circuits that require high performance levels and are periodically clocked
...
Latches vs
...
It is a
level-sensitive circuit that passes the D input to the Q output when the clock signal is high
...
When the clock is low, the input data sampled
on the falling edge of the clock is held stable at the output for the entire phase, and the
latch is in hold mode
...
A latch operating under the above conditions is a positive latch
...
shown in Figure 7
...
A wide variety of static and dynamic implementations exists for the
realization of latches
...
They are typically built using the latch primitives of Figure 7
...
The most frequently recurring configuration is the master-slave structure that cascades a positive
and negative latch
...
Examples of these
are shown later in this chapter
...
...
3 Timing of positive and negative latches
...
4 Static Latches and Registers
7
...
1
The Bistability Principle
Static memories use positive feedback to create a bistable circuit — a circuit having two
stable states that represent 0 and 1
...
4a, which shows
two inverters connected in cascade along with a voltage-transfer characteristic typical of
such a circuit
...
The latter plot is rotated to accentuate that Vi2 = Vo1
...
4a
...
4 Two cascaded inverters (a)
and their VTCs (b)
...
possible operation points (A, B, and C), as demonstrated on the combined VTC
...
Suppose that the cross-coupled inverter pair is biased at point C
...
This is a consequence of the gain around the loop being larger than 1
...
5a
...
This deviation is amplified by the gain of the inverter
...
The bias point moves away from C until one of
the operation points A or B is reached
...
Every deviation (even the smallest one) causes the operation point to run away from its
original bias
...
Operation points with this property are termed metastable
...
5 Metastability [panel (a); axis label Vi1 = Vo2]
...
5b
...
Even a rather large deviation from the operation point is reduced in size and disappears
...
The circuit serves as a
memory, storing either a 1 or a 0 (corresponding to positions A and B)
...
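The behavior of the three operation points can be reproduced with a small numerical sketch. It models each inverter with a smooth tanh-shaped VTC, which is an idealization chosen for this example (not a transistor-level model), and repeatedly passes a voltage around the two-inverter loop. VDD = 2.5 V and a midpoint gain of 5 are assumed values.

```python
import math

VDD = 2.5  # assumed supply voltage in volts

def inverter(v_in, gain=5.0):
    """Idealized inverter VTC: switching threshold at VDD/2, slope -gain there."""
    half = 0.5 * VDD
    return half * (1.0 - math.tanh(gain * (v_in - half) / half))

def settle(v_start, trips=50):
    """Send Vi1 around the cross-coupled loop a number of times."""
    v = v_start
    for _ in range(trips):
        v = inverter(inverter(v))
    return v

at_c = settle(0.5 * VDD)             # exactly at the metastable point C
above_c = settle(0.5 * VDD + 1e-3)   # a 1 mV disturbance upward
below_c = settle(0.5 * VDD - 1e-3)   # a 1 mV disturbance downward
```

In this model the loop gain at C is 25, so a 1 mV deviation grows on every trip until the circuit lands on one of the stable points near the rails, while an undisturbed start at C stays put.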
Since the precondition for stability is that the loop gain G is smaller
than unity, we can achieve this by making A (or B) temporarily unstable by increasing G to
a value larger than 1
...
For
instance, assume that the system is in position A (Vi1 = 0, Vi2 = 1)
...
The positive feedback regenerates the effect of the trigger pulse, and the circuit
moves to the other state (B in this case)
...
...
In summary, a bistable circuit has two stable states
...
A trigger pulse must be applied to change the state
of the circuit
...
7
...
2
SR Flip-Flops
The cross-coupled inverter pair shown in the previous section provides an approach to
store a binary variable in a stable way
...
The simplest incarnation accomplishing this is the well-known
SR (or set-reset) flip-flop, an implementation of which is shown in Figure 7
...
This circuit is similar to the cross-coupled inverter pair with NOR gates replacing the
inverters
...
These outputs are complementary (except for the SR = 11 state)
...
If a positive
(or 1) pulse is applied to the S input, the Q output is forced into the 1 state (with $\overline{Q}$ going to
0)
...
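[Figure labels: cross-coupled NOR gates with inputs S and R and outputs Q and $\overline{Q}$; panel (a) Schematic diagram]
S  R  Q  $\overline{Q}$
0  0  Q  $\overline{Q}$
1  0  1  0
0  1  0  1
1  1  0  0   Forbidden State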
Figure 7
...
These results are summarized in the characteristic table of the flip-flop, shown in
Figure 7
...
The characteristic table is the truth table of the gate and lists the output states
as functions of all possible input conditions
...
Since this does not correspond with our constraint that Q and $\overline{Q}$ must be
complementary, this input mode is considered to be forbidden
...
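The characteristic table of the NOR-based SR flip-flop can be reproduced by a minimal gate-level sketch. The function below simply re-evaluates the two cross-coupled NOR gates until the outputs stop changing; the names are hypothetical and gate delays are ignored.

```python
def nor(a, b):
    return int(not (a or b))

def sr_flip_flop(s, r, q, q_bar):
    """Return the settled (Q, Q-bar) pair of a cross-coupled NOR latch."""
    for _ in range(4):  # a few passes are enough for the loop to settle
        q_next = nor(r, q_bar)
        q_bar_next = nor(s, q_next)
        if (q_next, q_bar_next) == (q, q_bar):
            break
        q, q_bar = q_next, q_bar_next
    return q, q_bar

set_state = sr_flip_flop(1, 0, 0, 1)        # set pulse applied
hold_state = sr_flip_flop(0, 0, *set_state)  # inputs released: state is kept
reset_state = sr_flip_flop(0, 1, *hold_state)
forbidden = sr_flip_flop(1, 1, 1, 0)         # both outputs forced low
```

The last case shows why SR = 11 is excluded: both outputs go to 0, so they are no longer complementary.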
Finally,
Figure 7
...
...
1 SR Flip-Flop Using NAND Gates
An SR flip-flop can also be implemented using a
cross-coupled NAND structure as shown in Figure
7
...
Derive the truth table for such an implementation
...
7 NAND-based SR flip-flop
...
Most systems operate in a synchronous fashion with transition events referenced to a
clock
...
8
...
Observe that the number of transistors is identical to the implementation of Figure
7
...
The drawback of saving some
transistors over a fully-complementary CMOS implementation is that transistor sizing
becomes critical in ensuring proper functionality
...
The combination of transistors M4, M7, and M8 forms a ratioed
inverter
...
Once this is achieved, the positive feedback
causes the flip-flop to invert states
...
[Figure labels: VDD; transistors M1 to M8; inputs S, R, and CLK; outputs Q and $\overline{Q}$]
Figure 7
...
The presented flip-flop does not consume any static power
...
No static paths between VDD
and GND can exist except during switching
...
1
Transistor Sizing of Clocked SR Latch
Assume that the cross-coupled inverter pair is designed such that the inverter threshold VM is
located at VDD/2
...
25 µm CMOS technology, the following transistor sizes were
selected: (W/L)M1 = (W/L)M3 = (0
...
25µm), and (W/L)M2 = (W/L)M4 = (1
...
25µm)
...
To switch the latch from the Q = 0 to the Q = 1 state, it is essential that the low level of
the ratioed, pseudo-NMOS inverter (M5-M6)-M2 be below the switching threshold of the
inverter M3-M4 that equals VDD/2
...
The boundary conditions on the transistor
sizes can be derived by equating the currents in the inverter for VQ = VDD / 2, as given in Eq
...
3) (this ignores channel length modulation)
...
5V and VM = 1
...
We assume that M5 and M6 have identical
sizes and that W/L5-6 is the effective ratio of the series connected devices
...
$$k'_n \left(\frac{W}{L}\right)_{5-6} \left( (V_{DD} - V_{Tn}) V_{DSATn} - \frac{V_{DSATn}^2}{2} \right) = k'_p \left(\frac{W}{L}\right)_{2} \left( (-V_{DD} - V_{Tp}) V_{DSATp} - \frac{V_{DSATp}^2}{2} \right)$$
(7
...
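The current balance above can be solved numerically for the effective pull-down ratio. The sketch below works with magnitudes rather than the signed PMOS quantities of the equation, and every device parameter in it (transconductances, thresholds, saturation voltages, a 2.5 V supply, and a PMOS W/L of 6) is an assumed, illustrative value for a generic 0.25 µm process, not a number taken from this section.

```python
def saturation_current(k_prime, w_over_l, v_gt, v_dsat):
    """Velocity-saturated drain current, ignoring channel-length modulation."""
    return k_prime * w_over_l * (v_gt * v_dsat - 0.5 * v_dsat ** 2)

def min_effective_pulldown_ratio(vdd, k_n, v_tn, v_dsat_n,
                                 k_p_mag, v_tp_mag, v_dsat_p_mag, wl_pmos):
    """Effective (W/L) of the M5-M6 stack that just balances the PMOS load M2."""
    i_pmos = saturation_current(k_p_mag, wl_pmos, vdd - v_tp_mag, v_dsat_p_mag)
    i_nmos_per_unit = saturation_current(k_n, 1.0, vdd - v_tn, v_dsat_n)
    return i_pmos / i_nmos_per_unit

# Assumed parameters (hypothetical, for illustration only).
effective_ratio = min_effective_pulldown_ratio(
    vdd=2.5, k_n=115e-6, v_tn=0.43, v_dsat_n=0.63,
    k_p_mag=30e-6, v_tp_mag=0.40, v_dsat_p_mag=1.0, wl_pmos=6.0)

# Two identical devices in series behave like one device with half the W/L.
individual_ratio = 2.0 * effective_ratio
```

With these assumptions the effective ratio comes out a little above 2 and the individual devices need roughly twice that, which is in line with the hand-analysis bounds quoted in the text.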
25 µm process, Eq
...
3) results in the constraint that the
effective (W/L)M5-6 ≥ 2
...
This implies that the individual device ratio for M5 or M6 must be
larger than approximately 4
...
Figure 7
...
We notice that an individual device ratio of greater than 3
is sufficient to bring the Q voltage to the inverter switching threshold
...
Figure 7
...
The plot confirms that an individual W/L ratio of greater than 3 is required to overpower
the feedback and switch the state of the latch
...
[Figure panels: (a) Q (Volts) versus W/L of M5 and M6; (b) Q and S (Volts) versus time (nsec) for several device widths W]
Figure 7
...
(a) DC output voltage vs
...
5µm/
...
(b) Transient response shows that M5
and M6 must each have a W/L larger than 3 to switch the SR flip-flop
...
Some simplifications are therefore necessary
...
8, where Q and $\overline{Q}$ are set to 0 and 1, respectively
...
In the first phase of the transient, node $\overline{Q}$ is being
pulled down by transistors M5 and M6
...
The transient response is hence determined by the pseudo-NMOS
inverter formed by (M5-M6) and M2
...
This accelerates the pulling down of node $\overline{Q}$
...
derive that the propagation delay of node Q is approximately equal to the delay of the
pseudo-NMOS inverter formed by (M5-M6) and M2
...
M
Example 7
...
8, as obtained from simulation, is plotted in
Figure 7
...
The devices are sized as described in Example 7
...
The flip-flop is initially in the reset state, and an S-pulse is
applied
...
Once the switching threshold of the inverter M3-M4 is reached, the Q output starts to rise
...
From the simulation results, we can derive that $t_{p\overline{Q}}$ and $t_{pQ}$ equal
120 psec and 230 psec, respectively
...
Figure 7
...
Problem 7
...
8, it is also possible to use complementary
logic to implement the clocked SR FF
...
This circuit is more complex, but switches faster and consumes less switching
power
...
7
...
3
Multiplexer Based Latches
There are many approaches for constructing latches
...
Multiplexer-based latches provide
similar functionality to the SR latch, but have the important added advantage that the sizing
of devices only affects performance and is not critical to the functionality
...
11 shows an implementation of static positive and negative latches based
on multiplexers
...
...
When the clock signal is high,
input 1 of the multiplexer, which connects to the output of the latch, is selected
...
Similarly, in the positive
latch, the D input is selected when the clock is high, and the output is held (using feedback)
when the clock is low
...
11 Negative and positive latches based on
multiplexers
...
12
...
During this phase, the feedback
loop is open since the top transmission gate is off
...
The number of transistors that the clock touches is important since it has an activity factor of 1
...
CLK
Q
CLK
D
Figure 7
...
CLK
It is possible to reduce the clock load to two transistors by implementing the multiplexers using NMOS-only pass transistors, as shown in Figure 7
...
approach is the reduced clock load of only two NMOS devices
...
While attractive for its simplicity, the use of NMOS-only pass
transistors results in the passing of a degraded high voltage of VDD-VTn to the input of the
first inverter
...
It also causes static power dissipation in the first inverter, as already pointed out in Chapter 6
...
...
13 Multiplexer based NMOS latch using NMOS only pass transistors for multiplexers
...
4
...
14
...
A multiplexer based latch
is used in this particular implementation, though any latch can be used to realize the master and slave stages
...
During this period, the slave stage is in
the hold mode, keeping its previous value using feedback
...
During the
high phase of the clock, the slave stage samples the output of the master stage (QM), while
the master stage remains in a hold mode
...
The value of Q is the value of D
right before the rising edge of the clock, achieving the positive edge-triggered effect
...
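The master-slave principle can be illustrated with a behavioral sketch in which each latch is modeled as a two-input multiplexer with feedback. This is a zero-delay, cycle-by-cycle model with hypothetical function names; it ignores set-up, hold, and propagation delays.

```python
def mux_latch(d, clk, q_prev, positive=True):
    """Multiplexer-based latch: pass D when transparent, else hold q_prev."""
    transparent = (clk == 1) if positive else (clk == 0)
    return d if transparent else q_prev

def master_slave_register(d_wave, clk_wave):
    """Negative (master) latch followed by a positive (slave) latch."""
    qm = q = 0
    q_wave = []
    for d, clk in zip(d_wave, clk_wave):
        qm = mux_latch(d, clk, qm, positive=False)  # master samples while CLK is low
        q = mux_latch(qm, clk, q, positive=True)    # slave samples QM while CLK is high
        q_wave.append(q)
    return q_wave

clk_wave = [0, 0, 1, 1, 0, 0, 1, 1]
d_wave   = [1, 1, 0, 0, 0, 0, 1, 1]
q_wave = master_slave_register(d_wave, clk_wave)
```

In this example D changes while the clock is high (third and seventh samples), yet Q only takes on the value D had just before each rising edge, which is the positive edge-triggered behavior described above.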
e
...
[Figure labels: Master and Slave multiplexer-based latches with multiplexer inputs 0 and 1; signals D, QM, Q, and CLK]
Figure 7
...
A complete transistor-level implementation of the master-slave positive edge-triggered register is shown in Figure 7
...
The multiplexer is implemented using transmission
gates as discussed in the previous section
...
During this period, T3 is off and T4 is on and
the cross-coupled inverters (I5, I6) hold the state of the slave latch
...
T1 is off and T2
is on, and the cross-coupled inverters I3 and I4 hold the state of QM
...
[Figure labels: inverters I1 to I6; transmission gates T1 to T4; signals D, QM, Q, and CLK]
Figure 7
...
Problem 7
...
3 without loss of functionality
...
As discussed earlier, there are three important timing metrics in registers: the set-up time, the hold time and
the propagation delay
...
Assume that the
propagation delay of each inverter is tpd_inv and the propagation delay of the transmission
gate is tpd_tx
...
The set-up time is the time before the rising edge of the clock that the input data D
must become valid
...
For the transmission gate multiplexer-based register, the input D has to propagate through I1, T1, I3 and
I2 before the rising edge of the clock
...
Otherwise, it is possible for the
cross-coupled pair I2 and I3 to settle to an incorrect value
...
The propagation delay is the time for the value of QM to propagate to the output Q
...
Therefore the delay tc-q is simply the delay through T3 and
I6 (tc-q = tpd_tx + tpd_inv)
...
In this case, the transmission gate T1 turns off when the clock goes high, and
therefore any changes in the D input after the clock goes high are not seen by the input
...
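The first-order timing estimates for the transmission-gate register can be collected in a small helper. The set-up path (I1, T1, I3, I2, that is, three inverters and one transmission gate) and the clock-to-Q path (T3 and I6) follow the discussion above. The zero hold time is an inference from the preceding paragraph under ideal clock edges, and the delay values are hypothetical.

```python
def tg_register_timing(t_pd_inv, t_pd_tx):
    """First-order timing of the transmission-gate master-slave register."""
    return {
        "t_su": 3 * t_pd_inv + t_pd_tx,  # D must pass I1, T1, I3 and I2
        "t_cq": t_pd_tx + t_pd_inv,      # QM passes T3 and I6
        "t_hold": 0,                     # T1 is off once the clock is high (idealized)
    }

timing_ps = tg_register_timing(t_pd_inv=40, t_pd_tx=50)
```

These are hand-analysis estimates only; as the SPICE example that follows shows, the values obtained from simulation depend on the actual loading and edge rates.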
...
3 Timing analysis using SPICE
...
Figure 7
...
For the 210 psec case, the correct value of input D
is sampled (in this case, the Q output remains at the value of VDD)
...
Node QM
starts to go high while the output of I2 (the input to transmission gate T2) starts to fall
...
The set-up
time for this register is therefore 210 psec
...
[Figure panels (a) and (b): Volts versus time (ns) for D, CLK, Q, QM, and node I2-T2, simulated for two different set-up times]
...
16 Set-up time simulation
...
The D input edge is once again
skewed relative to the clock signal until the circuit stops functioning
...
e
...
Finally, for the propagation
delay, the inputs are transitioned at least a set-up time before the rising edge of the clock and
the delay is measured from the 50% point of the CLK edge to the 50% point of the Q output
...
17), tc-q (lh) was 160psec and tc-q(hl) was 180psec
...
17 Simulation of propagation delay
[Figure axes: Volts versus time (ns); trace CLK]
As mentioned earlier, the drawback of the transmission gate register is the high capacitive
load presented to the clock signal
...
Ignoring the overhead required to
invert the clock signal (since the buffer inverter overhead can be amortized over multiple
register bits), each register has a clock load of 8 transistors
...
Figure 7
...
[Figure labels: inverters I1 to I4; transmission gates T1 and T2; signals D, Q, CLK, and $\overline{CLK}$]
Figure 7
...
The penalty for the reduced clock load is increased design complexity
...
The sizing requirements for the transmission gates
can be derived using a similar analysis as performed for the SR flip-flop
...
If
minimum-sized devices are to be used in the transmission gates, it is essential that the
transistors of inverter I2 be made even weaker
...
Using minimum or close-to-minimum size devices in the transmission gates is desirable to reduce the power dissipation in the
latches and the clock distribution network
...
When the slave stage is on (Figure 7
...
As long
as I4 is a weak device, this is fortunately not a major problem
...
19 Reverse conduction possible in the transmission gate
...
4
...
Even if this were possible, this would still not be a
good assumption
...
the load capacitances can vary based on data stored in the connecting latches
...
20b
...
20a
...
20 Master-slave register based
on NMOS-only pass transistors
...
However, since CLK and CLK are both high for a
short period of time (the overlap period), both sampling pass transistors conduct and
there is a direct path from the D input to the Q output
...
This is known as a race condition in which the value of the output
Q is a function of whether the input D arrives at node X before or after the falling
edge of CLK
...
• The primary advantage of the multiplexer-based register is that the feedback loop is
open during the sampling period, and therefore sizing of devices is not critical to
functionality
...
Those problems can be avoided by using two non-overlapping clocks PHI1 and PHI2
instead (Figure 7
...
During the nonoverlap time, the FF is in the high-impedance state: the feedback loop is
open, the loop gain is zero, and the input is disconnected
...
Hence the name pseudostatic: the register
employs a combination of static and dynamic storage approaches depending upon the state
of the clock
...
...
21 Pseudostatic two-phase D register
...
4 Generating non-overlapping clocks
Figure 7
...
Assuming that each gate has a unit gate delay, derive
the timing relationship between the input clock and the two output clocks
...
4
...
22 Circuitry for generating a
two phase non-overlapping clock
Low-Voltage Static Latches
The scaling of supply voltages is critical for low power operation
...
For example, without
the scaling of device thresholds, NMOS only pass transistors (e
...
, Figure 7
...
At very low power supply voltages, the input to the inverter cannot be raised above the switching threshold,
resulting in incorrect evaluation
...
Scaling to low supply voltages hence requires the use of reduced threshold devices
...
When the registers are constantly accessed, the leakage
...
However, with the
use of conditional clocks, it is possible that registers are idle for extended periods and the
leakage energy expended by registers can be quite significant
...
One approach for this involves the use of Multiple Threshold devices as
shown in Figure 7
...
Only the negative latch is shown here
...
The low-threshold inverters are gated using high threshold devices to eliminate leakage
...
When clock is
low, the D input is sampled and propagates to the output
...
The feedback transmission gate conducts and the cross-coupled feedback is enabled
...
During idle mode, the high threshold devices in series with the low
threshold inverter are turned off (the SLEEP signal is high), eliminating leakage
...
The feedback
low-threshold transmission gate is turned on and the cross-coupled high-threshold devices
maintain the state of the latch
...
23 One solution for the leakage problem in low-voltage operation using MTCMOS
...
5 Transistor minimization in the MTCMOS register
Unlike combinational logic, both flavors of high threshold devices in series are required to
eliminate the leakage of low threshold gates
...
Hint: Eliminate
the high VT NMOS or high VT PMOS of the low threshold inverter on the right of Figure
7
...
5
Dynamic Latches and Registers
7
...
This
approach has the useful property that a stored value remains valid as long as the supply
voltage is applied to the circuit, hence the name static
...
When registers are used in computational structures
that are constantly clocked such as pipelined datapath, the requirement that the memory
should hold state for extended periods of time can be significantly relaxed
...
The principle is exactly identical to the one used in dynamic logic: charge
stored on a capacitor can be used to represent a logic signal
...
No capacitor is ideal, unfortunately,
and some charge leakage is always present
...
If one wants to preserve
signal integrity, a periodic refresh of its value is necessary
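The refresh requirement can be put into rough numbers. The sketch below assumes a constant leakage current, so the time for a storage node to droop by a tolerable amount is t = C * dV / I. The function names and the example values (10 fF, 0.5 V, 1 pA) are illustrative assumptions, not figures from the text.

```python
def max_hold_time(c_node, delta_v, i_leak):
    """Time (s) for a constant leakage current to shift the node voltage by delta_v."""
    if i_leak <= 0:
        raise ValueError("leakage current must be positive")
    return c_node * delta_v / i_leak


def min_refresh_frequency(c_node, delta_v, i_leak):
    """Lowest refresh (clock) frequency in Hz that keeps the droop below delta_v."""
    return 1.0 / max_hold_time(c_node, delta_v, i_leak)


# Assumed example values: 10 fF storage node, 0.5 V tolerable droop, 1 pA leakage.
t_hold = max_hold_time(10e-15, 0.5, 1e-12)
print(f"hold time: {t_hold:.3e} s")
print(f"minimum refresh frequency: {min_refresh_frequency(10e-15, 0.5, 1e-12):.1f} Hz")
```

Real leakage depends on the node voltage and temperature, so this is only a first-order estimate.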
...
Reading the value of the stored signal from a capacitor without disrupting the charge
requires the availability of a device with a high input impedance
...
5
...
24
...
During this period, the slave
stage is in a hold mode, with node 2 in a high-impedance (floating) state
...
Node 2 now stores the
inverted version of node 1
...
NMOS-only pass transistors, resulting in an even simpler six-transistor implementation
...
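[Schematic labels: input D, transmission gates T1 and T2, inverters I1 and I3, nodes 1 and 2 with capacitances C1 and C2, output Q, clocks CLK and its complement]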
Figure 7
...
The set-up time of this circuit is simply the delay of the transmission gate, and corresponds
to the time it takes node 1 to sample the D input
...
The propagation delay (tc-q) is equal to two inverter delays plus the delay of
the transmission gate T2
...
One important consideration for such a dynamic register is that the storage nodes
(i
...
, the state) have to be refreshed at periodic intervals to prevent a loss due to charge leakage, caused by diode leakage as well as sub-threshold currents
...
Clock overlap is an important concern for this register
...
25
...
This is known as a race condition
...
The same is true for the 1-1 overlap region, where an
input-output path exists through the PMOS of T1 and the NMOS of T2
...
That is, the data must be stable during
the high-high overlap period
...
Generally the built-in single inverter delay should be sufficient, and the overlap period constraint is given
as:
$t_{overlap0-0} < t_{T1} + t_{I1} + t_{T2}$
(7
...
5)
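The overlap constraint above can be checked numerically. This is a minimal sketch, assuming hypothetical delay values in picoseconds; the function name and the numbers are illustrative and do not come from the text.

```python
def zero_zero_overlap_is_safe(t_overlap_00, t_t1, t_i1, t_t2):
    """True when the (0,0) overlap is shorter than the T1 + I1 + T2 path delay."""
    return t_overlap_00 < t_t1 + t_i1 + t_t2


# Hypothetical delays in picoseconds (assumed values).
T_T1, T_I1, T_T2 = 40, 60, 40

# An overlap shorter than the 140 ps path delay satisfies the constraint.
print(zero_zero_overlap_is_safe(100, T_T1, T_I1, T_T2))
# A longer overlap violates it and allows a race.
print(zero_zero_overlap_is_safe(200, T_T1, T_I1, T_T2))
```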
[Timing diagram labels: CLK and its complement, with the (0,0) overlap and (1,1) overlap regions marked]
Figure 7
...
7
...
2
C2MOS Dynamic Register: A Clock Skew Insensitive Approach
The C2MOS Register
Figure 7
...
This circuit is called the C2MOS (Clocked
CMOS) register [Suzuki73]
...
1
...
The master
stage is in the evaluation mode
...
Both transistors M7 and M8 are off, decoupling the output
from the input
...
[Schematic labels: Master Stage (input D, node X, CL1) and Slave Stage (output Q, CL2); transistors M1 to M8; clocks CLK and its complement]
Figure 7
...
2
...
The value stored on CL1
propagates to the output node through the slave stage which acts as an inverter
...
However, there is an
important difference:
A C2MOS register with CLK-CLK clocking is insensitive to overlap, as long as the rise and
fall times of the clock edges are sufficiently small
...
25)
...
27a in which both PMOS devices are on during this period
...
This is not desirable since data should not change on the negative edge for a
positive edge-triggered register
...
However, this data cannot
propagate to the output since the NMOS device M7 is turned off
...
Therefore, any new data sampled on the falling clock edge is not seen at the slave output
Q, since the slave stage is off until the next rising edge of the clock
...
The (1-1) overlap case (Figure 7
...
The question is again whether new data sampled during
the overlap period (right after the clock goes high) propagates to the Q output
...
edge
...
However, as soon as the overlap period is over,
the PMOS M8 is turned on and the 0 propagates to output
...
The
[Schematic panels: (a) (0-0) overlap, (b) (1-1) overlap; labels D, X, Q and transistors M1 to M8]
Figure 7
...
No feasible signal path can exist between
In and D, as illustrated by the arrows
...
In summary, it can be stated that the C2MOS latch is insensitive to clock overlaps
because those overlaps activate either the pull-up or the pull-down networks of the latches,
but never both of them simultaneously
...
This creates a path between input and output that can destroy the state
of the circuit
...
This criterion is not too stringent, and is easily met in practical
designs
...
28, which plots the
simulated transient response of a C2MOS D FF for clock slopes of respectively 0
...
For slow clocks, the potential for a race condition exists
...
[Plot residue: simulated transient waveforms; curve labels include CLK and X]
Figure 7
...
1 nsec and 3 nsec
clock rise (fall) times assuming In = 1
...
...
It is also possible to design sequential circuits that sample the input on both edges
...
Figure 7
...
It consists of two parallel master-slave based edge-triggered registers, whose outputs are
multiplexed using the tri-state drivers
...
Node Y is held stable, since devices M9 and M10 are turned
off
...
During the low phase, the bottom master latch
(M1, M4, M9, M10) is turned on, sampling the inverted D input on node Y
...
On the rising edge, the bottom
slave latch conducts, and drives the inverted version of Y on node Q
...
Note that the slave latches operate in a complementary fashion; that is,
only one of them is turned on during each phase of the clock
...
29 C2MOS based dual-edge triggered register
...
6 Dual-edge Registers
Determine how the adoption of dual-edge registers influences the power-dissipation in the
clock-distribution network
...
7
...
3
True Single-Phase Clocked Register (TSPCR)
In the two-phase clocking schemes described above, care must be taken in routing the two
clock signals to ensure that overlap is minimized
...
The True
Single-Phase Clocked Register (TSPCR) proposed by Yuan and Svensson uses a single
clock (without an inverse clock) [Yuan89]
...
30
...
30 True Single Phase Latches [schematic labels: CLK, Out, Negative Latch]
...
On the other hand, when
CLK = 0, both inverters are disabled, and the latch is in hold-mode
...
As a result of the dual-stage approach, no
signal can ever propagate from the input of the latch to the output in this
mode
...
The clock load is similar to
a conventional transmission gate register or C2MOS register
...
The disadvantage is the slight increase in the number of transistors: 12 transistors are required
...
This reduces the delay overhead associated with the latches
...
31a outlines the basic approach for embedding logic, while Figure 7
...
While the set-up time of this latch has increased over the one
shown in Figure 7
...
This approach of embedding logic into latches has been
used extensively in the design of the EV4 DEC Alpha microprocessor [Dobberpuhl92]
and many other high performance processors
...
4 Impact of embedding logic into latches on performance
Consider embedding an AND gate into the TSPC latch as shown in Figure 7
...
In a 0
...
A conventional
approach, composed of an AND gate followed by a positive latch, has an effective set-up time
of 600 psec (we treat the AND plus latch as a black box that performs the AND+latching functions)
[Schematic panels: (a) Including logic into the latch, with PUN and PDN blocks; (b) AND latch with inputs In1 and In2]
Figure 7
...
...
The TSPC latch circuits can be further reduced in complexity as illustrated in Figure
7
...
Besides the reduced number
of transistors, these circuits have the advantage that the clock load is reduced by half
...
For
instance, the voltage at node A (for Vin = 0 V) for the positive latch maximally equals VDD
– VTn, which results in a reduced drive for the output NMOS transistor and a loss in performance
...
This also limits the amount of VDD scaling possible on the latch
...
32 Simplified TSPC latch (also called split-output)
...
33 shows the design of a specialized single-phase edge-triggered register
...
The second
(dynamic) inverter is in the precharge mode, with M6 charging up node Y to VDD
...
Therefore, during the low phase of
the clock, the input to the final (static) inverter is holding its previous value and the output
Q is stable
...
If X is
high on the rising edge, node Y discharges
...
On the positive phase of the
clock, note that node X transitions to a low if the D input transitions to a high level
...
This represents the hold time of the register (note that the hold time is
less than 1 inverter delay since it takes 1 delay for the input to affect node X)
...
The propagation delay of the register is essentially three inverters since the value on node X must
propagate to the output Q
...
[Schematic labels: input D, internal nodes X and Y, outputs labeled Q, transistors M1 to M9, clock CLK]
Figure 7
...
WARNING: Similar to the C2MOS latch, the TSPC latch malfunctions when the slope of
the clock is not sufficiently steep
...
The clock slopes should therefore be carefully controlled
...
Example 7
...
With
improper sizing, glitches may occur at the output due to a race condition when the clock transitions from low to high
...
While CLK is
low, Y is pre-charged high turning on M7
...
Once Y is
sufficiently low, the trend on Q is reversed and the node is pulled high anew through M9
...
Figure 7
...
34 for different sizes
of devices in the final two stages
...
[Plot and table residue: curves Qoriginal, Qmodified and CLK, in Volts versus time (nsec); width table with Modified Width entries for devices including M7, M8]
Figure 7
...
33)
...
It
also reduces the contamination delay of the register
...
...
This is accomplished by reducing the strength of the M7-M8
pulldown path, and by speeding up the M4-M5 pulldown path
...
6 Pulse Registers
Until now, we have used the master-slave configuration to create an edge-triggered register
...
The
idea is to construct a short pulse around the rising (or falling) edge of the clock
...
g
...
35a), sampling
the input only in a short window
...
e, the transparent period) of the latch very short
...
Figure 7
...
When CLK = 0, node X is charged up to VDD (MN
is off since CLKG is low)
...
This in turn activates
MN, pulling X and eventually CLKG low (Figure 7
...
The length of the pulse is
controlled by the delay of the AND gate and the two inverters
...
If every register on the chip
uses the same clock generation mechanism, this sampling delay does not matter
...
This must be taken into account when performing timing verification and clock skew analysis (which is the topic of a later Chapter)
...
35 Glitch latch - timing generation and register
...
If set-up time and hold time are measured in reference to the rising edge of the glitch
clock, the set-up time is essentially zero, the hold time is equal to the length of the pulse (if
the contamination delay is zero for the gates), and the propagation delay (tc-q) equals two
gate delays
...
The glitch-generation circuitry can be amortized over multiple register bits
...
This has
prevented a wide-spread use
...
g
...
Another version of the pulsed register is shown in Figure 7
...
When the clock is low, M3 and M6 are off and device P1 is
turned on
...
CLKD is a delay-inverted version of CLK
...
During this interval, the circuit is
transparent and the input data D is sampled by the latch
...
On
the falling edge of the clock, node X is held at VDD and the output is held stable by the
cross-coupled inverters
...
36 Flow-through positive
edge-triggered register
...
The transparency period also determines the hold time of the register
...
In this particular circuit, the set-up time can be negative
...
This is attractive, as data can arrive at the register even
after the clock goes high, which means that time is borrowed from the previous cycle
...
6 Set-up time of glitch register
The glitch register of Figure 7
...
As a result, the input data can actually change after the rising edge of the clock, resulting in a
negative set-up time (Figure 7
...
The D-input transitions to low after the rising edge of the
clock, and transitions high before the falling edge of CLKD (that is, during the transparency
period)
...
The output Q does go to the correct value
of VDD as long as the input D is set up correctly some time before the falling edge of CLKD
...
That is, the output can have multiple transitions around the rising
edge, and therefore, the output of the register should not be used as a clock to other registers
...
37 Simulation showing a negative set-up time for the glitch register
7 Converting a glitch register to a conditional glitch register
Modify the circuit in Figure 7
...
The goal is to
convert the register to a conditional register which latches only when the enable signal is
asserted
...
7 Sense-Amplifier Based Registers
So far, we have presented two fundamental approaches towards building edge-triggered
registers: the master-slave concept and the glitch technique
...
38 introduces
another technique that uses a sense amplifier structure to implement an edge-triggered
register [Montanaro96]
...
As we will see, sense amplifier circuits are used
extensively in memory cores and in low swing bus drivers to amplify small voltage swings
[Schematic labels: differential inputs IN, differential outputs OUT, clock CLK, internal nodes L1 to L4, transistors M1 to M10]
Figure 7
...
chapter7
...
There are many techniques to construct these amplifiers,
with the use of feedback (e
...
, cross-coupled inverters) being one common approach
...
38 uses a precharged front-end amplifier that samples the differential input signal on the rising edge of the clock signal
...
The differential inputs in this implementation
don't have to have rail-to-rail swing and hence this register can be used as a receiver for a
reduced swing differential bus
...
As a result, PMOS transistors M7 and M8 are turned off and the NAND FF is
holding its previous state
...
On the rising edge of the clock, the evaluate transistor turns
on and the differential input pair (M2 and M3) is enabled, and the difference between the input
signals is amplified on the output nodes L1 and L2
...
For example, if IN is 1, L1 is
pulled to 0, and L2 remains at VDD
...
[Figure residue: logic states of nodes L1 to L4 and inputs IN, shown initially and after the inputs change (CLK still high), without and with the shorting device; leakage paths marked]
With the shorting device: The leakage current attempts to charge L1/L3 but the DC path through the shorting transistor
Without the shorting device: L3 is isolated so charge accumulates until L1/L3 change state, causing L2 to change state as well
...
As a result the flip-flop outputs change
...
39 The need for the shorting transistor M4
...
This is necessary to accommodate the case where the inputs change
their value after the positive edge of CLK has occurred, resulting in either L3 or L4 being
left in a high-impedance state with a logical low voltage level stored on the node
...
chapter7
...
8
Pipelining: An approach to optimize sequential circuits
301
could then actually change state prior to the next rising edge of CLK! This is best illustrated graphically, as shown in Figure 7
...
7
...
The idea is easily explained with the example of Figure 7
...
The goal of the presented circuit is to compute log(|a − b|), where both a and b represent
streams of numbers, that is, the computation must be performed on a large set of input values
...
6)
...
We assume that the registers are edge-triggered D registers
...
In conventional systems (that don't push
the edge of technology), the latter delay is generally much larger than the delays associated with the registers and dominates the circuit performance
...
We note that each logic module is then active for
only 1/3 of the clock period (if the delay of the register is ignored)
...
Pipelining is a technique to improve the
resource utilization, and increase the functional throughput
...
40b
...
1
...
[Schematic labels: registers (REG) clocked by CLK, input a, output Out; panel (b) Pipelined version]
Figure 7
...
a
chapter7
...
At
that time, the circuit has already performed parts of the computations for the next data
sets, (a2, b2) and (a3,b3)
...
Table 7
...
Clock Period   Adder      Absolute Value   Logarithm
1              a1 + b1
2              a2 + b2    |a1 + b1|
3              a3 + b3    |a2 + b2|        log(|a1 + b1|)
4              a4 + b4    |a3 + b3|        log(|a2 + b2|)
5              a5 + b5    |a4 + b4|        log(|a3 + b3|)
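The schedule in the table follows a simple pattern: in clock period n the adder works on data set n, the absolute-value stage on data set n-1, and the logarithm stage on data set n-2. The sketch below regenerates the rows under that reading; the function name is an illustrative assumption.

```python
def pipeline_schedule(n_periods):
    """Return one row per clock period: (period, adder, absolute value, logarithm)."""
    rows = []
    for period in range(1, n_periods + 1):
        adder = f"a{period} + b{period}"
        # Each later stage works on the data set that entered one period earlier.
        absolute = f"|a{period - 1} + b{period - 1}|" if period >= 2 else ""
        logarithm = f"log(|a{period - 2} + b{period - 2}|)" if period >= 3 else ""
        rows.append((period, adder, absolute, logarithm))
    return rows


for row in pipeline_schedule(5):
    print(row)
```

The first complete result appears in period 3, after which one result is produced per clock period.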
The advantage of pipelined operation becomes apparent when examining the minimum clock period of the modified circuit
...
This effectively reduces the value of the minimum allowable clock period:
$T_{min,pipe} = t_{c-q} + \max(t_{pd,add}, t_{pd,abs}, t_{pd,log})$
(7
...
The pipelined network
outperforms the original circuit by a factor of three under these assumptions, or $T_{min,pipe} = T_{min}/3$
...
This explains why pipelining is popular in the implementation of very high-performance datapaths
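The factor-of-three claim can be reproduced with a small calculation. The sketch below uses the pipelined expression as it appears in the text (register delay plus the slowest stage) and assumes the unpipelined period is the register delay plus the sum of the three stage delays. Register set-up time is left out, and all delay values are hypothetical.

```python
def t_min_reference(t_cq, stage_delays):
    """Assumed unpipelined minimum period: register delay plus all logic in series."""
    return t_cq + sum(stage_delays)


def t_min_pipelined(t_cq, stage_delays):
    """Pipelined minimum period: register delay plus the slowest single stage."""
    return t_cq + max(stage_delays)


# Hypothetical delays in nsec for the adder, absolute-value and logarithm stages.
stages = [1.0, 1.0, 1.0]

for t_cq in (0.0, 0.1):
    speedup = t_min_reference(t_cq, stages) / t_min_pipelined(t_cq, stages)
    print(f"t_cq = {t_cq} nsec -> speedup {speedup:.2f}")
```

With equal stage delays and a negligible register delay the speedup is exactly three; any register overhead or stage imbalance reduces it.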
...
8
...
Register-Based Pipelines
Pipelined circuits can be constructed using level-sensitive latches instead of edge-triggered registers
...
41
...
That is, logic is introduced between the master and slave latches of a
master-slave system
...
Latch-based systems give significantly more flexibility in implementing a pipelined system, and often offer higher performance
...
Input data is sampled on C1 at the negative edge of CLK and the computation of
logic block F starts; the result of the logic block F is stored on C2 on the falling edge of
CLK-bar (the inverted clock), and the computation of logic block G starts
...
For the example at hand, pipelining increases the latency from 1 to 3
...
[Schematic and timing labels: In, logic blocks F and G, storage nodes C1, C2 and C3, output Out, clocks CLK and its complement; phases Compute F and Compute G]
Figure 7
...
ensures correct operation
...
When overlap exists between CLK and CLK-bar (its complement), the next input is already
being applied to F, and its effect might propagate to C2 before CLK goes low (assuming
that the contamination delay of F is small)
...
Which value wins depends upon the logic function F,
the overlap time, and the value of the inputs since the propagation delay is often a function of the applied inputs
...
7
...
2
NORA-CMOS: A Logic Style for Pipelined Structures
The latch-based pipeline circuit can also be implemented using C2MOS latches, as shown
in Figure 7
...
The operation is similar to the one discussed above
...
The reasoning for the above argument is similar to the argument made in the construction of a C2MOS register
...
27)
...
43, where F is replaced by a single, static CMOS inverter
...
Based on this concept, a logic circuit style called NORA-CMOS was conceived
[Goncalves83]
...
Each module consists of a block of combinational logic that can be a mixture of
static and dynamic logic, followed by a C2MOS latch
...
A block
chapter7
...
42 Pipelined datapath using C2MOS latches
...
43 Potential race condition during (0-0) overlap in C2MOS-based
design
...
Examples of both classes are shown in Figure 7
...
The operation modes of the modules are summarized in Table 7
...
Table 7
...
             CLK block              CLK-bar block
             Logic       Latch      Logic       Latch
CLK = 0      Precharge   Hold       Evaluate    Evaluate
CLK = 1      Evaluate    Evaluate   Precharge   Hold
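The operation modes in the table can be captured as a small lookup. In this sketch the second block type is taken to be the one driven by the inverted clock and is called `CLK_BAR`; that name, and the reading of the second column heading, are assumptions because the overbar is not visible in this extract.

```python
# (block type, CLK level) -> (logic mode, latch mode), following the table above.
NORA_MODES = {
    ("CLK", 0): ("Precharge", "Hold"),
    ("CLK", 1): ("Evaluate", "Evaluate"),
    ("CLK_BAR", 0): ("Evaluate", "Evaluate"),
    ("CLK_BAR", 1): ("Precharge", "Hold"),
}


def nora_mode(block, clk):
    """Return (logic mode, latch mode) for a block type at a given clock level."""
    return NORA_MODES[(block, clk)]


def evaluating_blocks(clk):
    """List the block types whose logic evaluates at this clock level."""
    return [b for b in ("CLK", "CLK_BAR") if nora_mode(b, clk)[0] == "Evaluate"]


for clk in (0, 1):
    print(clk, evaluating_blocks(clk))
```

At either clock level exactly one block type evaluates while the other precharges and holds, which is what lets alternating modules pass data in pipelined fashion.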
A NORA datapath consists of a chain of alternating CLK and CLK-bar modules
...
Data is passed in a pipelined fashion from
module to module
...
Dynamic and static logic
can be mixed freely, and both CLKp and CLKn dynamic blocks can be used in cascaded or
in pipelined form
...
Design Rules
In order to ensure correct operation, two important rules should always be followed:
[Schematic panels: (a) CLK-module and (b) CLK-bar module; each shows combinational logic (PUN and PDN blocks, inputs In1 to In4) followed by a latch, with output Out]
Figure 7
...
• The dynamic-logic rule: Inputs to a dynamic CLKn (CLKp) block are only allowed to
make a single 0 → 1 (1 → 0) transition during the evaluation period (Chapter 6)
...
The presence of dynamic logic circuits requires the introduction of some extensions
to the latter rule
...
45a
...
Assume now that a (0-0) overlap occurs
...
45b)
...
This translates into the following
rule: The number of static inversions between the last dynamic block in a logic function
and the C2MOS latch should be even
...
chapter7
...
45
(b) Same circuit during (0-0) clock overlap
...
Revised C2MOS Rule
• The number of static inversions between C2MOS latches should be even (in the absence
of dynamic nodes); if dynamic nodes are present, the number of static inverters between
a latch and a dynamic gate in the logic block should be even
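A quick parity check illustrates the rule. The sketch below models the logic between two latches as a list of stage kinds and counts the static inversions after the last dynamic gate (or all of them when no dynamic gate is present). It covers only that trailing segment, and the stage labels `"dynamic"` and `"static_inv"` are hypothetical names chosen for this example.

```python
def inversions_to_check(stages):
    """Count static inversions after the last dynamic gate, or all if none is dynamic."""
    count = 0
    for kind in stages:
        if kind == "dynamic":
            count = 0  # restart the count after each dynamic gate
        elif kind == "static_inv":
            count += 1
        else:
            raise ValueError(f"unknown stage kind: {kind}")
    return count


def satisfies_c2mos_rule(stages):
    """True when the counted number of static inversions is even."""
    return inversions_to_check(stages) % 2 == 0


print(satisfies_c2mos_rule(["static_inv", "static_inv"]))
print(satisfies_c2mos_rule(["dynamic", "static_inv"]))
```

The first call has two static inversions and passes; the second has a single inversion after the dynamic gate and fails.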
...
Adhering to the above rules is not always trivial and requires a careful analysis of
the logic equations to be implemented
...
Its use should only be considered when maximum
circuit performance is a must
...
9 Non-Bistable Sequential Circuits
In the preceding sections, we have focused on one single type of sequential element, that is,
the latch (and its sibling the register)
...
The bistable element is not the only
sequential circuit of interest
...
The former act as oscillators and can, for instance, be used for on-chip clock
generation
...
Another
interesting regenerative circuit is the Schmitt trigger
...
This peculiar
feature can come in handy in noisy environments
...
...
9
...
It responds to a slowly changing input waveform with a fast transition time at the
output
...
The voltage-transfer characteristic of the device displays different switching thresholds for positive- and negative-going input signals
...
46, where a typical voltage-transfer characteristic of the Schmitt trigger is shown
(and its schematics symbol)
...
The hysteresis voltage is
defined as the difference between the two
...
46
Non-inverting Schmitt trigger
...
This is illustrated in Figure 7
...
Notice how the
hysteresis suppresses the ringing on the signal
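The noise-suppression effect of hysteresis can be seen in a small behavioral model. The sketch below compares a non-inverting Schmitt trigger with an ordinary single-threshold comparator on the same noisy ramp. The threshold values (VM+ = 1.5 V, VM- = 0.9 V, single threshold 1.25 V) and the sample data are assumptions chosen for illustration.

```python
def schmitt_trigger(samples, vm_plus, vm_minus, initial_output=0):
    """Non-inverting Schmitt trigger: switch high above VM+, low below VM-."""
    if vm_minus >= vm_plus:
        raise ValueError("VM- must be lower than VM+")
    out = initial_output
    result = []
    for v in samples:
        if out == 0 and v > vm_plus:
            out = 1
        elif out == 1 and v < vm_minus:
            out = 0
        result.append(out)
    return result


def single_threshold(samples, vm):
    """Plain comparator with one switching threshold and no memory."""
    return [1 if v > vm else 0 for v in samples]


# A slowly rising input with ringing superimposed (assumed sample values, in volts).
noisy_ramp = [0.0, 0.5, 1.0, 1.4, 1.2, 1.6, 1.1, 1.8, 2.2, 2.5]

print(schmitt_trigger(noisy_ramp, vm_plus=1.5, vm_minus=0.9))
# -> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(single_threshold(noisy_ramp, 1.25))
# -> [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
```

The comparator output toggles several times as the ringing crosses its threshold, while the Schmitt trigger switches once.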
...
For instance, steep signal
slopes are beneficial in reducing power consumption by suppressing direct-path currents
...
[Waveform labels: Vin and Vout versus t, thresholds VM+ and VM–, time marks t0 and t0 + tp]
Figure 7
...
chapter7
...
48 The
...
Increasing the ratio results
in a reduction of the threshold, while decreasing it results in an increase in VM
...
This adaptation is achieved with the aid of feedback
...
48 CMOS Schmitt trigger
...
The feedback loop
biases the PMOS transistor M4 in the conductive mode while M3 is off
...
This
modifies the effective transistor ratio of the inverter to kM1/(kM2+kM4), which moves the
switching threshold upwards
...
This extra pull-down device speeds up the transition and produces a clean
output signal with steep slopes
...
In this case, the
pull-down network originally consists of M1 and M3 in parallel, while the pull-up network
is formed by M2
...
Example 7
...
Devices M1 and M2 are
1µm/0
...
25µm, respectively
...
25 V)
...
49a shows the simulation of the Schmitt trigger assuming that devices M3 and M4 are 0
...
25µm and 1
...
25µm, respectively
...
The high-to-low switching point (VM- =
0
...
6 V) is larger
than VDD/2
...
For
example, to modify the low-to-high transition, we need to vary the PMOS device
...
5 µm
...
5µm
...
49b demonstrates how the switching threshold
increases with rising values of k
...
[Plot labels: Vx (V) versus Vin (V); curves for k = 2 and k = 4]
The width is k * 0
...
Figure 7
...
(a) Voltage-transfer characteristics with hysteresis
...
8 An Alternative CMOS Schmitt Trigger
Another CMOS Schmitt trigger is shown in Figure 7
...
Discuss the operation of the gate,
and derive expressions for VM− and VM+
...
50 Alternate CMOS Schmitt trigger
...
9
...
It is called monostable
because it has only one stable state (the quiescent one)
...
This means that it eventually returns to its original state after a time period determined by the circuit parameters
...
This functionality is required in a wide range of applications
...
Another
chapter7
...
This circuit detects a change in a signal, or group of signals,
such as the address or data bus, and produces a pulse to initialize the subsequent circuitry
...
The concept is illustrated in Figure
7
...
In the quiescent state, both inputs to the XOR are identical, and the output is low
...
After a delay td (of the delay element), this disruption is removed, and the output
goes low again
...
The delay circuit can be realized in many
different ways, such as an RC-network or a chain of basic gates
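The one-shot behavior described above can be mimicked in discrete time by XOR-ing a signal with a delayed copy of itself. In this sketch the delay td is expressed as a whole number of samples; the function name and the test waveform are assumptions made for illustration.

```python
def transition_one_shot(signal, delay):
    """XOR each sample with the sample `delay` steps earlier (discrete-time model)."""
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    out = []
    for i, value in enumerate(signal):
        # Before enough history exists, the delay element still holds the initial value.
        delayed = signal[i - delay] if i >= delay else signal[0]
        out.append(value ^ delayed)
    return out


signal = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(transition_one_shot(signal, 2))
# -> [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0]
```

Each transition of the input, rising or falling, produces a pulse whose width equals the delay; in the quiescent state both XOR inputs agree and the output stays low.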
...
51 Transition-triggered one-shot [schematic labels: Out, td]
7
...
3
...
The output oscillates back and forth between two
quasi-stable states with a period determined by the circuit topology and parameters (delay,
power supply, etc
...
This application is discussed in detail in a later chapter (on timing)
...
It consists of an odd
number of inverters connected in a circular chain
...
Example 7
...
52 (all gates
use minimum-size devices)
...
5 nsec,
which corresponds to a gate propagation delay of 50 psec
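The link between oscillation period and gate delay can be checked with the standard ring-oscillator relation T = 2 * N * tp, where N is the (odd) number of inverters. That relation is assumed here because the formula itself is not visible in this extract; the function names are illustrative.

```python
def ring_oscillator_period(n_stages, t_p):
    """Period of an N-stage ring oscillator, T = 2 * N * tp, for odd N."""
    if n_stages < 1 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    return 2 * n_stages * t_p


def gate_delay_from_period(n_stages, period):
    """Invert the relation to estimate the per-gate propagation delay."""
    if n_stages < 1 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    return period / (2 * n_stages)


period = ring_oscillator_period(5, 50e-12)  # five stages, 50 psec per gate
print(f"period: {period * 1e9:.2f} nsec")
print(f"frequency: {1 / period / 1e9:.2f} GHz")
```

Measuring the period of a ring oscillator and dividing by 2N is a common way to characterize the gate delay of a process.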
...
[Plot labels: v1, v3, v5]
52 Simulated waveforms of five-stage ring oscillator
...
played in the plot)
...
The ring oscillator composed of cascaded inverters produces a waveform with a
fixed oscillating frequency determined by the delay of an inverter in the CMOS process
...
An example
of such a circuit is the voltage-controlled oscillator (VCO), whose oscillation frequency is
a function (typically non-linear) of a control voltage
...
53 [Jeong87]
...
[Schematic labels: In, Out, transistors M1, M2 and M3, reference current Iref, control voltage Vcntl]
Figure 7
...
In this modified inverter circuit, the maximal discharge current of the inverter is limited by adding an extra series device. Note that the low-to-high transition on the inverter
...
The added NMOS
transistor M3 is controlled by an analog control voltage Vcntl, which determines the available discharge current
...
The ability to alter the propagation delay per stage allows us to control the frequency
of the ring structure
...
Under
low operating current levels, the current-starved inverter suffers from slow fall times at its
output
...
This is resolved by feeding its
output into a CMOS inverter or better yet a Schmitt trigger
...
Example 7
...
54 shows the simulated delay of the current-starved inverter as a function of the control voltage Vcntl
...
When the control
voltage is smaller than the threshold, the device enters the sub-threshold region
...
When operating in this region, the delay is very sensitive to variations in
the control voltage, and, hence, to noise
...
Figure 7
...
1
...
0
0
...
0
1
...
0
2
...
55a
...
Figure 7
...
The simulated waveforms of this two-stage VCO are shown in Figure 7
...
The in-phase
and quadrature phase outputs are available simultaneously. The differential type
...
However, it consumes more power due to the increased complexity, and the static current
...
(c) simulated waveforms of 2-stage VCO [axis: time (nsec)]
55 Differential delay element and VCO topology
...
...
10
Perspective: Choosing a Clocking Strategy
A crucial decision that must be made in the earliest phases of a chip design is to select the
appropriate clocking methodology
...
Choosing the right clocking scheme affects the functionality,
speed and power of a circuit
...
The
most robust and conceptually simple scheme is the two-phase master-slave design
...
More exotic schemes
such as the glitch register are also used in practice
...
An example of such is the need for a negative set-up
time to cope with clock skew
...
Most automated design methodologies such as standard cell employ a single-phase, edge-triggered approach, based on
static flip-flops
...
The use of latches between logic is
also very common to improve circuit performance
...
11 Summary
This chapter has explored the subject of sequential digital circuits
...
A
third potential operation point turns out to be metastable; that is, any diversion from
this bias point causes the flip-flop to converge to one of the stable states
...
A register (sometimes also called a flip-flop) on the
other hand samples the data on the rising or falling edge
...
These
parameters must be carefully optimized since they may account for a significant
portion of the clock period
...
A static register holds state as long as the power
supply is turned on
...
g
...
Dynamic memory is based on temporary
charge store on capacitors
...
However, charge on a dynamic node leaks away
with time, and hence dynamic circuits have a minimum clock frequency
...
• The most common and widely used approach is the master-slave configuration
which involves cascading a positive latch and negative latch (or vice-versa)
...
• Registers can also be constructed using the pulse or glitch concept
...
Generally, the design of such circuits requires careful timing analysis across all process
corners
...
• Choice of clocking style is an important consideration
...
Circuit techniques such as C2MOS can be used to eliminate race
conditions in two-phase clocking
...
However, the rise time of clocks must be carefully optimized to eliminate races
...
An example of such an approach, the NORA logic style, is
very effective in pipelined datapaths
...
They are useful as pulse generators
...
The ring oscillator is
the best-known example of a circuit of this class
...
They are mainly used to suppress noise
...
12 To Probe Further
The basic concepts of sequential gates can be found in many logic design textbooks (e
...
,
[Mano82] and [Hill74])
...
References
[Dobberpuhl92] D
...
, “A 200 MHz 64-b Dual Issue CMOS Microprocessor,” IEEE
JSSC, vol
...
11, Nov
...
1555–1567
...
Bieseke et al
...
176-177, Feb
...
[Goncalves83] N
...
De Man, “NORA: a racefree dynamic CMOS technique for
pipelined logic structures,” IEEE JSSC, vol
...
3, June 1983, pp
...
[Haznedar91] H
...
[Hill74] F
...
Peterson, Introduction to Switching Theory and Logical Design, Wiley, 1974
...
Hodges and H
...
[Jeong87] D
...
, “Design of PLL-based clock generation circuits,”IEEE JSSC, vol
...
2, April 1987, pp
...
[Kuzo96] S
...
, “A 100MHz 0
...
140-141, February 1996
...
Mano, Computer System Architecture, Prentice Hall, 1982
...
Montanaro et al
...
5-W CMOS RISC Microprocessor,” IEEE
JSSC, pp
...
chapter7
...
13
Exercises and Design Problems
315
[Mutoh95] S
...
, “1-V Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage CMOS,” IEEE JSSC , pp
...
[Partovi96] H
...
138-139, February 1996
...
H
...
15, January 1938, pp
...
[Shoji88] M
...
[Suzuki73] Y
...
Odagawa, and T
...
SC-8, December 1973, pp
...
[Veendrick92] H
...
[Yuan89] J
...
, “High-Speed CMOS Circuit Technique,” IEEE JSSC, vol
...
1, February 1989, pp
...
7
...
reset, Tom's latch
- static in one phase, look in Hamid's paper,
...
2] The indicated waveforms are applied to the JK master-slave flip-flop of Figure
7
...
For this problem, assume that gate delays are short compared to the input signal time scale
...
Sketch the waveforms that appear at the QM and QS outputs of the master and slave
latches
...
b
...
Figure 7
...
[M&D, SPICE, 6
...
14 using static CMOS minimum-size
devices
...
a
...
Find
these gate delays using SPICE under the appropriate loading conditions
...
b
...
Find the
new set-up and hold times and the propagation delays using SPICE
...
fm Page 42 Tuesday, April 16, 2002 9:12 AM
CHAPTER
10
TIMING ISSUES
IN DIGITAL CIRCUITS
Impact of clock skew and jitter on performance and functionality
n
Alternative timing methodologies
n
Synchronization issues in digital IC and board design
n
Clock generation
10
...
5 Synchronizers and Arbiters*
10
...
6 Clock Synthesis and Synchronization Using a
Phase-Locked Loop
10
...
3
...
3
...
3
...
7 Future Directions
10
...
9 Summary
10
...
3
...
4 Self-Timed Circuit Design*
42
chapter10_141
...
1
TIMING ISSUES IN DIGITAL CIRCUITS
Chapter 10
Introduction
All sequential circuits have one property in common—a well-defined ordering of the
switching events must be imposed if the circuit is to operate correctly
...
The synchronous system approach, in which all memory elements in the system are
simultaneously updated using a globally distributed periodic synchronization signal (that
is, a global clock signal), represents an effective and popular way to enforce this ordering
...
This chapter starts with an overview of the different timing methodologies
...
We analyze the
impact of spatial variations of the clock signal, called clock skew, and temporal variations
of the clock signal, called clock jitter, and introduce techniques to cope with them
...
At the other end of the design spectrum is an approach called asynchronous design,
which avoids the problem of clock uncertainty altogether by eliminating the need for
globally-distributed clocks
...
The important issue of synchronization, which is required when interfacing different clock domains
or when sampling an asynchronous signal, also deserves some in-depth treatment
...
10
...
Signals that transition only at predetermined periods
in time can be classified as synchronous, mesochronous, or plesiochronous with respect to
a system clock
...
10
...
1
Synchronous Interconnect
A synchronous signal is one that has the exact same frequency, and a known fixed phase
offset with respect to the local clock
...
In
digital logic design, synchronous systems are the most straightforward type of interconnect, where the flow of data in a circuit proceeds in lockstep with the system clock as
shown below
...
[Figure: In, register R1, Cin, Combinational Logic, Cout, register R2, Out; clock CLK]
Figure 10
...
After a suitable settling period, the output Cout becomes valid and can be sampled by R2, which synchronizes the output with the clock
...
The length of the “uncertainty period,” or the period where data is not valid, places an upper bound on how fast a
synchronous interconnect system can be clocked
...
2
...
For example, if data is
being passed between two different clock domains, then the data signal transmitted from
the first module can have an unknown phase relationship to the clock of the receiving
module
...
A (mesochronous) synchronizer can
be used to synchronize the data signal with the receiving clock as shown below
...
[Figure labels: Block A, R1, D1, Interconnect, Delay, D2, D3, D4, R2, Block B, ClkA, ClkB, PD/Control]
Figure 10
...
In Figure 10
...
However, D1 and D2
are mesochronous with ClkB because of the unknown phase difference between ClkA and
ClkB and the unknown interconnect delay in the path between Block A and Block B
...
In this example, the variable delay element is adjusted by measuring the phase difference between the
received signal and the local clock
...
10
...
3
Plesiochronous Interconnect
A plesiochronous signal is one whose frequency is nominally the same as, but slightly different from, that of the local clock (“plesio” is Greek for “near”)
...
drifts in time
...
Since the transmitted signal can
arrive at the receiving module at a different rate than the local clock, one needs to utilize a
buffering scheme to ensure all data is received
...
A possible
framework for plesiochronous interconnect is shown in Figure 10
...
[Figure labels: Originating Module, Clock C1, Timing Recovery, C3, FIFO, Receiving Module, Clock C2]
Figure 10
...
In this digital communications framework, the originating module issues data at
some unknown rate characterized by C1, which is plesiochronous with respect to C2
...
As a result, C3 will be synchronous with the data at the input of
the FIFO and will be mesochronous with C1
...
However, by making the FIFO large enough, and periodically resetting the system whenever an overflow condition occurs, robust communication can be
achieved
...
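The need for a large FIFO and an occasional reset can be illustrated with a rough estimate of how long a FIFO lasts before it overflows. This is a minimal Python sketch under stated assumptions, not something given in the text: the function name, the one-word-per-clock-cycle transfer model, and the example frequencies are all hypothetical.

```python
def seconds_until_overflow(depth_words, f_write_hz, f_read_hz, initial_fill=0):
    """Estimate the time until a FIFO overflows under a constant frequency offset.

    Assumption (not from the text): one word is written per write-clock cycle
    and one word is read per read-clock cycle, so the fill level grows at
    (f_write_hz - f_read_hz) words per second.
    Returns None when the writer is not faster than the reader.
    """
    drift = f_write_hz - f_read_hz
    if drift <= 0:
        return None  # no overflow; the FIFO may underflow instead
    return (depth_words - initial_fill) / drift


# Hypothetical numbers: a 64-word FIFO, writer running 100 ppm fast at 100 MHz.
t = seconds_until_overflow(64, 100_010_000, 100_000_000)
print(f"overflow after about {t * 1e3:.1f} ms")
```

Doubling the depth doubles the time between resets, which is why the FIFO size and the tolerated frequency mismatch have to be chosen together.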
2
...
As a result, it is not straightforward to map these arbitrary transitions into a synchronized data stream
...
In such an approach, communication between modules is controlled through a handshaking protocol to perform the
proper ordering of commands
...
4 Asynchronous design methodology for simple pipeline interconnect
...
The handshaking signals then initiate a data transfer to
the next block, which latches in the new data and begins a new computation by asserting
the initialization signal I
...
There is no need to manage clock skew, and the design methodology leads to a very modular approach where interaction between blocks simply occurs
through a handshaking procedure
...
10
...
3
...
The
generation and distribution of a clock has a significant impact on performance and power
dissipation
...
In the ideal world, assuming the
clock paths from a central distribution point to each register are perfectly balanced, the
phase of the clock (i
...
, the position of the clock edge relative to a reference) at various
points in the system is going to be exactly equal
...
This results in performance degradation and/or circuit
malfunction
...
5 shows the basic structure of a synchronous pipelined datapath
...
5 Pipelined Datapath Circuit and timing parameters
...
The following timing parameters characterize the timing of the
sequential circuit
...
• The set-up (tsu) and hold (thold) times for the registers
...
...
Under ideal conditions (tclk1 = tclk2), the worst case propagation delays determine the
minimum clock period required for this sequential circuit
...
This constraint is given by (as
derived in Chapter 7):
$T > t_{c-q} + t_{logic} + t_{su}$
(10
...
2)
The above analysis is simplistic since the clock is never ideal
...
Clock Skew
The spatial variation in arrival time of a clock transition on an integrated circuit is commonly referred to as clock skew
...
Consider the transfer of data between registers R1 and R2 in Figure
10
...
The clock skew can be positive or negative depending upon the routing direction and
position of the clock source
...
6
...
[Timing diagram: CLK1 and CLK2 with edges 1 to 4; annotations TCLK, TCLK + δ, δ, δ + th]
Figure 10
...
In
this sample timing diagram, δ > 0
...
That is, if in one cycle CLK2 lagged CLK1 by
δ, then on the next cycle it will lag it by the same amount
...
chapter10_141
...
3
Synchronous Design — An In-depth Perspective
48
[Timing diagram: CLK1 and CLK2 with edges 1 to 4; annotations TCLK, TCLK + δ, δ]
Figure 10
...
The rising edge of CLK2 arrives
earlier than the edge of CLK1
...
First consider the impact of clock skew on performance
...
6, a new
input In sampled by R1 at edge
will propagate through the combinational logic and be
sampled by R2 on edge
...
The output of the combinational logic
must be valid one set-up time before the rising edge of CLK2 (point )
...
3)
The above equation suggests that clock skew actually has the potential to improve
the performance of the circuit
...
As above, assume that input In is sampled on the rising edge of CLK1 at edge into
R1
...
However, if the minimum delay of the combinational logic block is small, the inputs to R2 may change before the clock edge , resulting
in incorrect evaluation
...
The constraint can be formally stated as
$\delta + t_{hold} < t_{(c-q,cd)} + t_{(logic,cd)}$
or
$\delta < t_{(c-q,cd)} + t_{(logic,cd)} - t_{hold}$
(10
...
7 shows the timing diagram for the case when δ < 0
...
On the rising edge of CLK1, a
new input is sampled by R1
...
As
can be seen from Figure 10
...
(10
...
However, a negative skew implies that the system never fails,
since edge
happens before edge ! This can also be seen from Eq
...
4), which is
always satisfied since δ < 0
...
Example scenarios for positive and negative clock skew are shown in Figure 10
...
[Figure: registers R1, R2, R3 separated by combinational logic, with clock arrival times tCLK1, tCLK2, tCLK3 set by delay elements on the CLK line. (a) Positive skew. (b) Negative skew.]
Figure 10
...
• δ > 0—This corresponds to a clock routed in the same direction as the flow of the data
through the pipeline (Figure 10
...
In this case, the skew has to be strictly controlled
and satisfy Eq
...
4)
...
Reducing the clock frequency of an edge-triggered circuit
does not help get around skew problems! On the other hand, positive skew increases the
throughput of the circuit as expressed by Eq
...
3), because the clock period can be
shortened by δ
...
(10
...
• δ < 0—When the clock is routed in the opposite direction of the data (Figure 10
...
4) is unconditionally met
...
The skew reduces the time available for actual computation so that the clock period has to be increased by |δ|
...
Unfortunately, since a general logic circuit can have data flowing in both directions (for
example, circuits with feedback), this solution to eliminate races will not always work (Figure
10
...
[Figure labels: In, REG, Logic, Out, CLK, Clock distribution, Positive skew, Negative skew]
Figure 10
...
The skew can assume both positive and negative values depending on the direction of the data transfer
...
In general, routing the clock so that only negative skew occurs is not feasible
...
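The set-up and race constraints discussed above can be checked numerically. The following is a minimal Python sketch, not taken from the text: the function names and the picosecond values are hypothetical, and the set-up check assumes the form T + δ ≥ tc-q + tlogic + tsu, which matches the statement that the clock period can be shortened by δ for positive skew and must be increased by |δ| for negative skew. The race check is the hold-time inequality δ + thold < t(c-q,cd) + t(logic,cd).

```python
def min_clock_period(t_cq, t_logic, t_su, skew=0):
    """Smallest period satisfying T + skew >= t_cq + t_logic + t_su (assumed form)."""
    return t_cq + t_logic + t_su - skew


def hold_ok(t_cq_cd, t_logic_cd, t_hold, skew=0):
    """True when skew + t_hold < t_cq_cd + t_logic_cd, i.e. no race-through."""
    return skew + t_hold < t_cq_cd + t_logic_cd


# Hypothetical delays in picoseconds.
for skew in (-100, 0, 100, 200):
    period = min_clock_period(t_cq=100, t_logic=1000, t_su=50, skew=skew)
    safe = hold_ok(t_cq_cd=50, t_logic_cd=100, t_hold=20, skew=skew)
    print(f"skew={skew:+5d} ps  min period={period} ps  hold met={safe}")
```

Note that the hold check does not contain the clock period at all, which is the numerical counterpart of the remark that slowing the clock does not fix a race.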
Example 10
...
10
...
The maximum
and minimum delays of the gates is made, as they are assumed to be identical
...
On the
other hand, computation of the worst case propagation delay is not as simple as it appears
...
However, when analyzing the data dependencies, it becomes obvious that path
is never
exercised
...
If A = 1, the critical path goes through OR1 and OR2
...
For the case when A = 0 and B = 1, the critical path is through I1, OR1, AND3 and OR2
...
Therefore, the propagation delay is 4 tgate
...
[Figure labels: inputs A, B, C, D; gates I1, AND1, AND2, AND3, OR1, OR2; path 1 and path 2]
Figure 10
...
WARNING: Because of the existence of false paths, the worst-case propagation delay of combinational
logic cannot be obtained by simply adding the propagation delays of individual logic gates
...
Clock Jitter
Clock jitter refers to the temporal variation of the clock period at a given point — that is,
the clock period can reduce or expand on a cycle-by-cycle basis
...
Jitter can be measured and cited in one of many ways
...
of a single clock period and for a given spatial location i is given as $T_{jitter,i}(n) = T_{i,n+1} - T_{i,n} - T_{CLK}$,
where $T_{i,n}$ is the clock period for period n, $T_{i,n+1}$ is the clock period for period n+1, and
$T_{CLK}$ is the nominal clock period
...
Figure 10
...
Ideally the clock period starts at
edge and ends at edge and with a nominal clock period of TCLK
...
As a result, the total time available to complete the operation is reduced by 2 tjitter in the
worst case and is given by
$T_{CLK} - 2t_{jitter} \geq t_{c-q} + t_{logic} + t_{su}$ or $T \geq t_{c-q} + t_{logic} + t_{su} + 2t_{jitter}$
(10
...
Care must be taken to reduce jitter in the clock network to maximize performance
...
11 Circuit for studying the impact of jitter on performance
...
Consider the sequential circuit shown in Figure 10
...
Assume that nominally ideal clocks are distributed to both registers (the clock
period is identical every cycle and the skew is 0)
...
Assume that CLK1 has a jitter of tjitter1 and
CLK2 has a jitter of tjitter2
...
The worst
case happens when the leading edge of the current clock period on CLK1 happens late
(edge ) and the leading edge of the next cycle of CLK2 happens early (edge )
...
6)
[Figure: registers R1 and R2 with combinational logic between them, clocked by tCLK1 and tCLK2; timing diagram of CLK1 and CLK2 with numbered edges, jitter tjitter1 and tjitter2, skew δ, TCLK, and TCLK + δ]
Figure 10
...
In this example, a positive skew (δ) is assumed
...
To formulate
the minimum delay constraint, consider the case when the leading edge of the CLK1 cycle
arrives early (edge ) and the leading edge of the current cycle of CLK2 arrives late (edge
)
...
This results in
$\delta + t_{hold} + t_{jitter1} + t_{jitter2} < t_{(c-q,cd)} + t_{(logic,cd)}$
or
$\delta < t_{(c-q,cd)} + t_{(logic,cd)} - t_{hold} - t_{jitter1} - t_{jitter2}$
(10
...
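The effect of jitter on both constraints can be put into numbers. The sketch below is a hedged illustration in Python: the function names and the picosecond values are hypothetical, and the two functions simply evaluate the inequalities T ≥ tc-q + tlogic + tsu + 2tjitter and δ < t(c-q,cd) + t(logic,cd) − thold − tjitter1 − tjitter2 stated above.

```python
def min_period_with_jitter(t_cq, t_logic, t_su, t_jitter):
    """Minimum period when each clock edge may move by t_jitter (zero skew)."""
    return t_cq + t_logic + t_su + 2 * t_jitter


def skew_bound_with_jitter(t_cq_cd, t_logic_cd, t_hold, t_jitter1, t_jitter2):
    """Upper bound on positive skew; the actual skew must stay strictly below it."""
    return t_cq_cd + t_logic_cd - t_hold - t_jitter1 - t_jitter2


# Hypothetical delays in picoseconds.
print(min_period_with_jitter(t_cq=100, t_logic=1000, t_su=50, t_jitter=25))
print(skew_bound_with_jitter(t_cq_cd=50, t_logic_cd=100, t_hold=20,
                             t_jitter1=25, t_jitter2=25))
```

With these example values, jitter costs 50 ps of cycle time and shrinks the tolerable skew from 130 ps to 80 ps, so it hurts both performance and race immunity.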
Now consider the case when the skew is negative (δ <0) as shown in Figure 10
...
For the timing shown, |δ| > tjitter2
...
That is, negative
skew reduces performance
...
13 Consider a negative clock skew (δ) and the skew is assumed to be larger than the jitter
...
10
...
2
Sources of Skew and Jitter
A perfect clock is defined as a perfectly periodic signal that is simultaneously triggered at the various memory elements on the chip
...
To illustrate the sources of skew and jitter, consider the
simplistic view of clock generation and distribution as shown in Figure 10
...
Typically, a
high frequency clock is either provided from off chip or generated on-chip
...
s registers
...
The clock paths include wiring and
the associated distributed buffers required to drive interconnects and loads
...
e
...
[Figure labels, numbered 1 to 7: Clock Generation, Devices, Interconnect, Power Supply, Temperature, Capacitive Load, Coupling to Adjacent Lines]
Figure 10
...
There are many reasons why the two parallel paths don’t result in exactly the same
delay
...
First, errors can
be divided into systematic or random
...
g
...
In principle, such errors can be modeled and corrected at design time given sufficiently good models and simulators
...
Random errors
are due to manufacturing variations (e
...
, dopant fluctuations that result in threshold variations) that are difficult to model and eliminate
...
In practice, there is a continuum between changes that are slower
than the time constant of interest, and those that are faster
...
A clock network tuned by a one-time
calibration or trimming would be vulnerable to time-varying mismatch due to varying
thermal gradients
...
For example, the clock net is usually
by far the largest single net on the chip, and simultaneous transitions on the clock drivers
induce noise on the power supply
...
Of course, this power supply glitch may still cause static mismatch if it is not the same throughout the chip
...
14, are described in detail
...
A typical on-chip clock generator, as
described at the end of this chapter, takes a low-frequency reference clock signal, and produces a high-frequency global reference for the processor
...
This is an analog circuit, sensitive to intrinsic
device noise and power supply variations
...
This is particularly a problem in
modern fabrication processes that combine a lightly-doped epitaxial layer and a heavily-doped substrate (to combat latch-up)
...
These noise sources cause temporal variations of the clock signal that
propagate unfiltered through the clock drivers to the flip-flops, and result in cycle-to-cycle
clock-period variations
...
Manufacturing Device Variations (2)
Distributed buffers are integral components of the clock distribution networks, as they are
required to drive both the register loads as well as the global and local interconnects
...
Unfortunately, as a result of process variations, device parameters in the
buffers vary along different paths, resulting in static skew
...
The doping variations can affect the
depth of junction and dopant profiles and cause variations in electrical parameters such as
device threshold and parasitic capacitances
...
Keeping the orientation the same across the chip for
the clock drivers is critical
...
Spatial variation usually consists of wafer-level (or within-wafer) variation and die-level (or within-die) variation
...
The random variations,
however, ultimately limit the matching and skew that can be achieved
...
Interconnect Variations (3)
Vertical and lateral dimension variations cause the interconnect capacitance and resistance
to vary across a chip
...
One important source of interconnect variation is the Inter-level Dielectric (ILD) thickness
variations
...
The oxide layer is deposited over a layer
of patterned metal features, generally resulting in some remaining step height or surface
topography
...
15a)
...
This is
primarily caused by variations in polish rate, which is a function of the circuit layout density and pattern effects
...
15b shows this effect where the polish rate is higher for
the lower spatial density region, resulting in smaller dielectric thickness and higher capacitance
...
Significant advances have been made to develop
analytical models for estimating the ILD thickness variations based on spatial density
...
g
...
Figure 10
...
The graphs show that there
is clear correlation between the density and the thickness of the dielectric
...
Other interconnect variations include deviation in the width of the wires and line
spacing
...
At the lower levels of
metallization, lithographic effects are important while at higher levels etch effects are
important that depend on width and layout
...
15 Inter-level Dielectric (ILD) thickness variation due to density (courtesy of
Duane Boning)
...
A detailed review of device and
interconnect variations is presented in [Boning00]
...
16 Pattern density and ILD thickness variation for a high performance microprocessor
...
The two major sources of environmental variations are temperature and
power supply
...
This has particularly become an issue with clock gating where
some parts of the chip may be idle while other parts of the chip might be fully active
...
Since the device parameters (such as threshold,
mobility, etc
...
More importantly, this component is time-varying since the temperature changes as the logic activity of the circuit
varies
...
An
interesting question is whether temperature variation contributes to skew or to jitter. The
variation in temperature is clearly time-varying, but the changes are relatively slow (typical
time constants for temperature are on the order of milliseconds)
...
Fortunately, using feedback, it is possible to calibrate the temperature and compensate
...
The delay through buffers is a very strong function of power supply as it
directly affects the drive of the transistors
...
Therefore, the buffer delay along one path is
very different than the buffer delay along another path
...
Static power supply variations may result from fixed currents drawn from various modules, while high-frequency
variations result from instantaneous IR drops along the power grid due to fluctuations in
switching activity
...
This has particularly become a concern with clock gating
as the load current can vary dramatically as the logic transitions back and forth between
the idle and active states
...
clock signal is modulated on a cycle-by-cycle basis, resulting in jitter
...
Unfortunately, high-frequency
power supply changes are difficult to compensate even with feedback techniques
...
Capacitive Coupling (6 and 7)
The variation in capacitive load also contributes to timing uncertainty
...
The clock network includes both the interconnect and the gate capacitance of latches and registers
...
Since the adjacent signal can transition in arbitrary directions and at arbitrary times, the exact coupling to the clock network
is not fixed from cycle-to-cycle
...
Another major source of clock
uncertainty is variation in the gate capacitance related to the sequential elements
...
In many latches and
registers this translates to the clock load being a function of the current state of the
latch/register (that is, the values stored on the internal nodes of the circuit), as well as the
next state
...
Example 10
...
17, where a minimum-sized local clock buffer drives
a register (actually, four registers are driven, though only one is shown here)
...
The jitter on the clock based on data-dependent capacitance is illustrated
...
[Figure: register schematic (labels CK, CKb, T2) and plot of the CLK and CKb waveforms versus time (ns)]
Figure 10
...
0
...
4
10
...
3
Clock-Distribution Techniques
It is clear from the previous discussion that clock skew and jitter are major issues in digital
circuits, and can fundamentally limit the performance of a digital system
...
Another important consideration in
clock distribution is the power dissipation
...
To reduce power dissipation, clock networks must support clock conditioning, that is, the ability to shut down parts of the
clock network
...
In this section, an overview of basic constructs in high-performance clock distribution techniques is presented along with a case study of clock distribution in the Alpha
microprocessor
...
Fabrics for clocking
Clock networks typically include a network that is used to distribute a global reference to
various parts of the chip, and a final stage that is responsible for local distribution of the
clock while considering the local load variations
...
Therefore
one common approach to distributing a clock is to use balanced paths (also called trees)
...
18, where a 4x4 array is
shown
...
Ideally, if each path is balanced, the clock skew is zero
...
However, in
reality, as discussed in the previous section, process and environmental variations cause
clock skew and jitter to occur
...
18 Example of an H-tree clock-distribution
network for 16 leaf nodes
...
The concept can be generalized to a more generic set-
chapter10_141
...
19 An example RC-matched distribution for an
IBM Microprocessor [Restle98]
...
The more general approach, referred to as routed RC trees, represents a floorplan that
distributes the clock signal so that the interconnections carrying the clock signals to the
functional sub-blocks are of equal length
...
An example of a matched RC is shown in Figure 10
...
The
chip is partitioned into ten balanced load segments (tiles)
...
A lower level RC-matched
tree is used to drive 580 additional drivers inside each tile
...
20 [Bailey00]
...
This approach is fundamentally different
from the balanced RC approach
...
Rather, the absolute delay is minimized assuming that the grid
size is small
...
Unfortunately, the
penalty is the power dissipation since the structure has a lot of unnecessary interconnect
...
Clock
distribution is often only considered in the last phases of the design process, when most of
the chip layout is already frozen
...
20 Grid structures allow a low skew
distribution and physical design flexibility at the
cost of power dissipation [Bailey00]
...
With
careful planning, a designer can avoid many of these problems, and clock distribution
becomes a manageable operation
...
These processors have always been at the cutting edge of the
technology, and therefore represent an interesting perspective on the evolution of clock
distribution
...
The first generation Alpha microprocessor (21064 or EV4)
from Digital Equipment Corporation used a single global clock driver [Dobberpuhl92]
...
20, resulting in a total clock load of 3
...
The inputs to the clock drivers are shorted out to smooth out the asymmetry in
the incoming signals
...
The clock driver and the associated pre-drivers account for 40% of the effective
switched capacitance (12
...
The overall
width of the clock driver was on the order of 35cm in a 0
...
A detailed clock
skew simulation with process variations indicates that a clock uncertainty of less than
200psec (< 10%) was achieved
...
21 Clock load for the
...
The Alpha 21164 microprocessor (EV5) operates at a
clock frequency of 300 MHz while using 9
...
5 mm × 18
...
5 µm CMOS technology [Bowhill95]
...
75 nF
...
The incoming clock signal is first routed through a single six-stage buffer placed at
the center of the chip
...
22a)
...
The equivalent transistor width of the final driver
inverter equals 58 cm! To ensure the integrity of the clock grid across the chip, the grid
was extracted from the layout, and the resulting RC-network was simulated
...
22b
...
Clock driver
(a) Chip microphotograph, showing positioning of clock drivers
...
Figure 10
...
dent from the plot, the skew is zero at the output of the left and right drivers
...
The critical instruction and
execution units all see the clock within 65 psec
...
The clock skew problems were eliminated by either routing the clock in the opposite direction of the data at a small expense of performance or by ensuring that the data could not
overtake the clock
...
To avoid race-through conditions, a number of design guidelines were followed including:
• Careful sizing of the local clock buffers so that their skew was minimal
...
This gate, which
can be part of the logic function or just a simple inverter, ensures that the signal cannot overtake the clock
...
To improve the inter-layer dielectric uniformity, filler
polygons were inserted between widely spaced lines (Figure
10
...
Though this may increase the capacitance to nearby
signal lines, the improved uniformity results in lower variation
and clock uncertainty
...
This
technique is used in many processors for controlling the clock
skew
...
23 Dummy
fills reduce the ILD
variation and improve
clock skew
...
However, making such a circuit work in a reliable way
requires careful planning and intensive analysis
...
A hierarchical clocking scheme is used in the 600 MHz
Alpha 21264 (EV6) processor (in 0
...
24
...
A hierarchical clocking approach makes trade-offs between power and skew
management
...
As seen in previous generation microprocessors, the clock power contributes to a
large fraction of overall power consumption
...
The
drawback of using a hierarchical clock network is that skew reduction becomes more difficult because clocks to various local registers may go through very different paths, which
may contribute to the skew
...
Figure 10
...
The clock hierarchy consists of a global clock grid, called GCLK, that covers the
entire die
...
The on-chip generated clock is routed to the center of the die and distributed using tree structures
to 16 distributed clock drivers (Figure 10
...
The global clock distribution network utilizes a windowpane configuration that achieves low skew by dividing up the clock into 4
regions, which reduces the distance from the drivers to the loads
...
This also helps the power
supply and thermal problems as the drivers are distributed through the chip
...
The drawback clearly is the increased capacitance
of the Global Clock grid when compared to a tree distribution approach
...
The major clock grids are used to drive different large execution blocks within the chip, including 1) bus interface unit, 2) integer issue and execution
units, 3) floating point issue and execution units, 4) instruction fetch and branch prediction
unit, 5) load/store unit, and 6) pad ring
...
...
25 Global clock-distribution network in a window-pane structure
...
The local clocks provide great flexibility in the design of the local logic
blocks, but at the same time, make it significantly more difficult to manage skew
...
Furthermore, the local clocks are susceptible to coupling from data lines as well because
they are local and not shielded like the global gridded clocks
...
Design Techniques—Dealing with Clock Skew and Jitter
To fully exploit the improved performance of logic gates with technology scaling, clock
skew and jitter must be carefully addressed
...
Some guidelines for reducing clock skew and jitter are
presented below
...
To minimize skew, balance clock paths from a central distribution source to individual clocking elements using H-tree structures or more generally routed tree structures
...
2
...
3
...
The use of gated clocks to
save power also results in data-dependent clock load and increased jitter
...
g
...
4
...
This eliminates races at the cost of performance
...
Avoid data dependent noise by shielding clock wires from adjacent signal wires
...
6
...
Dummy fills are very
common and reduce skew by increasing uniformity
...
7
...
The use of feedback circuits based on delay locked loops as discussed later in this
chapter can easily compensate for temperature variations
...
Power supply variation is a significant component of jitter as it impacts the cycle to
cycle delay through clock buffers
...
Unfortunately, decoupling
capacitors require a significant amount of area and efficient packaging solutions
must be leveraged to reduce chip area
...
3
...
In an edge-triggered system, the worst
case logic path between two registers determines the minimum clock period for the entire
system
...
The use of a latch based methodology (as illustrated in Figure 10
...
This flexibility allows an overall performance increase
...
For the latch-based system in Figure 10
...
Assume furthermore that the clocks are ideal, and that the two clocks are inverted versions
of each other (for the sake of simplicity)
...
On the falling edge of CLK2 (at edge ), the output of CLB_A is latched and the
computation of CLB_B is launched
...
This timing appears equivalent to having an edge-triggered system where CLB_A and CLB_B are cascaded
between two edge-triggered registers (Figure 10
...
In both cases, it appears that the time
available to perform the combination of CLB_A and CLB_B is TCLK
...
[Figure: latch-based pipeline of latches L1 and L2 clocked by CLK1 and CLK2, with logic blocks CLB_A, CLB_B, CLB_C (delays tpd,A, tpd,B, tpd,C) and nodes A, B, C; timing diagram with clock period TCLK and edges 1 to 4 marking launch A, compute CLB_A, launch B, compute CLB_B, launch C]
Figure 10
...
However, there is an important performance related difference
...
This approach requires no explicit design changes, as the passing of slack from one block to the next is automatic
...
Stated in another way, if the sequential system works
at a particular clock rate and the total logic delay for a complete cycle is larger than the
clock period, then unused time or slack has been implicitly borrowed from preceding
stages
...
In Figure 10
...
What happens
if the combinational logic block of the previous stage finishes early and has valid input
data for CLB_A before edge ? Since a latch is transparent during the entire high
phase of the clock, as soon as the previous stage has finished computing, the input data for
CLB_A is valid
...
e
...
Formally stated, slack passing has taken place if TCLK < tpd,A + tpd,B and the logic functions correctly (for simplicity, the delays associated with latches are ignored)
...
[Figure: latch-based pipeline with latches L1 and L2 (clocked by CLK1 and CLK2) separating logic blocks CLB_A, CLB_B, and CLB_C]
Figure 10
...
26
...
28
...
This implies that the previous block did not use up the
entire high phase of CLK1, which results in slack time (denoted by the shaded area)
...
Since L2 is a transparent latch,
c becomes valid on the high phase of CLK2 and CLB_B starts to compute by using the
slack provided by CLB_A
...
As this picture indicates, the total cycle delay, that is
the sum of the delay for CLB_A and CLB_B, is larger than the clock period
...
[Figure: latches L1, L2, L1 with logic blocks CLB_A (tpd,A) and CLB_B (tpd,B), nodes a to e, clocks CLK1 and CLK2; timing diagram with clock period TCLK and edges 1 to 4 showing tpd,A, tDQ, tpd,B, when a to e become valid, and the slack passed to the next stage]
Figure 10
...
An important question related to slack passing relates to the maximum possible
slack that can be passed across cycle boundaries
...
28, it is easy to see that
the earliest time that CLB_A can start computing is
...
Therefore, the maximum time that can be borrowed from
the previous stage is 1/2 cycle or TCLK /2
...
This implies that the maximum logic cycle delay is equal to 1
...
However, note that for an n-stage pipeline, the overall logic delay cannot exceed the
available time of n * TCLK
...
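The slack-passing condition above can be checked with a small timing model. The following is a minimal sketch, not the book's method: it assumes ideal latches (zero delay and zero set-up time), latches that alternate transparent half-cycles of TCLK/2 starting at t = 0, and an input that is valid when the first latch opens. The function names are hypothetical.

```python
def latch_pipeline_ok(t_clk, stage_delays):
    """Return True if data passes every latch while that latch is transparent.

    Latch i is assumed transparent from i*t_clk/2 to (i+1)*t_clk/2.
    stage_delays[i] is the delay of the logic block that follows latch i.
    """
    half = t_clk / 2.0
    arrival = 0.0  # assumption: input valid as the first latch opens
    for i, t_pd in enumerate(stage_delays):
        opens = i * half
        closes = opens + half
        if arrival > closes:          # data missed the transparent phase
            return False
        depart = max(arrival, opens)  # early data must wait for the latch to open
        arrival = depart + t_pd
    # The result must reach the final capturing latch before it closes.
    final_closes = (len(stage_delays) + 1) * half
    return arrival <= final_closes


def slack_passed(t_clk, t_pd_a, t_pd_b):
    """Slack passing: the logic works even though TCLK < tpd,A + tpd,B."""
    return latch_pipeline_ok(t_clk, [t_pd_a, t_pd_b]) and t_clk < t_pd_a + t_pd_b
```

Under these assumptions, a cycle with delays 8 and 7 still works at TCLK = 10, which is the 1/2-cycle borrowing limit, while delays of 8 and 7.5 fail.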
3 Slack-passing example
First consider an edge-triggered pipeline of Figure 10
...
Assume that the primary input
In is valid slightly after the rising edge of the clock
...
The latency is two clock cycles (actually, the
output is valid 2
...
Note that for the first pipeline stage, 1/2
...
This time can be exploited in a latch based system
...
29 Conventional edge-triggered pipeline
...
30 shows a latch based pipeline of the same sequential circuit
...
This is
enabled by slack borrowing between logical partitions
...
30 Latch-based pipeline
...
4
Self-Timed Circuit Design*
10
...
1
Self-Timed Logic - An Asynchronous Technique
The synchronous design approach advocated in the previous sections assumes that all circuit events are orchestrated by a central clock
...
• They ensure that the physical timing constraints are met
...
This ensures that only legal logical values are applied in the next round of
computation
...
• Clock events serve as a logical ordering mechanism for the global system events
...
On every
clock transition, a number of operations are initiated that change the state of the
sequential network
...
31
...
The important point to note under
this methodology is that the clock period is chosen to be larger than the worst-case delay
of each pipeline stage, or T > max (tpd1, tpd2, tpd3) + tpd,reg
...
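The clock-period constraint above is easy to evaluate for a given set of stage delays. The sketch below is only an illustration; the function names and the example delay values are hypothetical.

```python
def min_clock_period(stage_delays, t_pd_reg):
    """Lower bound on T: worst-case stage delay plus register delay."""
    return max(stage_delays) + t_pd_reg


def meets_timing(t_clk, stage_delays, t_pd_reg):
    """True if T > max(tpd1, tpd2, tpd3, ...) + tpd,reg."""
    return t_clk > min_clock_period(stage_delays, t_pd_reg)


# Hypothetical delays in ns for three logic blocks and the register.
bound = min_clock_period([2.0, 3.5, 1.0], 0.5)
```

The slowest stage sets the clock for every stage, which is why the faster stages sit idle for part of each cycle.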
At each clock transition, a new set of inputs is sampled and computation is started anew
...
When to sample a new
input or when an output is available depends upon the logical ordering of the system
events and is clearly orchestrated by the clock in this example
...
31
Logic
Block #1
Q
tpd,reg
R2
D
Q
tpd1
Logic
Block #2
tpd2
R3
D
Q
Logic
Block #3
R4
D
Q
tpd3
Pipelined, synchronous datapath
...
It presents a
structured, deterministic approach to the problem of choreographing the myriad of events
that take place in digital designs
...
The approach is robust and easy to
adhere to, which explains its enormous popularity; however, it does have some pitfalls
...
This is not the case in reality, because of effects such as clock skew
and jitter
...
This causes significant noise problems due to package inductance and power supply grid resistance
...
For instance, the throughput rate of the pipelined system of Figure 10
...
On
the average, the delay of each pipeline stage is smaller
...
For example, the propagation delay of a 16-bit adder is highly data dependent
(e
...
, adding two 4-bit numbers requires a much shorter time compared to adding
two 16-bit numbers)
...
Designing a purely asynchronous circuit is a nontrivial and
potentially hazardous task
...
In fact, the logical ordering of the events is dictated by the
structure of the transistor network and the relative delays of the signals
...
constraints by manipulating the logic structure and the lengths of the signal paths requires
an extensive use of CAD tools, and is only recommended when strictly necessary
...
Figure 10
...
This approach assumes that each combinational
function has a means of indicating that it has completed a computation for a particular
piece of data
...
The
combinational logic block computes on the input data and in a data-dependent fashion
(taking the physical constraints into account) generates a Done flag once the computation
is finished
...
This signaling ensures the logical ordering of the events and can be
achieved with the aid of an extra Ack(nowledge) and Req(uest) signal
...
Req
Req
HS
Ack
R1
HS
Ack
Start
In
Req
Done
F1
tpF1
HS
Ack
Start
R2
Req
Done
F2
tpF2
Ack
Start
R3
Done
F3
Out
tpF3
Figure 10
...
1
...
If F1 is inactive at
that time, it transfers the data and acknowledges this fact to the input buffer, which
can go ahead and fetch the next word
...
F1 is enabled by raising the Start signal
...
3
...
If this function is free, an Ack(nowledge) is
raised, the output value is transferred, and F1 can go ahead with its next computation
...
The completion signal Done ensures that the physical
timing constraints are met and that the circuit is in steady state before accepting a new
input
...
Both interested parties synchronize with
each other by mutual agreement or, if you want, by shaking hands
...
The choice of protocol is important, since it has a profound effect on the circuit performance and robustness
...
• In contrast to the global centralized approach of the synchronous methodology, timing signals are generated locally
...
• Separating the physical and logical ordering mechanisms results in a potential
increase in performance
...
In self-timed systems, a completed data-word does not have to wait for the arrival of the
next clock edge to proceed to the subsequent processing stages
...
For a ripple carry adder, the average length of carry-propagation is O(log(N))
...
• The automatic shut-down of blocks that are not in use can result in power savings
...
As discussed earlier, this overhead can be
substantial
...
• Self-timed circuits are by nature robust to variations in manufacturing and operating
conditions such as temperature
...
The performance of a self-timed
system is determined by the actual operating conditions
...
32)
...
10
...
2
Completion-Signal Generation
A necessary component of self-timed logic is the circuitry to indicate when a particular
piece of circuitry has completed its operation for the current piece of data
...
Dual-Rail Coding
One common approach to completion signal generation is the use of dual rail coding
...
Consider the redundant data
...
1
...
For the data to be valid or the computation to be completed, the circuit must be in a
legal 0 (B0 = 0, B1 = 1) or 1 (B0 = 1, B1 = 0) state
...
The (B0 = 1,
B1 = 1) state is illegal and should never occur in an actual circuit
...
1
Redundant signal representation to include transition state
...
33, which is a dynamic version of the DCVSL logic style where the clock is
replaced by the Start signal [Heller84]
...
When the Start signal is low, the circuit is precharged by the PMOS transistors, and the output (B0, B1) goes in the Reset-Transition
state (0, 0)
...
Either
B0 or B1—but never both—goes high, which raises Done and signals the completion of
the computation
...
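The redundant encoding described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the bit assignment stated in the text (0 is B0 = 0, B1 = 1; 1 is B0 = 1, B1 = 0) and assuming that a word-level Done is the AND of the per-bit (B0 OR B1) terms. All names are hypothetical.

```python
RESET = (0, 0)  # reset/transition state: output not yet valid


def encode(bit):
    """Return the dual-rail pair (B0, B1) for a logic value."""
    return (0, 1) if bit == 0 else (1, 0)


def decode(pair):
    """Return the logic value of a legal pair, or None while in transition."""
    if pair == (1, 1):
        raise ValueError("illegal state: B0 = 1, B1 = 1")
    if pair == RESET:
        return None
    return 1 if pair == (1, 0) else 0


def done(pairs):
    """Done rises only when every bit has left the reset/transition state."""
    for b0, b1 in pairs:
        if b0 and b1:
            raise ValueError("illegal state: B0 = 1, B1 = 1")
    return all(b0 or b1 for b0, b1 in pairs)
```

Because exactly one rail per bit rises during evaluation, the OR of the two rails serves as a per-bit completion flag.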
33 Generation of a
completion signal in DCVSL
...
The completion generation is performed in series with the logic evaluation,
and its delay adds directly to the total delay of the logic block
...
Completion generation thus comes at the expense of both area and speed
...
Redundant signal representations
other than the one presented in Table 10
...
One essential element is
the presence of a transition state denoting that the circuit is in evaluation mode and the
output data is not valid
...
Self-Timed Adder Circuit
An efficient implementation of a self-timed adder circuit is shown in Figure 10
...
A Manchester-carry scheme is used to boost the circuit performance
...
It is consequently sufficient to use the differential signaling in
the carry path only (Figure 10
...
The completion signal is efficiently derived by combining the carry signals of the different stages (Figure 10
...
This safely assumes that the sum
generation, which depends upon the arrival of the carry signal, is faster than the completion
generation
...
All other signals such as P(ropagate), G(enerate), K(ill), and S(um) do not require completion generation
and can be implemented in single-ended logic
...
The only difference is that the G(enerate) signal is
replaced by a K(ill)
...
[Figure: Manchester-carry chains precharged by Start, with signals P0 to P3, G0 to G3, K0 to K3, C0 to C4, and C1out to C4out]
(a) Differential carry generation
Figure 10
...
Replica Delay
While the dual-rail coding above allows tracking of the signal statistics, it comes at the
cost of power dissipation
...
An attempt to reduce the overhead of completion
detection is to use a critical-path replica configured as a delay element, as shown in Figure
...
35
...
At the same time, the start signal is fed into the replica delay line,
which tracks the critical path of the logic network
...
When the output of the delay line makes a
transition, it indicates that the logic is complete—as the delay line mimics the critical path
...
The advantage of this approach is that the logic can be implemented using a standard non-redundant circuit style such as complementary CMOS
...
Note that this approach generates the completion signal after a time equal to the worst case delay through the network
...
However, it can track the local
effects of process variations and environmental variations (e
...
, temperature or power supply variations)
...
In
LOGIC
NETWORK
Start
DELAY MODULE
(CRITICAL PATH REPLICA)
Out
Done
Figure 10
...
Example 10
...
g
...
Figure 10
...
A current sensor is inserted in
series with the combinational logic, and monitors the current flowing through the logic
...
This signal effectively determines when the logic has completed its cycle
...
If the input data vector does not change from one cycle to the next, no current is
drawn from the supply for static CMOS logic
...
The
outputs of the current sensor and minimum delay element are then combined
...
Ensuring reliability while
keeping the overhead circuitry small is the main challenge
...
[Figure: static CMOS logic with a current sensor on GNDsense and a min delay generator; signals Start, A, B, Done, and Output valid; timing parameters tdelay, toverlap, tMDG, and tpd-NOR]
Figure 10
...
4
...
⇒
tMDG > min(tdelay)
toverlap > 0
Self-Timed Signaling
Besides the generation of the completion signals, a self-timed approach also requires a
handshaking protocol to logically order the circuit events avoiding races and hazards
...
37, which shows a sender module transmitting data to a receiver ([Sutherland89])
...
37
Receiver’s action
Two-phase handshaking protocol
...
In some cases the request event is
a rising transition; at other times it is a falling one—the protocol described here does not
distinguish between them
...
If the receiver is busy or its input buffer is full, no Ack event is generated, and
the transmitter is stalled until the receiver becomes available by, for instance, freeing
space in the input buffer
...
The four events, data change, request, data acceptance,
...
38
Muller C-element
...
Successive cycles may take different amounts
of time depending upon the time it takes to produce or consume the data
...
Both phases are terminated by certain events
...
The
sender is free to change the data during its active cycle
...
The receiver can only accept
data during its active cycle
...
37
...
An
essential component of virtually any handshaking module is the Muller C-element
...
38, performs an
AND-operation on events
...
When the inputs differ, the output retains its previous value
...
As long as this does not happen, the output
remains unchanged and no output event is generated
...
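The behavior of the Muller C-element described above can be captured by a small behavioral model. This is a sketch of the logical function only, not of any circuit implementation; the class name is hypothetical.

```python
class MullerC:
    """Behavioral model: the output follows the inputs only when they agree."""

    def __init__(self, initial=0):
        self.out = initial  # initialization matters, as the output has memory

    def update(self, a, b):
        if a == b:
            self.out = a    # both inputs agree: output takes their value
        return self.out     # inputs differ: output retains its previous value


c = MullerC(initial=0)
trace = [c.update(1, 0), c.update(1, 1), c.update(0, 1), c.update(0, 0)]
```

The trace shows an output event only after events have occurred on both inputs, which is the AND-operation on events.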
Figure 10
...
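[Figure: Muller C-element implementations with inputs A and B and output F, including a version built around a set-reset (S, R, Q) element]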
Figure 10
...
Figure 10
...
Assume that Req, Ack, and Data
Ready are initially 0
...
Req goes
high—this is commonly denoted as Req↑
...
The C-element is blocked, and no new data is sent to
the data bus (Req stays high) as long as the transmitted data is not processed by the
receiver
...
This can be the result
of many different actions, possibly involving other C-elements communicating with
subsequent blocks
...
A Data ready↓ event, which might already have happened
before Ack↑, produces a Req↓, and the cycle is repeated
...
40 A Muller C-element
implements a two-phase handshake
protocol
...
Ack
Handshake logic
Problem 10
...
41 shows a two-phase, self-timed implementation of a FIFO (first-in first-out)
buffer with three registers
...
How can you observe that the FIFO is completely empty (full)? (Hint: Determine the necessary conditions on the Ack and Req signals
...
41 Three-stage
self-timed FIFO, using a two-phase signaling protocol
...
There is some
bad news, however
...
Most logic devices in the MOS technology tend to be sensitive to levels or
to transitions in one particular direction
...
Since the transition direction is important, initializing all the
Muller C-elements in the appropriate state is essential
...
happen
...
The only alternative is to adopt a different signaling approach, called four-phase signaling, or return-to-zero (RTZ)
...
Once again,
this is illustrated with the example of the sender-receiver
...
42
...
42 Four-phase
handshaking protocol
...
Both the Req and
the Ack are initially in the zero-state, however
...
When ready, the
receiver accepts the data and raises Ack (Ack↑ or )
...
The protocol
proceeds, now by bringing both Req (Req↓ or ) and Ack (Ack↓ or ) back to their initial
state in sequence
...
This protocol is called four-phase because four distinct time-zones can be recognized per cycle: two for the sender; two for the receiver
...
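The event ordering of the four-phase protocol can be expressed as a simple trace checker. The sketch below assumes that both Req and Ack start in the zero state and that a trace is a list of (signal, new level) pairs; the names are hypothetical.

```python
# One complete transfer: Req rises, Ack rises, Req falls, Ack falls.
CYCLE = [("Req", 1), ("Ack", 1), ("Req", 0), ("Ack", 0)]


def count_transfers(trace):
    """Check a trace against the four-phase order; return completed transfers."""
    for i, event in enumerate(trace):
        expected = CYCLE[i % 4]
        if event != expected:
            raise ValueError(f"event {i}: got {event}, expected {expected}")
    return len(trace) // 4


completed = count_transfers(CYCLE * 2)
```

Each transfer needs two events on Req and two on Ack, which is the extra cost relative to two-phase signaling.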
An
implementation of the protocol, based on Muller C-elements, is shown in Figure 10
...
It
is interesting to notice that the four-phase protocol requires two C-elements in series
(since four states must be represented)
...
[Figure: sender logic and receiver logic exchanging Data, with handshake logic built from two C-elements and the signals Data ready, Data accepted, S, Req, and Ack]
Figure 10
...
Problem 10
...
43
...
The four-phase protocol has the disadvantage of being more complex and slower,
since two events on Req and Ack are needed per transmission
...
The logic in the sender and receiver modules does not have to
deal with transitions, which can go either way, but only has to consider rising (or falling)
transition events or signal levels
...
For this reason, four-phase handshakes are the preferred implementation approach for most
of the current self-timed circuits
...
Example 10
...
Now it is time to bring them all together
...
A view of the self-timed data path, including the
timing control, is offered in Figure 10
...
The logic functions F1 and F2 are implemented
using dual-rail, differential logic
...
44
Done
C
Acko
Self-timed pipelined datapath—complete composition
...
All Start
signals are low so that all logic circuits are in precharge condition
...
The enable signal En of R1 is raised, effectively latching the input
data into the register, assuming a positive edge-triggered or a level-sensitive implementation
...
The second C-element is triggered as well,
since Ackint is low
...
At its completion, the output data is placed on the bus, and a request is initiated to the second stage
(Reqint↑), which acknowledges its acceptance by raising Ackint
...
However, the input
buffer can respond to the Acki↑ event by resetting Reqi to its zero state (Reqi↓)
...
Upon receipt of Ackint↑, Start goes low, the precharge phase starts, and
...
Note that this sequence corresponds to the four-phase handshake
mechanism described earlier
...
45
...
Computer tools are often used to derive STGs that ensure proper
operation and optimize the performance
...
45 State transition diagram for pipeline stage 1
...
Arrows in dashed lines express actions in either the preceding or
following stage
...
4
...
Unfortunately, the overhead circuits preclude widespread application in
general purpose digital computing
...
A few examples that illustrate the use of self-timed
concepts for either power savings or performance enhancement are presented below
...
Imbalances in a logic network cause inputs of a logic gate or block to arrive at different times,
resulting in glitching transitions
...
If a logic
block can be enabled after all of the inputs settle, then the number of glitching transitions
can be reduced
...
Tri-state buffers are inserted between each of these
phases to prevent glitches from propagating further in the datapath (Figure 10
...
Assuming an arbitrary logic network in Figure 10
...
When the tri-state buffers at the output of logic block 1 are enabled, the
computation of logic block 2 is allowed to proceed
...
The control of the tri-state buffer can be performed through the use of a self-timed enable signal, which is generated by passing the system clock through a delay chain
that models the critical path of the processor
...
This technique succeeds in reducing the switched capacitance of combinational logic blocks such as multipliers, even including the overhead of the delay chain,
gating signal distribution, and buffers [Goodman98]
...
[Figure: Logic Block 1, Logic Block 2, and Logic Block 3 in series, with nodes in2, out2, and in3 and delayed clocks CLKD1 and CLKD2]
...
46 Application of self-timing for glitch reduction
...
This structure uses a different control structure than the conventional self-timed pipeline described earlier
...
Since the precharge happens after operation instead of before evaluation, it is often termed post-charge logic
...
47 [Williams00]
...
47 Self-resetting logic
...
It is possible to precharge a
block based on the completion of its own output, but care must be taken to ensure that the
following stage has properly evaluated based on the output before it is precharged
...
It should be noted that unlike other logic
styles, the signals are represented as pulses, and are
valid only for a limited duration
...
While this logic
style offers potential speed advantages, special care
Figure 10
...
Also, cir3-input OR
...
An example of self-resetting logic is
shown in Figure 10
...
Assume that all inputs are low, and int is initially
precharged
...
This causes the gate to
precharge
...
Clock-Delayed Domino
One interesting application of self-timed circuits using the delay-matching concept is
Clock-Delayed Domino
...
Instead, the clock for one stage is derived from the previous stage
...
49
...
e
...
Sometimes, there is
a transmission gate inserted between the two inverters to accomplish this (the transmission
gate is on with the gate terminal of NMOS tied to VDD and the PMOS to GND)
...
There are several advantages of using such a self-clocked timing
abstraction
...
This
alleviates a major limitation of conventional Domino that is only capable of non-inverting
logic
...
Also, notice that it is possible to eliminate the “foot-switch” in the later stages, as the clock-evaluation edge arrives only when
the input is stable
...
A careful
analysis of the timing shows that the short circuit power can be eliminated
...
49 Clock Delayed Domino Logic
...
Synchronizers and Arbiters*
10
...
1
Synchronizers—Concept and Implementation
Even though a complete system may be designed in a synchronous fashion, it must still
communicate with the outside world, which is generally asynchronous
...
50
...
50 Asynchronous-synchronous interface
Consider a typical personal computer
...
This reference determines
what happens within the computer system at any point in time
...
The way a synchronous system deals with such an asynchronous signal is to sample
or poll it at regular intervals and to check its value
...
However, it might happen that the signal is polled in the middle of a transition
...
At that point, it is not clear
if the key was pressed or not
...
For instance, one function might decide that the key is
pushed and start a certain action, while another function might lean the other way and
issue a competing command
...
Therefore, the
undefined state must be resolved in one way or another before it is interpreted further
...
For
instance, it is either decided that the key is not yet pressed, which will be corrected in the
next poll of the keyboard, or it is concluded that the key is already pressed
...
A circuit that implements such a decision-making function is called a synchronizer
...
An asynchronous/synchronous interface is thus always prone
to errors called synchronization failures
...
Typically, this probability can be reduced in an exponential fashion by waiting longer before
...
This is not too troublesome in the keyboard example, but in general,
waiting affects system performance and should therefore be avoided to a maximal extent
...
51
...
However, since the sampled signal is not synchronized to the clock signal, there is a finite probability that the set-up
time or hold time of the latch is violated (the probability is a strong function of the transition frequencies of the input and the clock)
...
The sampled signal eventually evolves into a legal 0 or 1 even in the latter case, as the
latch has only two stable states
...
7
Flip-Flop Trajectories
Figure 10
...
The inverters are composed of minimum-size devices
...
[Plot: Vout versus time (ps), from 0 to 300 ps]
Figure 10
...
If the input is sampled such that the cross-coupled inverter pair starts at the metastable point,
the voltage will remain at the metastable state forever in the absence of noise
...
The time it takes to reach the acceptable signal zones
depends upon the initial distance of the sampled signal from the metastable point
...
This model is used to compute
the range of values for v(0) that still cause an error, or a voltage in the undefined range,
after a waiting period T
...
$V_{MS} - (V_{MS} - V_{IL})\, e^{-T/\tau} \le v(0) \le V_{MS} + (V_{IH} - V_{MS})\, e^{-T/\tau}$
(10
...
(10
...
Increasing the
waiting period from 2τ to 4τ decreases the interval and the chances of an error by a factor
of 7
...
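The bound on v(0) above can be evaluated numerically to see how quickly the error window shrinks with the waiting period. This is a sketch under stated assumptions: the voltage values are hypothetical and chosen only for illustration, and the function names are not from the source.

```python
import math


def error_window(v_ms, v_il, v_ih, t_wait, tau):
    """Range of initial voltages v(0) still undefined after waiting t_wait."""
    decay = math.exp(-t_wait / tau)
    low = v_ms - (v_ms - v_il) * decay
    high = v_ms + (v_ih - v_ms) * decay
    return low, high


def window_width(v_ms, v_il, v_ih, t_wait, tau):
    low, high = error_window(v_ms, v_il, v_ih, t_wait, tau)
    return high - low  # equals (v_ih - v_il) * exp(-t_wait / tau)


# Hypothetical levels in volts; tau is normalized to 1.
w2 = window_width(1.25, 1.0, 1.5, t_wait=2.0, tau=1.0)
w4 = window_width(1.25, 1.0, 1.5, t_wait=4.0, tau=1.0)
ratio = w2 / w4  # reduction when the wait grows from 2*tau to 4*tau
```

The ratio depends only on the extra waiting time, not on the voltage levels, because the window width scales with e^(-T/τ).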
Some information about the asynchronous signal is required in order to compute the
probability of an error
...
Assume also that the
slopes of the waveform in the undefined region can be approximated by a linear function
(Figure 10
...
Using this model, we can estimate the probability Pinit that v(0), the value
of Vin at the sampling time, resides in the undefined region
...
9)
The chances for a synchronization error to occur depend upon the frequency of the
synchronizing clock φ
...
This means that the average number of synchronization errors per second Nsync(0)
equals Eq
...
10) if no synchronizer is used
...
10)
where Tφ is the sampling period
...
(10
...
$N_{sync}(T) = \frac{P_{init}\, e^{-T/\tau}}{T_\phi} = \frac{(V_{IH} - V_{IL})\, e^{-T/\tau}}{V_{swing}} \cdot \frac{t_r}{T_{signal}\, T_\phi}$
(10
...
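The error-rate expression above can be turned into a short calculation. The sketch below is illustrative only: the sampling period, waiting period, and signal period follow the values quoted in the text, while the rise time, time constant, and voltage levels are assumptions chosen for the example, not the book's figures. The mean time to failure is taken as the reciprocal of the error rate.

```python
import math


def p_init(v_ih, v_il, v_swing, t_r, t_signal):
    """Probability that the sampled value lies in the undefined region."""
    return (v_ih - v_il) / v_swing * t_r / t_signal


def n_sync(t_wait, tau, t_phi, v_ih, v_il, v_swing, t_r, t_signal):
    """Average number of synchronization errors per second after t_wait."""
    p0 = p_init(v_ih, v_il, v_swing, t_r, t_signal)
    return p0 * math.exp(-t_wait / tau) / t_phi


# T_phi and T_signal as quoted in the text; the remaining values are assumed.
params = dict(t_phi=5e-9, v_ih=1.5, v_il=1.0, v_swing=2.5,
              t_r=0.5e-9, t_signal=50e-9)
rate_no_sync = n_sync(0.0, tau=150e-12, **params)   # no waiting period
rate_sync = n_sync(5e-9, tau=150e-12, **params)     # wait one clock period
mtf_sync = 1.0 / rate_sync                          # mean time to failure, s
```

Changing tau by a small amount changes the result by orders of magnitude, which illustrates the sensitivity to τ that the exponential term introduces.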
Example 10
...
Tφ = 5 nsec, which corresponds to a 200 MHz clock), T
= Tφ = 5 nsec, Tsignal = 50 nsec, tr =0
...
7)
...
53
signal slope
...
From the VTC of a typical CMOS inverter, it can be derived that VIH – VIL approximately
equals 0
...
5 V
...
(10
...
38x 10–9 errors/sec
...
If no synchronizer was used, the MTF would only have been 2
...
• The exponential relation in Eq
...
11) makes the failure rate extremely sensitive to the
value of τ
...
τ varies from chip to
chip and is a function of temperature as well
...
A worst-case design scenario is
definitely advocated here
...
A problem occurs when T exceeds the sampling
period Tφ
...
54
...
Notice that this arrangement requires the φ-pulse to be short enough to avoid race conditions
...
The increase in MTF comes at the expense of an increased latency
...
54
Cascading synchronizers increases the mean time-to-failure
...
Making
the mean time-to-failure very large does not preclude errors
...
A maximum of one or two per
system is advocated
...
5
...
An arbiter is an element that decides which of
two events has occurred first
...
A synchronizer is actually a
special case of an arbiter, since it determines if a signal transition happened before or
after a clock event
...
An example of a mutual-exclusion circuit is shown in Figure 10
...
It operates on
two input-request signals, which follow a four-phase signaling protocol; that is, the
Req(uest) signal has to go back to the reset state before a new Req(uest) can be issued
...
While
Requests may occur concurrently, only one of the Acknowledges is allowed to go high
...
An event on one of the
inputs (e
...
, Req1↑) causes the flip-flop to switch, node A goes low, and Ack1↑
...
The cross-coupled output structure keeps the output
values low until one of the NAND outputs differs from the other by more than a threshold
value VT
...
[Figure panels: (a) Schematic symbol, (b) Implementation, (c) Timing diagram; signals Req1, Req2, Ack1, and Ack2, internal nodes A and B, with the metastable interval and the VT gap marked on the timing diagram]
Figure 10
...
6
Mutual-exclusion element (or arbiter)
...
Synchronous circuits need a global periodic clock reference to drive sequential elements
...
Crystal oscillators generate accurate, low-jitter clocks
with a frequency range from tens of megahertz to approximately 200 MHz
...
A PLL takes an external low-frequency reference crystal frequency signal and
multiplies its frequency by a rational number N (see the left side of Figure 10
...
...
Typically as shown in Figure 10
...
Since chip-to-chip communication occurs at a lower rate than the on-chip clock
rate, the reference clock is a divided but in-phase version of the system clock
...
Introducing clock buffers to deal with this problem unfortunately introduces skew between the data and sample clock
...
e
...
In addition, for
the configuration shown in Figure 10
...
[Figure: Chip 1 and Chip 2, each with a digital system and a PLL, exchanging Data; a crystal oscillator (fcrystal < 200 MHz) feeds a PLL that generates fsystem = N * fcrystal, and a divider and clock buffer supply the reference clock]
Figure 10
...
10
...
1
Basic Concept
Periodic signals of known frequency can be described exactly by only one parameter, their
phase
...
57
...
The relative phase is
defined as the difference between the two phases
...
57 Relative and absolute phase of two periodic signals
A PLL is a complex, nonlinear feedback circuit, and its basic operation is understood with the aid of Figure 10
...
The voltage-controlled oscillator (VCO)
takes an analog control input and generates a clock signal of the desired frequency
...
To synthesize a system clock of a particular frequency, it is necessary to set the
control voltage to the appropriate value
...
e
...
The feedback loop is critical to tracking process and environmental variations
...
The reference clock is typically generated off-chip from an accurate crystal reference
...
e
...
The local clock and reference clock are compared using a phase detector that
compares the phase difference between the signals and produces an Up or Down signal
when the local clock lags or leads the reference signal
...
Next, the Up and Down
signals are fed into a charge pump, which translates the digital encoded control information into an analog voltage [Gardner80]
...
A Down signal, on the other hand, slows down the oscillator and eliminates
the phase lead of the local clock
...
The edge of the local clock jumps back and forth instantaneously and oscillates
around the targeted position
...
This is partially accomplished by the introduction of
the loop filter
...
Note that the PLL structure is a feedback structure and the addition of extra phase shifts,
as is done by a high-order filter, may result in instability
...
When in lock, the system clock is N-times the reference clock frequency
...
This is
especially true for the loop filter and VCO, where induced noise has a direct effect on the
resulting clock jitter
...
58
vcont
Composition of a phase-locked loop (PLL)
...
rails and the substrate
...
Analog circuits with a high supply rejection, such as
differential VCOs, are therefore desirable [Kim90]
...
Below a more detailed description of various components of a PLL is given
...
6
...
In other words, a VCO is characterized by $\omega = \omega_0 + K_{vco} \cdot v_{cont}$
...
12)
v cont dt
where K vco is the gain of the VCO given in rad/s/V, and ω 0 is a fixed frequency offset
...
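Because phase is the time integral of angular frequency, the VCO relation can be sketched numerically by accumulating phase sample by sample. The code below is a minimal illustration, assuming zero initial phase, a simple forward-Euler integration step, and hypothetical parameter values; the function name is not from the source.

```python
import math


def vco_output(v_cont_samples, dt, omega_0, k_vco, amplitude=1.0):
    """Return (waveform samples, final phase) for a sampled control voltage."""
    phase = 0.0  # assumption: integration starts at t = 0 with zero phase
    samples = []
    for v_cont in v_cont_samples:
        samples.append(amplitude * math.cos(phase))
        omega = omega_0 + k_vco * v_cont  # instantaneous frequency, rad/s
        phase += omega * dt               # accumulate phase
    return samples, phase


# Hypothetical values: constant control voltage of 0.5 V for 1 s.
wave, final_phase = vco_output([0.5] * 1000, dt=1e-3, omega_0=10.0, k_vco=2.0)
```

With a constant control voltage the phase grows linearly at the rate ω0 + Kvco·vcont, so raising vcont speeds up the oscillator.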
The output signal has the form
$x(t) = A \cdot \cos\left(\omega_0 t + K_{vco} \cdot \int_{-\infty}^{t} v_{cont}\, dt\right)$
(10
...
Phase Detectors
This output signal is then
used to adjust the output of the VCO and thus align the two inputs via a feedback network
...
Two basic types of phase
detectors are commonly used
...
XOR Phase Detector
...
The XOR
is useful as a phase detector since the time when the two inputs are different (or same) represents the relative phase
...
59 shows the XOR of two waveforms
...
The output (low
pass filtered) as a function of the phase error is also shown
...
59
[Plot axis: phase error (deg), marked at -90, 90, and 180]
The XOR as a phase detector
...
e
...
Thus the linear phase range is only 180 degrees
...
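The limited linear range of the XOR phase detector can be reproduced with a short simulation. The sketch below assumes ideal square waves with a 50% duty cycle and an ideal averaging (low-pass) filter; the function names are hypothetical.

```python
def square(phase_deg):
    """Ideal 50% duty-cycle square wave as a function of phase in degrees."""
    return 1 if (phase_deg % 360) < 180 else 0


def xor_pd_average(phase_error_deg, steps=3600):
    """Average XOR output over one period for a given phase error."""
    total = 0
    for k in range(steps):
        theta = 360.0 * k / steps
        total += square(theta) ^ square(theta - phase_error_deg)
    return total / steps


averages = {err: xor_pd_average(err) for err in (0, 90, 180, 270)}
```

The average rises linearly from 0 to 180 degrees and then falls again, so a phase error of 270 degrees is indistinguishable from one of 90 degrees.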
e
...
A drawback of the XOR phase detector is that it may
lock to a multiple of the clock frequency
...
The filtered version of this signal will be identical to
that of the truly locked state and thus the VCO will operate at the nominal frequency
...
The phase-frequency detector (PFD) is the most commonly
used form of phase detector, and it solves several of the shortcomings of the detectors discussed above
...
Accordingly, it cannot lock to an incorrect
...
The PFD takes two clock inputs and produces two outputs, UP
and DOWN as shown in Figure 10
...
[Figure: phase-frequency detector built from two resettable D registers with inputs A and B and outputs UP and DN; the state transition diagram has the states (UP = 0, DN = 1), (UP = 0, DN = 0), and (UP = 1, DN = 0), with transitions on A and B]
Figure 10
...
(a) Schematic (b) State Transition Diagram (c) Timing
...
Assume that both UP and DN outputs are
initially low
...
The UP signal remains in this state until a low-to-high transition occurs on input B
...
Notice that a short pulse proportional to the phase error is generated on
the UP signal, and that there is a small pulse on the DN output, whose duration is equal
to the delay through the AND gate and register reset delay
...
The roles are reversed for the case
when input A lags B, and a pulse proportional to the phase error is generated on the DN
output
...
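The three-state behavior of the PFD can be sketched as a small event-driven state machine. This is a simplified model, not the circuit itself: it assumes ideal rising-edge events and ignores the short reset pulse caused by the AND gate and register reset delay. The function name is hypothetical.

```python
def pfd_pulse_widths(edges_a, edges_b):
    """Return total (UP, DN) high time for lists of rising-edge times."""
    events = sorted([(t, "A") for t in edges_a] + [(t, "B") for t in edges_b])
    state = 0          # +1: UP = 1, 0: both low, -1: DN = 1
    last_t = None
    up_time = 0.0
    dn_time = 0.0
    for t, source in events:
        if last_t is not None:
            if state == 1:
                up_time += t - last_t
            elif state == -1:
                dn_time += t - last_t
        if source == "A":
            state = min(state + 1, 1)   # edge on A sets UP or resets DN
        else:
            state = max(state - 1, -1)  # edge on B sets DN or resets UP
        last_t = t
    return up_time, dn_time


# A leads B by 1 time unit at equal frequency.
lead = pfd_pulse_widths([0, 10, 20], [1, 11, 21])
# A runs at twice the frequency of B.
faster = pfd_pulse_widths([0, 5, 10, 15, 20], [0, 10, 20])
```

In the first case the UP time equals the accumulated phase error; in the second, UP dominates, which is how the same circuit also detects a frequency difference.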
[Waveforms: A, B, UP, DN]
Figure 10
...
The circuit also acts as a frequency detector, providing a measure of the frequency
error (Figure 10
...
For the case when A is at a higher frequency than B, the PFD generates a lot more UP pulses (with the average proportional to the frequency difference),
while the DN pulses average close to zero
...
The phase characteristic of the phase detector is shown in Figure 10
...
Notice
that the linear range has been expanded to 4π
...
62 Phase detector characteristic
...
63 Charge Pump
...
One possible implementation is
shown in Figure 10
...
A pulse on the UP signal adds a
charge packet proportional to the size of the UP pulse, and
a pulse on the DN signal removes a charge packet proportional to the DN pulse
...
This effectively increases the frequency of the
VCO
...
The period of
the startup transient is strongly dependent on the bandwidth of the loop filter
...
64
shows a spice level simulation of a PLL (an ideal VCO is used to speed up the simulation)
implemented in 0
...
In this example, a reference frequency of 100 MHz is chosen and the PLL multiplies this frequency by 8 to 800 MHz
...
Once the control
voltage reaches its final value, the output frequency (the clock to the digital system) settles
to its final value
...
As illustrated in this plot, the top graph shows lock, in
which fout = 8 * fref
...
...
[Plot panels: Control Voltage (V); div]
64 Spice simulation of a PLL
...
Summary
In a short span of time, phase-locked loops have become an essential component of
any high-performance digital design
...
Yet, experience has demonstrated that
this combination is perfectly feasible, and leads to new and better solutions
...
7
Future Directions
This section highlights some of the trends in high-performance and low-power timing
optimization
...
7
...
A schematic of a DLL is shown in Figure 10
...
The key component of a DLL is a voltage-controlled delay line
(VCDL)
...
The idea is to delay the output clock such that it perfectly lines up with
the reference
...
The reference frequency is fed
into the input of the VCDL
...
Note that only a phase detector is required instead of a PFD
...
The function of the feedback is to adjust the delay through
the VCDL such that the rising edges of the input reference clock (fREF) and the output
clock (fO) are aligned
...
65 Delay-Locked Loop
...
66c
...
Since the first edge of the output arrives before the reference edge,
an UP pulse of width equal to the error between the two signals is generated
...
This causes the edge of the output signal to be delayed in the next
cycle (this implementation of the VCDL assumes that a larger voltage results in a larger
delay)
...
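The feedback action of the DLL can be sketched as a simple iteration on the delay of the VCDL. This is a behavioral illustration only: it assumes lock occurs when the delay equals one reference period, a linear VCDL whose delay grows with control voltage, and hypothetical values for `d_min`, `k_vcdl`, and `k_loop`.

```python
def simulate_dll(t_ref=10e-9, d_min=6e-9, k_vcdl=5e-9, k_loop=2e7, cycles=200):
    """Return (delay, v_ctrl) after a number of reference cycles.

    d_min:  assumed VCDL delay at zero control voltage (s).
    k_vcdl: assumed VCDL gain (seconds of delay per volt).
    k_loop: assumed phase-detector/charge-pump gain (volts per second of error).
    """
    v_ctrl = 0.0
    delay = d_min
    for _ in range(cycles):
        err = t_ref - delay               # > 0: output edge is early, UP pulse
        v_ctrl += k_loop * err            # loop filter accumulates the correction
        delay = d_min + k_vcdl * v_ctrl   # larger voltage gives larger delay
    return delay, v_ctrl


delay, v_ctrl = simulate_dll()
print(delay, v_ctrl)  # the delay converges toward the reference period
```

Because the loop only adjusts a delay and never changes the frequency, a detector that measures phase alone is sufficient in this model.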
Figure 10
...
The
chip is partitioned into many small regions (or tiles)
...
g
...
For simplicity, the figure shows a two-tile chip, but
this is easily extended to many regions
...
In front of each buffer is a VCDL
...
Unfortunately,
the static and dynamic variations of the buffers cause the phase error between the buffered
clocks to be non-zero and time-varying
...
The feedback inside each tile
adjusts the control voltage of the VCDL, such that the buffered output is locked in phase to
the global input clock
...
g
...
Such configurations have become
common in high-performance digital microprocessors
...
(Figure labels: GLOBAL CLK; each of the two tiles contains VCDL, Digital Circuit, CP/LF, and Phase Detector blocks)
Figure 10
...
10
...
2
Optical Clock Distribution
It is clear that there are some fundamental problems associated with electrical synchronization techniques for future high-performance multi-GHz systems
...
Even with aggressive active clock management schemes such as the use of DLLs and PLLs, the variations
in power supply and clock load result in unacceptable clock uncertainty
...
An excellent review of the rationale and trade-offs in optical interconnects vs
...
The potential advantage of optical technology for clock distribution is due to the fact
that the delay is not sensitive to temperature and the clock edges don’t degrade over long
distances (e
...
, tens of meters)
...
Of course, the performance of an optical system
is limited by the speed of light
...
Figure 10
...
The off-chip optical source is brought to the chip, distributed through waveguides, and
converted through receiver circuitry to a local electrical clock distribution network
...
Notice that an H-tree is used
in distributing the optical clock
...
Once reaching the detector in each section,
the global clock optical pulses are converted into current pulses
...
into voltage signals
...
Optics has the additional advantage that many of the
difficulties with electromagnetic wave phenomena are avoided (e
...
, crosstalk or inductive
coupling)
...
Optical clocks avoid this problem and don’t require termination [Miller00]
...
There are some variations in the arrival time of the optical signal (e
...
, due to variations at bends, which cause different energy loss along different paths)
...
The sources of variations are very
similar to those of a conventional electrical approach, including threshold and device variations,
power supply and temperature variations, and variations in the local drivers
...
67
distribution
...
However,
the challenges of dealing with process variations in the opto-electronic circuitry must be
addressed for this to become a reality
...
8
Perspective: Synchronous versus Asynchronous Design
The self-timed approach offers a potential solution to the growing clock-distribution problem
...
Independence from physical timing constraints is achieved with the aid of completion signals
...
This requires adherence to a certain protocol, which normally
consists of either two or four phases
...
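A four-phase protocol of the kind mentioned above can be sketched as an ordered sequence of signal events. The sketch assumes the common return-to-zero handshake with a request and an acknowledge wire; the names `req` and `ack` and the function `four_phase_transfer` are hypothetical and chosen only for illustration.

```python
def four_phase_transfer(data_items):
    """Transfer items with a four-phase (return-to-zero) handshake.

    Returns (received, trace), where trace lists the signal events in order.
    """
    trace = []
    received = []
    for item in data_items:
        # Phase 1: sender presents valid data and raises the request.
        trace.append(("req", 1))
        # Phase 2: receiver captures the data and raises the acknowledge.
        received.append(item)
        trace.append(("ack", 1))
        # Phase 3: sender sees the acknowledge and lowers the request.
        trace.append(("req", 0))
        # Phase 4: receiver lowers the acknowledge; both wires are idle again.
        trace.append(("ack", 0))
    return received, trace


received, trace = four_phase_transfer([1, 2])
print(received)
print(trace[:4])
```

The ordering of events, rather than any absolute timing, is what guarantees correct transfer in this model. A two-phase variant would instead treat every transition on either wire as an event.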
Examples of self-timed circuits can be found in signal processing [Jacobs90], fast
arithmetic units (such as self-timed dividers) [Williams87], simple microprocessors
[Martin89] and memory (static RAM, FIFOs)
...
The design of a foolproof network of handshaking units that is robust with respect
to races, livelock, and deadlock is nontrivial and requires the availability of dedicated
design-automation tools
...
This was amply illustrated by the example of the 21164 Alpha microprocessor
...
agement requires extensive modeling and analysis, as well as careful design
...
This observation is
already reflected in the fact that the routing network for the latest generation of massively
parallel supercomputers is completely implemented using self-timing [Seitz92]
...
Other alternative timing approaches might emerge as well
...
10
...
• An in-depth analysis of synchronous digital circuits and clocking approaches
was presented
...
Important parameters are the clocking scheme used and
the nature of the clock-generation and distribution network
...
Self-timed design uses completion signals
and handshaking logic to isolate physical timing constraints from event ordering
...
The introduction of synchronizers helps to reduce that risk,
but can never eliminate it
...
They are used to generate high-speed clock signals on a chip
...
• Important trends for clock distribution include the use of delay-locked loops to
actively adjust delays on a chip
...
10
...
One of the best discussions so far is the chapter by Chuck Seitz in [Mead80, Chapter
7]
...
A collection of papers on clock distribution
networks is presented in [Friedman95]
...
REFERENCES
[Abnous93] A
...
Behzad, “A High-Performance Self-Timed CMOS Adder,” in EE241
Final Class Project Reports, by J
...
of California—Berkeley, May 1993
...
Bailey, “Clock Distribution,” in [Chandrakasan00]
...
Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison-Wesley,
pp
...
[Boning00] D
...
Nassif, “Models of Process Variations in Device and Interconnect” in
[Chandrakasan00]
...
Bowhill et al
...
[Bernstein98] K
...
al, High Speed CMOS Design Styles, Kluwer Academic Publishers,
1998
...
Bernstein, “Basic Logic Families”, in [Chandrakasan00]
...
Chandrakasan, W
...
Fox, Design of High-Performance Microprocessor Circuits, IEEE Press, 2000
...
Chaney and F
...
on Computers, vol
...
421–422
...
Dally and J
...
[Dopperpuhl92] D
...
, “A 200 MHz 64-b Dual Issue CMOS Microprocessor,” IEEE
Journal on Solid State Circuits, vol
...
11, Nov
...
1555–1567
...
Friedman, ed
...
[Gardner80] F
...
on Communications,
vol
...
1849–1858
...
Glasser and D
...
360–365
...
Goodman, A
...
Dancy, A
...
Chandrakasan, "An Energy/Security Scalable
Encryption Processor Using an Embedded Variable Voltage DC/DC Converter," IEEE Journal of
Solid-state Circuits, pp
...
[Gray93] P
...
Meyer, Analysis and Design of Analog Integrated Circuits, 3rd ed
...
[Hatamian88] M
...
S
...
, Plenum Publishing, pp
...
[Heller84] L
...
, “Cascade Voltage Switch Logic: A Differential CMOS Logic Family,”
IEEE International Solid State Conference Digest, Feb
...
16–17
...
Jacobs and R
...
25, No 6, December 1990, pp
...
[Jeong87] D
...
, “Design of PLL-Based Clock Generation Circuits”, IEEE Journal on
Solid State Circuits, vol
...
[Johnson93] H
...
Graham, High-Speed Digital Design—A Handbook of Black Magic,
Prentice-Hall, N
...
[Kim90] B
...
Helman, and P
...
SC-25, no
...
1385–1394
...
[Martin89] A
...
al, “The First Asynchronous Microprocessor: Test Results,” Computer
Architecture News, vol
...
95–110
...
Mead and L
...
[Messerschmitt90] D
...
8, No
...
1404-1419, October 1990
...
Miller, “Rationale and Challenges for Optical Interconnects to Electronic Chips,”
Proceedings of the IEEE, pp
...
[Nielsen94] L Nielsen, C
...
Sparso, K
...
[Restle98] P
...
Jenkins, A
...
Cook, “Measurement and Modeling of On-chip
Transmission Line Effects in a 400MHz Microprocessor,” IEEE Journal of Solid-state Circuits,
pp
...
[Seitz80] C
...
218–262, 1980
...
Seitz, “Mosaic C: An Experimental Fine-Grain Multicomputer,” in Future Tendencies
in Computer Science, Control and Applied Mathematics, Proceedings International Conference
on the 25th Anniversary of INRIA, Springer-Verlag, Germany, pp
...
[Shoji88] M
...
[Sutherland89] I
...
720–738, June
1989
...
Veendrick, “The Behavior of Flip Flops Used as Synchronizers and Prediction of
Their Failure Rates”, IEEE Journal of Solid State Circuits, vol
...
169–176
...
Williams et al
...
of Advanced Research
in VLSI 1987, Stanford Conf
...
75–96, March 1987
...
eecs
...
edu/IcBook) for
insightful and challenging exercises and design problems