# **Timing Analysis with Clock Skew**

David Harris, Mark Horowitz<sup>1</sup>, & Dean Liu<sup>1</sup>

David\_Harris@hmc.edu, {horowitz, dliu}@vlsi.stanford.edu

March 1999

Harvey Mudd College, Claremont, CA

<sup>1</sup> (with Stanford University, Stanford, CA)



# Outline

- Introduction
- Timing Analysis Formulation
- Timing Analysis with Clock Skew
- Timing Verification Algorithm
- Results
- Conclusion



# **Introduction**

Clock skew, as a fraction of the cycle time, is a growing problem for fast chips:

- Fewer gate delays per cycle
- Poor transistor length, threshold tolerances
- Larger clock loads
- Bigger dice

The designer may:

- Reduce skew
  - Very hard; clock networks are already well optimized
- Tolerate skew
  - Flip-flops and traditional domino circuits reduce cycle time by skew
  - Latches and skew-tolerant domino can hide modest amounts of skew
- Only budget necessary skews
  - Skew between nearby latches is often much less than skew across die
  - Need better timing analysis for different skews between different latches



# **Timing Analysis Formulation**

Build on Sakallah, Mudge, Olukotun (SMO) analysis of latch-based systems. System contains:

- $k$ clocks $C = \{\phi_1, \phi_2, \ldots, \phi_k\}$
- $l$ latches $L = \{L_1, L_2, \ldots, L_l\}$















- $T_c$: cycle time
- $T_{\phi_i}$: duration for which $\phi_i$ is high
- $s_{\phi_i}$: start time, relative to beginning of common clock, of $\phi_i$ being high
- $S_{\phi_i\phi_j}$: phase shift from $\phi_i$ to next occurrence of $\phi_j$. Used to translate times relative to particular clock phases.
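The phase shift can be illustrated with a minimal sketch. It assumes the usual SMO-style definition, $S_{\phi_i\phi_j} = s_{\phi_i} - (s_{\phi_j} + W T_c)$, where $W$ is 1 when the next occurrence of $\phi_j$ falls in the following cycle and 0 otherwise. That formula, the function name `phase_shift`, and the clock numbers are assumptions made for illustration; they are not given on the slides.

```python
def phase_shift(s_i, s_j, t_c):
    """Shift from phase i to the next occurrence of phase j.

    Assumed SMO-style definition: s_i - (s_j + W * t_c), where W is 1 when
    the next rising edge of phase j falls in the following cycle.
    """
    wraps = 0 if s_j > s_i else 1
    return s_i - (s_j + wraps * t_c)


# Hypothetical two-phase clock: 1000 ps cycle, phi1 rises at 0 ps, phi2 at 500 ps.
T_C = 1000
START = {"phi1": 0, "phi2": 500}

# An event 620 ps after phi1 rises is 120 ps after the next phi2 rises.
t_rel_phi2 = 620 + phase_shift(START["phi1"], START["phi2"], T_C)
print(t_rel_phi2)  # 120
```

Adding the shift to a time measured from the launching phase gives the same instant measured from the receiving phase, which is how the operator is used in the constraints.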













- $\Delta_{ij}$: propagation delay through logic between latches *i* and *j*
- $A_i$: arrival time at latch *i*, relative to start of $p_i$, the clock phase that controls latch *i*
- $D_i$: departure time from latch *i*
- $Q_i$: output time of latch *i*



Latch Departure:

$$\forall i \in L \qquad D_i = \max(0, A_i)$$

Latch Output:

$$\forall i \in L \qquad Q_i = D_i + \Delta_{DQ_i}$$

Latch Arrival:

$$\forall i, j \in L \qquad A_i = \max(Q_j + \Delta_{ji} + S_{p_jp_i})$$

Propagation Constraints:

$$\forall i, j \in L \qquad D_i = \max(0, \max(D_j + \Delta_{DQ_j} + \Delta_{ji} + S_{p_jp_i}))$$

Setup Constraints:

$$\forall i \in L \qquad D_i + \Delta_{DC_i} \leq T_{p_i}$$
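The propagation and setup constraints can be sketched as a small fixed-point iteration. This is an illustration under stated assumptions, not the authors' tool: $\Delta_{DQ}$ is read as the latch D-to-Q delay and $\Delta_{DC}$ as the setup time, and the function names, the two-latch circuit, and all delay values are hypothetical.

```python
def solve_departures(latches, logic_delay, dq, shift, max_iters=1000):
    """Iterate D_i = max(0, max_j(D_j + dq_j + delay_ji + S_ji)) to a fixed point.

    logic_delay and shift are keyed by (j, i) for each path from latch j to i.
    """
    d = {i: 0 for i in latches}
    for _ in range(max_iters):
        new_d = {}
        for i in latches:
            arrivals = [d[j] + dq[j] + delay + shift[(j, k)]
                        for (j, k), delay in logic_delay.items() if k == i]
            new_d[i] = max([0] + arrivals)
        if new_d == d:
            return d
        d = new_d
    raise RuntimeError("no fixed point found; the cycle time may be too short")


def setup_violations(d, dc, t_phase):
    """Latches whose departure plus setup time exceeds their phase width."""
    return [i for i in d if d[i] + dc[i] > t_phase[i]]


# Hypothetical two-latch loop on a 1000 ps two-phase clock (all times in ps).
latches = ["L1", "L2"]                            # L1 on phi1, L2 on phi2
logic_delay = {("L1", "L2"): 600, ("L2", "L1"): 300}
shift = {("L1", "L2"): -500, ("L2", "L1"): -500}  # S between the two phases
dq = {"L1": 50, "L2": 50}
dc = {"L1": 50, "L2": 50}
t_phase = {"L1": 500, "L2": 500}                  # T_{p_i} for each latch

departures = solve_departures(latches, logic_delay, dq, shift)
print(departures)                                 # {'L1': 0, 'L2': 150}
print(setup_violations(departures, dc, t_phase))  # []
```

In this example the 600 ps path borrows 150 ps from the second phase, and the 300 ps return path gives it back, so both setup constraints hold.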

# **Timing Analysis with Clock Skew**

Clock skew is the difference between nominal and actual interarrival times of a pair of clocks.

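Enlarge set of physical clocks *C* to model skew between nominally identical clocks. Example: a two-phase system with clock domains a and b has $C = \{\phi_{1a}, \phi_{2a}, \phi_{1b}, \phi_{2b}\}$.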



# **Single Skew Formulation**

Easy and conservative to budget global skew everywhere

Effectively increases setup time at each latch

Setup Constraints:

$$\forall i \in L \qquad D_i + \Delta_{DC_i} + t_{skew}^{global} \leq T_{p_i}$$

Too conservative for high-speed designs with big global skews
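A short sketch shows how a blanket skew budget eats into setup slack. The helper `setup_slack` and the latch numbers are hypothetical; the 250 ps and 500 ps skews are the local and global values assumed in the results.

```python
def setup_slack(departure, setup, t_phase, skew=0):
    """Slack of D_i + dc_i + t_skew <= T_{p_i}; negative means a violation."""
    return t_phase - (departure + setup + skew)


# Hypothetical latch: departs 150 ps into a 500 ps phase, 50 ps setup time.
print(setup_slack(150, 50, 500))            # 300
print(setup_slack(150, 50, 500, skew=250))  # 50
print(setup_slack(150, 50, 500, skew=500))  # -200
```

The same latch passes with the local budget and fails with the global one, which is the pessimism the exact formulation is meant to remove.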



# **Exact Skew Budgets**

[Figure: latches $L_3$ to $L_7$, split between the ALU (clock domain a, clocks $\phi_{1a}$, $\phi_{2a}$) and the Data Cache (clock domain b, clocks $\phi_{1b}$, $\phi_{2b}$)]

How much skew must be budgeted?

- $L_3$ to $L_4$: local skew
- $L_7$ to $L_4$: global skew
- $L_5$ to $L_4$ through transparent $L_6$, $L_7$: local skew

Must track launching clock to determine skew budget
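The budgeting rule can be sketched as a lookup on the launching and receiving clocks. The clock names, the latch-to-clock assignment (a reading of the lost figure), and the helper names are assumptions; the 250 ps and 500 ps values are the ones assumed in the results.

```python
LOCAL_SKEW_PS = 250
GLOBAL_SKEW_PS = 500

# Assumed assignment of latches to clocks; the last letter names the domain.
LATCH_CLOCK = {"L3": "phi2a", "L4": "phi1a", "L5": "phi2a",
               "L6": "phi1b", "L7": "phi2b"}


def skew_between(launch_clock, receive_clock):
    """Local skew within one clock domain, global skew across domains."""
    same_domain = launch_clock[-1] == receive_clock[-1]
    return LOCAL_SKEW_PS if same_domain else GLOBAL_SKEW_PS


def path_skew(path):
    """Budget for a path: only the first (launching) and last (receiving)
    latches matter; transparent latches in between do not change it."""
    return skew_between(LATCH_CLOCK[path[0]], LATCH_CLOCK[path[-1]])


print(path_skew(["L3", "L4"]))              # 250
print(path_skew(["L7", "L4"]))              # 500
print(path_skew(["L5", "L6", "L7", "L4"]))  # 250
```

The third path crosses into domain b and back, yet it is launched and received in domain a, so only local skew applies.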



# **Exact Skew Formulation**

Define arrival and departure times with respect to launching clocks:

- $A_i^c$: arrival time at latch *i* for path launched by clock *c*
- $D_i^c$: departure time from latch *i* for path launched by clock *c*
- $t_{skew}^{\phi_i,\phi_j}$: skew between clocks $\phi_i$, $\phi_j$



# **Negative Departure Times**

Must now allow negative departure times with respect to other clocks:

- The path from $L_5$ to $L_7$ arrives earlier than the path from $L_6$ to $L_7$, but it sees more skew and can miss setup
- It reaches $L_6$ at -50 ps, but $L_6$ may already be transparent by then because of skew

Departure times with respect to the latch's own clock must still be nonnegative



# **Exact Constraints with Skew**

Propagation Constraints (single skew):

$$\forall i, j \in L \qquad D_i = \max(0, \max(D_j + \Delta_{DQ_j} + \Delta_{ji} + S_{p_jp_i}))$$

Setup Constraints (single skew):

$$\forall i \in L \qquad D_i + \Delta_{DC_i} + t_{skew}^{global} \leq T_{p_i}$$

Propagation Constraints (exact skew):

$$\forall i, j \in L,\ c \in C \qquad D_i^c = \begin{cases} \max(0, \max(D_j^c + \Delta_{DQ_j} + \Delta_{ji} + S_{p_jp_i})) & \text{if } c = p_i \\ \max(D_j^c + \Delta_{DQ_j} + \Delta_{ji} + S_{p_jp_i}) & \text{otherwise} \end{cases}$$

Setup Constraints (exact skew):

$$\forall i \in L,\ c \in C \qquad D_i^c + \Delta_{DC_i} + t_{skew}^{c, p_i} \leq T_{p_i}$$
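The exact-skew constraints can be tried on single latches with a minimal sketch. The helper names and all numbers are hypothetical; the point is only that the clamp at 0 applies to the latch's own clock, and that an earlier departure can still fail setup when it carries a larger skew.

```python
NEG_INF = float("-inf")


def exact_departure(arrivals, launch_clock, latch_clock):
    """D_i^c from the arrival times of paths launched by clock c.

    The result is clamped at 0 only when c is the latch's own clock.
    """
    latest = max(arrivals, default=NEG_INF)
    if launch_clock == latch_clock:
        return max(0, latest)
    return latest


def exact_setup_ok(departure, setup, skew, t_phase):
    """Check D_i^c + dc_i + t_skew(c, p_i) <= T_{p_i}."""
    return departure + setup + skew <= t_phase


# A path launched by another clock may depart at a negative time.
print(exact_departure([-50], "phi2a", "phi1b"))  # -50
print(exact_departure([-50], "phi1b", "phi1b"))  # 0

# Hypothetical receiver with a 500 ps phase and 50 ps setup time:
print(exact_setup_ok(200, 50, skew=250, t_phase=500))  # True
print(exact_setup_ok(0, 50, skew=500, t_phase=500))    # False
```

The departure at 0 ps is 200 ps earlier than the other one, yet it is the one that fails because it must budget the larger skew.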


# **Other Timing Constraints**

Flip-flops:

- No transparency, easier than latches
- Still budget skew between launching and receiving clocks

Min-delay:

- Only requires checks between consecutive pairs of clocked elements
- Standard verification algorithms work if proper skew is used



# **Verification Algorithm**

Check constraints with generalized Szymanski-Shenoy relaxation algorithm

- For each latch *i*:
  - $D_i^{p_i} = 0$; $D_i^{max} = 0$; $c_i^{max} = p_i$ (initialize departure times)
  - Enqueue $D_i^{p_i}$
- While queue is not empty:
  - Dequeue $D_j^c$
  - For each latch *i* in fanout of *j*:
    - $A = D_j^c + \Delta_{DQ_j} + \Delta_{ji} + S_{p_jp_i}$ (calculate arrival time)
    - If $(A > D_i^c)$ AND $(A + t_{skew}^{c,\,c_i^{max}} > D_i^{max})$ (is it possibly critical?)
      - If $(A + \Delta_{DC_i} + t_{skew}^{c,p_i} > T_{p_i})$ (does it violate setup time?)
        - Report setup time violation
      - Else
        - $D_i^c = A$; Enqueue $D_i^c$ (keep following path)
        - If $(A > D_i^{max})$: $D_i^{max} = A$; $c_i^{max} = c$
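The relaxation loop above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the pruning test is taken to use the skew between the launching clock $c$ and $c_i^{max}$, a departure time that has not been set counts as $-\infty$, and the three-latch circuit, clock names, and delays are hypothetical.

```python
from collections import deque

NEG_INF = float("-inf")


def verify(latches, fanout, clock_of, dq, dc, t_phase, shift, skew):
    """Relaxation over departure times D_i^c, tracked per launching clock c."""
    d = {}                      # d[(i, c)] = D_i^c; a missing entry is -inf
    d_max, c_max = {}, {}
    queue = deque()
    for i in latches:           # initialize departure times
        p = clock_of[i]
        d[(i, p)] = 0
        d_max[i], c_max[i] = 0, p
        queue.append((i, p))
    violations = []
    while queue:
        j, c = queue.popleft()
        for i, delay in fanout.get(j, {}).items():
            p = clock_of[i]
            a = d[(j, c)] + dq[j] + delay + shift(clock_of[j], p)
            # Is the path possibly critical?
            if a > d.get((i, c), NEG_INF) and a + skew(c, c_max[i]) > d_max[i]:
                if a + dc[i] + skew(c, p) > t_phase[p]:
                    violations.append((i, c, a))   # setup time violation
                else:
                    d[(i, c)] = a                  # keep following the path
                    queue.append((i, c))
                    if a > d_max[i]:
                        d_max[i], c_max[i] = a, c
    return d, violations


# Hypothetical example (ps): 1000 ps cycle, phases high for 500 ps.
START = {"phi1a": 0, "phi1b": 0, "phi2a": 500, "phi2b": 500}


def shift(c_from, c_to):
    wraps = 0 if START[c_to] > START[c_from] else 1
    return START[c_from] - (START[c_to] + wraps * 1000)


def skew(c1, c2):
    return 250 if c1[-1] == c2[-1] else 500   # local vs. global skew


clock_of = {"L5": "phi2a", "L6": "phi1b", "L7": "phi2b"}
fanout = {"L5": {"L6": 400}, "L6": {"L7": 600}}
dq = {i: 50 for i in clock_of}
dc = {i: 50 for i in clock_of}
t_phase = {c: 500 for c in START}

departures, violations = verify(list(clock_of), fanout, clock_of,
                                dq, dc, t_phase, shift, skew)
print(departures[("L6", "phi2a")])  # -50
print(violations)                   # [('L7', 'phi2a', 100)]
```

In this example the path launched by `phi2a` leaves $L_6$ at -50 ps and reaches $L_7$ at 100 ps, earlier than the 150 ps arrival launched by `phi1b`, but it must budget global skew and is the one reported.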

# **Results**

Analyzed MAGIC: Memory & General Interconnect Controller of FLASH supercomputer

Assume $t_{skew}^{local} = 250$ ps, $t_{skew}^{global} = 500$ ps

Model A:

- As designed, from MAGIC .sdf database

Model B:

- Flops converted to latch pairs, logic balanced between pairs




| Analysis    | Quantity                   | Model A | Model B |
|-------------|----------------------------|---------|---------|
|             | # Flip-flops               | 10559   | 0       |
|             | # Latches                  | 1819    | 22937   |
| Single skew | $T_c$                      | 9.43 ns | 8.05 ns |
|             | # Latch departures checked | 3866    | 24995   |
| Exact skew  | $T_c$                      | 9.38 ns | 7.96 ns |
|             | # Latch departures checked | 4009    | 25328   |

CPU time < 1 second in all cases




# Conclusions

Global skews will be too large for GHz+ systems

- Use skew-tolerant circuit techniques such as latches
- Take advantage of smaller local skews where possible

Requires support of timing analyzer

- Budget appropriate skew at each receiver
- Track departure times with respect to launching clocks
- Allow negative departure times with respect to other clocks

Leads to explosion in number of timing constraints. However...

- Most are not tight because most critical paths do not borrow time across many latches
- Relaxation algorithm automatically prunes loose constraints
- Very small increase in runtime

Expect synchronous systems well beyond 1 GHz

