# A PROBABILISTIC MODEL FOR CLOCK SKEW Steven D. Kugelmass Kenneth Steiglitz CS-TR-124-87 November 1987 ### A Probabilistic Model For Clock Skew<sup>†</sup> Steven D. Kugelmass Kenneth Steiglitz Department of Computer Science Princeton University Princeton, NJ 08544 #### **ABSTRACT** A probabilistic model for the accumulation of clock skew in synchronous systems in presented. Using this model, we derive upper bounds for expected skew, and its variance, in tree distribution systems with N synchronously clocked processing elements. We apply these results to two specific models for clock distribution. In the first, which we call *metric-free*, the skew in a buffer stage is Gaussian with a variance independent of wire length. In this case the upper bound on skew grows as $\Theta(\log N)$ for a system with N processing elements. The second, *metric*, model, is intended to reflect VLSI constraints: the clock skew in a stage is Gaussian with a variance proportional to wire length, and the distribution tree is an H-tree embedded in the plane. In this case the upper bound on expected skew is $\Theta(\sqrt{N\log N})$ for a system with N processors. Thus the probabilistic model is more optimistic than the deterministic summation model of Fisher and Kung, which predicts a clock skew $\Theta(N)$ in this case, and is also consistent with their lower bound of $\Omega(\sqrt{N})$ for planar embeddings. We have estimates of the constants of proportionality, as well as the asymptotic behavior, and we have verified the accuracy of our estimates by simulation. November 30, 1987 <sup>&</sup>lt;sup>†</sup>This work was supported by NSF Grant MIP-8705454, U. S. Army Research Office-Durham Contract DAAG29-85-K-0191, and DARPA Contract N00014-82-K-0549. ### A Probabilistic Model For Clock Skew<sup>†</sup> Steven D. Kugelmass Kenneth Steiglitz Department of Computer Science Princeton University Princeton, NJ 08544 ## 1. Introduction The accumulation of clock skew, the differences in arrival times of signals in a computing system with a central clock, is one of the factors that limit the speed of such systems. While some researchers have attempted to describe the underlying physical causes of skew [2,9], there is little published material describing models of skew accumulation and the asymptotic behavior of skew with increasing system size. Fisher and Kung in [4] present two models and derive bounds on clock skew under each. They assume, in each case, that the global clock is distributed via a rooted binary tree of buffers and wires. The first of their two models, the "difference model," specifies that the skew between two clocked nodes is proportional to the difference in the path lengths from each clocked node to their nearest common ancestor in the distribution tree. "Path length" means the actual geometric length, not the number of edge traversals in a representative graph. Under this model, both one and two-dimensional arrays can be clocked without skew by distributing the global signal via a topology that has equal path length to each clocked node. An H-tree is just one example of such a topology. The second model, the "summation model," is less optimistic. It states that the skew between two clocked nodes is proportional to the sum of the path lengths from each node to their nearest common ancestor. Using this model, Fisher and Kung were able to show that one-dimensional arrays of processors can be clocked with a constant amount of skew, whereas two-dimensional arrays cannot. They establish a lower bound of $\Omega(\sqrt{N})$ , and this model leads to a skew of $\Theta(N)$ for an H-tree. <sup>&</sup>lt;sup>†</sup>This work was supported by NSF Grant MIP-8705454, U. S. Army Research Office-Durham Contract DAAG29-85-K-0191, and DARPA Contract N00014-82-K-0549. Both of these models ignore what we consider to be a fundamental property of skew, namely, its roots in the random variations of propagation time through buffers and wires. Others [5] have developed equations that relate skew to system performance and stability, in terms of simple timing parameters such as the propagation time and the settling time of the components of the processing elements. The equations specify bounds on the skew for proper operation in terms of other system timing parameters. No attempt is made to address the nature of the origin of skew, nor to relate its rate of growth with system size. This paper presents and analyzes a probabilistic model for the accumulation of skew in a globally distributed signal. We determine upper bounds for the expected clock skew between processing elements in a processor array, and show that under the assumption that the delay is the same at each stage of the clock distribution tree, any array can be clocked with expected skew that is $O(\log N)$ where N is the number of processing elements. The organization of the papers is as follows. In Section 2 we present our formal model of global signal distribution and skew accumulation. Section 3 presents an analysis of the model and derives upper bounds for the expected skew in a global signal. Section 4 applies the results to two examples, and we conclude by comparing our results to previously published results and by discussing the implications of these upper bounds on the construction of large synchronous systems. # 2. A General Model of Signal Distribution A global signal, such as a clock, is distributed throughout a processing system by a signal distribution system. The distribution system is composed of a number of buffers (amplifiers) and wires which may be organized in a number of different ways. Two common structures are a bus and a tree. The clock distribution system can be represented by a graph. It has a single distinguished vertex which is called the *source*. This is where the origin of the global signal is located, and it is the only input to the distribution system. The distribution system can have multiple destinations, but for practical reasons, there is exactly one path from the source to each destination. We assume that a destination is a processing element (PE) which may have its own internal signal distribution system. Using the tools which are developed below, the internal system can be modeled in a similar manner to the global signal distribution system. For our purposes, however, we can think of it as hidden; we concern ourselves only with the global signal system. There is a second graph superimposed on the clock distribution graph. It is a *communication graph* which specifies the PE-to-PE connectivity. It can take any topology: linear list, mesh, complete, bipartite, etc, and forms connections only among the PEs. The PEs use the global clock to synchronize their communication along the communication graph. Each buffer and wire in the clock distribution system propagates and delays its incoming signal. Therefore, it is natural to associate every delay element in the signal distribution system, whether a buffer or a wire, with a real random variable, $d_j$ . The value of the random variable gives the delay contribution of that element. The delay at any stage of the clock distribution system is the sum of the delays from the signal source to that point. An equivalent process takes place in the model. The delay at any point is a real random variable which is the sum of all the random variables along the path from the source to that point. These definitions constitute the essence of the model. They make it very simple, but extremely general, and allow one to model any clock distribution system. Geometry can be incorporated into the model by attaching an appropriate probability density function to wire delays. There is also the freedom to analyze as much or as little as desired by creating simplified models, in which buffer delays or wire delays can be ignored entirely. Our primary interest is skew, the distribution of arrival times of a particular clock pulse to all of the PEs that communicate. Since the model is probabilistic, it is not possible to give an expression for the worst-case skew. Rather, we derive an expression for the expected maximum skew by assuming that the two PEs which exhibit the largest possible clock skew do indeed communicate. So far, we have made no mention of any particular probability density functions. The total delay through the distribution system, i.e., the arrival time of a clock pulse to a PE, is the sum of a number of random variables. In many cases, it quickly converges to a normal distribution, by the Central Limit Theorem [1, 3, 8]. Thus, in the case of the skew computations, the actual distributions attached to buffers and wires are usually relatively unimportant. The arrival times of a signal to the N PEs constitute a random sample of size N. From this sample, find the difference between the largest of them, $A_{\text{max}}$ , and smallest, $A_{\min}$ . The random variable $R = A_{\max} - A_{\min}$ is called the *range* of the sample. The range is equivalent to the skew in the signal distribution system. The maximum clocking frequency clearly can be no faster than 1/R. ## 3. Analysis and Upper Bounds The literature contains techniques to compute the expected range of a set of independent identically distributed (iid) random variables. However, little is described about the case when the variables are dependent, as they are in the clock tree. Fortunately, it is possible to use the statistics of iid variates as an upper bound on the statistics of the dependent variates of the clock tree. The relationship is given by Theorem 1, which can be paraphrased in the following informal way: The expected range of a set of random variables, which are dependent because they are the sums of overlapping variables, is no greater than the expected range of the corresponding set of independent random variables. Let $y_i$ , i = 1, 2, ..., N be independent identically distributed (iid) real random variables, with N = kn, and let the sets $\sigma_j$ , j = 1, 2, ..., n be n disjoint subsets of distinct $y_i$ , each of cardinality k. Let the $\tau_i$ be similarly defined, with k distinct elements each, except that they are not necessarily pairwise disjoint. Define the corresponding sums of the $y_i$ by $$s_j = \sum_{y \in \sigma_j} y$$ and $$t_j = \sum_{y \in \tau_j} y$$ We want to show that the expected range of the $s_j$ dominates the expected range of the $t_j$ , and so any upper bound for the former also holds for the latter. First we need two lemmas. **Lemma 1.** Let $F_A(x)$ and $F_B(x)$ be probability distribution functions for random variables A and B respectively, and suppose further that $F_A$ and $F_B$ are differentiable and A and B have finite means and variances. If $F_A(x) \ge F_B(x)$ for all x, then $E(A) \le E(B)$ . **Proof.** Integrating $x \, dF(x)$ by parts from a to b for $F_B$ and $F_A$ and subtracting, we get $$E(B) - E(A) = \lim_{\substack{a \to -\infty \\ b \to +\infty}} \left[ b \Delta F(b) - a \Delta F(a) - \int_{a}^{b} \Delta F(x) \, dx \right]$$ where $\Delta F = F_B - F_A \le 0$ . To show that the first two terms go to zero as a and b approach $-\infty$ and $+\infty$ respectively, we apply l'Hospital's Rule: $$\frac{\frac{d}{dq} \left[ \Delta F(q) \right]}{\frac{d}{dq} \left[ \frac{1}{q} \right]} = \frac{\Delta F'(q)}{-\frac{1}{q^2}} = -q^2 \Delta F'(q)$$ $F_A$ and $F_B$ are differentiable and have finite variances, so the integral $$\int_{-\infty}^{+\infty} x^2 F'(x) \, dx \tag{*}$$ is finite. This implies that the integrand of (\*) must go to zero as x approaches $+\infty$ and $-\infty$ . Therefore $$E(B) - E(A) = -\int_{-\infty}^{+\infty} \Delta F(x) dx \ge 0$$ The next lemma expresses the intuition that the pre-condition $y \le \beta$ can only make $y \le \alpha$ more probable, no matter what $\alpha$ and $\beta$ are. **Lemma 2.** For any $\alpha$ and $\beta$ , and any continuous probability density P, $$\mathrm{P}(y \leq \alpha \mid y \leq \beta) \, \geq \, \mathrm{P}(y \leq \alpha)$$ Proof. For convenience write $$P(y \le \alpha) = G(\alpha)$$ $$P(y \le \alpha \mid y \le \beta) = G(\alpha \mid \beta)$$ $$P(y \le \alpha, y \le \beta) = G(\alpha, \beta)$$ so that we want to show $$G(\alpha \mid \beta) = \frac{G(\alpha, \beta)}{G(\beta)} \ge G(\alpha)$$ First, consider the case $\alpha < \beta$ . Since $$G(\alpha, \beta) = G(\min[\alpha, \beta])$$ we have $$G(\alpha \mid \beta) = \frac{G(\alpha)}{G(\beta)} \ge G(\alpha)$$ When $\alpha > \beta$ we have simply $$G(\alpha \mid \beta) = \frac{G(\beta)}{G(\beta)} = 1 \ge G(\alpha)$$ Lemma 2 is easily generalized for any number of conditions to $$G(\alpha \mid \beta, \gamma, \cdots) \ge G(\alpha)$$ (\*\*) and in fact to the probability distribution of sums of variables, conditioned on other sums, so long as the inequalities in the conditions all go the same way. That is, write any $\sum y_i \le x$ as $y_k \le x - \sum_{i \ne k} y_i$ for some k and apply Lemma 2. There are no restrictions on $\alpha$ and $\beta$ . **Theorem 1.** The expected range of the $s_j$ is no smaller than the expected range of the $t_j$ . **Proof.** Note first that because E[ range ] = E[ max - min ] = E[ max ] - E[ min ], it suffices to show that the expected max of the $s_j$ cannot be less than the expected max of the $t_j$ ; the appropriate inequality for the min follows by a symmetric argument. Let S and T be the maximum of the sums $s_j$ , j = 1, 2, ..., n and $t_j$ , j = 1, 2, ..., n respectively, and $F_S$ and $F_T$ their distribution functions. By Lemma 1, it suffices to show that $F_T(x) \ge F_S(x)$ for every x. To simplify notation, let $S_i$ denote the event $s_i = \sum_{y \in \sigma_i} y \le x$ and $T_i$ the event $t_i = \sum_{y \in \tau_i} y \le x$ . The distribution function $F_S(x)$ can then be written as $$F_S(x) = P(S_1, S_2, ..., S_n)$$ = $P(S_1) P(S_2) \cdots P(S_n)$ because the $S_i$ are independent. The distribution of *T* is $$F_T(x) = P(T_1, T_2, \dots, T_n)$$ $$= P(T_1)P(T_2 \mid T_1)P(T_3 \mid T_1, T_2)P(T_4 \mid T_1, T_2, T_3),...$$ by iterated Bayes' rule. Each factor in this product is $$P(t_i \le x \mid t_1 \le x, t_2 \le x, \cdots, t_{i-1} \le x)$$ All the conditioning is of the form $\Sigma y \leq x$ and so we can apply the generalized version of Lemma 2 to show that each factor is $\geq P(T_i)$ . The result $F_T \geq F_S$ follows from the fact that the $s_i$ and $t_i$ are identically distributed, each being the sum of k iid variables. $\square$ Now that we have established this theorem, any bound that we show for iid variates is an upper bound for the variates that arise in the clock distribution system model. At this point we are going to assume that the arrival times are Gaussian, motivated by the Central Limit Theorem. Although no closed-form expression is known for the expected value and the variance of the range of N iid Gaussian distributed random variables, it is possible to obtain asymptotic expressions. We present these as Theorem 2, the proof of which is due to Cramér. **Theorem 2.** [1] Let $x_i$ , i = 1,...,N be random samples from an $n(\mu, \sigma^2)$ distribution, and let $R = x_{\text{max}} - x_{\text{min}}$ be the difference of the largest and smallest $x_i$ . Then the expected value of R is asymptotically: $$E[R] = \sigma \left[ \frac{4\log N - \log\log N - \log 4\pi + 2C}{(2\log N)^{\frac{1}{2}}} + O\left[\frac{1}{\log N}\right] \right]$$ (1) where $C \approx 0.5772...$ is Euler's constant. The variance of R is given by $$Var[R] = \frac{\sigma^2}{\log N} \frac{\pi^2}{6} + O\left[\frac{1}{\log^2 N}\right]$$ (2) Equation (1) is therefore an asymptotic upper bound on the expected skew in a clock distribution tree with N leaves. Note that $\sigma$ will in general be a function of N. ## 4. Examples We now analyze two examples of global signal distribution systems. This is meant to clarify how a model is developed for a real problem. The examples represent what we consider to be common, typical clock distribution systems, but they are not intended to represent the full scope of all possibilities. We use the notation $n(\mu, \sigma^2)$ to denote a normal distribution with mean $\mu$ and variance $\sigma^2$ . ## 4.1. Metric-Free Tree The first example is a *metric-free* tree. This type of topology could be used to implement a large-scale distribution system which would provide a clock to chips on a board or to boards in a system. It does not constrain the circuit to be planar, so it is possible to equalize the lengths of all wires in the tree. Therefore, every wire has the same probability distribution for delay, which can be lumped with the delay of the buffer that follows it. This results in a model of a tree of buffers without wires. Assume that the global clock signal is distributed via a binary tree of buffers and wires. The root of the tree is the source of the signal, the PEs are placed at the leaves, and the intervening levels consist of buffers and wires. See Figure 1 for a schematic representation of the tree. Figure 1: Clock Distribution Tree In the figure, internal nodes, represented by small circles, are buffers which retransmit the clock signal. The leaf nodes of the tree, represented by squares, are the PEs that perform the actual computation and that communicate among themselves. Lines connecting nodes of the clock tree represent wires that conduct the clock signal to all the PEs. Let the delay through a buffer of the clock distribution tree be a real random variable, $d_i$ . The arrival time of a clock signal to any PE is the sum of the delays along the path from the root of the tree to the PE. Remember that all the delay is caused by the buffers; the effect of a wire is absorbed by the effect of the buffer that follows it. The arrival time at leaf i, $A_i$ , is therefore the random variable $$A_i = \sum d_j$$ where the $d_j$ lie on the path from the root to i. In order to apply Theorem 2, we must estimate the underlying distribution of the $A_i$ . Assuming that there are N PEs, each $A_i$ is the sum of $\log N d_j$ 's, and that each $d_j$ has variance $\sigma_b^2$ . By our Gaussian assumption, the $A_i$ have the distribution $n(\mu_b \log N, \sigma_b^2 \log N)$ . Applying Theorem 2 with this distribution, we find that the expected skew is $$E[Skew] = \sigma_b \sqrt{\log N} \left[ \frac{4\log N - \log\log N - \log 4\pi + 2C}{(2\log N)^{\frac{1}{2}}} + O\left[\frac{1}{\log N}\right] \right]$$ $$= \sigma_b \frac{4}{\sqrt{2}} \log N + \text{lower order terms}$$ $$= \Theta(\log N)$$ (3) The variance of the skew is Var[skew] = $$\sigma_b^2 \frac{\pi^2}{6} + O\left[\frac{1}{\log^2 N}\right]$$ which goes to a constant as $N \to \infty$ . Computer simulations corroborate the asymptotic skew results. Figure 2 shows the asymptotic curve, Equation (3), along with data gathered from simulation of the range of arrival times in a binary clock tree and the range of N iid $n(0,\log N)$ random variables. Both simulations present the average behavior from 100 trials. The simulations confirm that the range of the iid variates is an upper bound for the range of the dependent variables of the clock distribution tree, as predicted by Theorem 1. To check the result when the component delays are not Gaussian, we simulated a clock tree where the buffer delays were uniform over [-.5,+.5]. These data were compared to the results from a similar computation when the buffer delay was normally distributed with mean 0 and variance 1/12 (The sum of k uniform [-.5,+.5] variates converges to n(0,k/12)). Figure 3 presents the results. Even for trees of small depth, the expected range in both cases is nearly identical. This is due to the rapid convergence of the sums of random variables to a normal distribution. Figure 2: Range / Skew of Random Variables (Metric Free) Figure 3: Skew in Tree with Uniform and Gaussian Delay The explicit inclusion of wire delays into the model does not significantly alter the results. Wires can be considered to contribute an additional random delay at each level of the tree. Assuming that wire delays are distributed similarly to the buffer delays, the effect is to increase the variance of the distribution of arrival times by a constant factor. This does not alter the asymptotic behavior of skew. # 4.2. Metric Tree The second example is a *metric* tree. This is the type of system that is discussed in [4] and is typical of systems described for VLSI. The central assumption of this topology is that the circuit must be embedded in the plane. If the embedding is to be area-efficient [7], then the wires that connect buffers cannot be the same length everywhere. The delay through a wire therefore depends on the its location in the tree, and cannot be lumped with a buffer delay. A common tree of this type is the H-tree [6]. We choose to model it here because it is well known, it is feasible to implement in VLSI, and it the focus of the analysis in [4] so we may compare our results with theirs. There are two distinct views of the effects of increasing system size (number of PEs) under the metric assumption. The first is to assume that a tree with an arbitrary number of leaves can be embedded in the fixed area of the integrated circuit. The alternative to this view sets a lower limit on the size of the smallest feature; in this case the size of a wire at the tree's leaves. Each preceding level is progressively larger and the area of the entire clock tree grows with increasing system size. This view ignores the effect of shrinking feature size but is compatible with increases in chip die size. We will adapt the second model, as did Fisher and Kung. Refer again to Figure 1, keeping in mind that in the metric case the wires are not all of the same length. Assume that every buffer delay, $d_j$ , is $n(\mu_b, \sigma_b^2)$ . For this analysis, we will also assume that a wire delay, $w_j$ , is Gaussian distributed with a mean value and a variance proportional to its length. The linear relationship for the variance can be justified by considering a long wire to be equivalent to two shorter wires placed end to end. The propagation delay of the long wire is equal to the sum of the propagation delays of the two shorter wires. The expected values add, as do the variances because the delays of the short wires are independent. The wire delay at the leaves of the tree is $n(\mu_w, \sigma_w^2)$ distributed. Because wire length doubles at each higher level of the tree, the distribution of $w_j$ can be written as a function of the level number, d, of the wire. We find that $w_j$ is $n(\mu_w \frac{N}{2^d}, \sigma_w^2 \frac{N}{2^d})$ where $1 \le d \le \log N$ . The total delay, $A_i$ , is the sum of logN buffer delays and the sum of wire delays from each level of the tree. The wire delays form the geometric series $(1+2+4+\cdots+\frac{N}{2})=N-1$ . The total delay, $A_i$ , therefore has the distribution $n(\mu_b \log N + \mu_w(N-1), \sigma_b^2 \log N + \sigma_w^2(N-1))$ . As $N \to \infty$ , the linear (wire) term dominates. The expected skew is therefore $$E[skew] = \sigma_w \sqrt{N} \left[ \frac{4 \log N - \log \log N - \log 4\pi + 2C}{\sqrt{2 \log N}} + O\left(\frac{1}{\log N}\right) \right]$$ $$= \sigma_w \frac{4}{\sqrt{2}} \sqrt{N \log N} + lower order terms$$ $$= \Theta(\sqrt{N \log N})$$ (4) and the variance is given by Var[skew] = $$\frac{\sigma_w^2 N}{\log N} \left[ \frac{\pi^2}{6} \right] + O \left[ \frac{1}{\log N} \right]$$ Figure 4 shows the results of computer simulations. The dotted curve is the asymptotic formula, Equation (4), the solid curve is the range of iid variates, and the dashed curve shows the range of tree-dependent variates. The simulated cases represent the average behavior from 100 trials. The discrepancy is greater in this case because the shared variates, representing deviation from the independence assumption, are near the root of the tree, where the wire lengths are longer. Thus, this bound is not very tight, and a more detailed analysis might improve it. ## 5. Discussion It is now possible to give an estimate on the probability that the sample value of the skew is outside a certain range. Assume that X is a random variable with mean $\mu$ and variance $\sigma^2$ . Then the one-sided Chebyshev inequality [8] yields an upper bound on the probability of exceeding the mean skew by an amount a: $$P(X > (\mu + a)) \le \frac{\sigma^2}{\sigma^2 + a^2}$$ Now let $a = \alpha \mu$ ; that is, $\alpha$ is the fractional deviation from the expected value of skew. Then, using our estimate, in both the metric and the metric-free case, we have an estimate of an upper bound as $N \rightarrow \infty$ : Figure 4: Range / Skew of Random Variables (Metric) $$P(X > (\mu + a)) \le \frac{\frac{\pi^2}{6}}{\frac{\pi^2}{6} + 8(\log N)^2 \alpha^2}$$ ## 6. Conclusions As long as VLSI offers a finite resource, large systems must be constructed from many chips. The clock distribution system can then be non-planar, since it is not restricted to a single VLSI integrated circuit. We therefore consider our metric-free results to be important in answering the question of the ultimate limit of synchronous computation. Our finding that skew grows as the logarithm of the system size encourages us to believe that very large synchronous systems are feasible. It appears that this estimate is new. It is informative to compare our results for the metric tree with those of Fisher and Kung. Their summation model yielded a lower bound of $\Omega(\sqrt{N})$ for the skew in a processor array with N PEs. The worst case performance of the H-tree under their summation model is $\Theta(N)$ , whereas we show that expected skew is only $\Theta(\sqrt{N\log N})$ . This places our result between their summation model bound and their graph-theoretic lower bound. # 7. Acknowledgements The authors wish to thank J.P. Singh for helpful discussions and literature survey [10]. ### References - H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, NJ, 1946. - 2. R.L.M. Dang and N. Shigyo, "A two-dimensional simulation of LSI interconnect capacitance," *IEEE Electron Device Lett.*, vol. EDL-2, August 1981. - 3. W. Feller, An Introduction to Probability Theory and Its Applications, Volume II, John Wiley & Sons, New York, 1971. - 4. Allan L. Fisher and H. T. Kung, "Synchronizing Large VLSI Processor Arrays," *IEEE Transactions on Computers*, vol. c-34, no. 8, pp. 734-740, August 1985. - 5. Mehdi Hatamian, "Understanding Clock Skew in Synchronous Systems," 1987 Princeton Workshop on Algorithm, Architecture and Technology Issues in Models of Concurrent Computations. - 6. C. A. Mead and M. Rem, "Cost and Performance of VLSI Computing Structures," *IEEE J. Sold-State Circuits*, vol. SC-14, pp. 455-462, April 1979. - M. S. Paterson, W. L. Ruzzo, and L. Snyder, "Bounds on Minimax Edge Length for Complete Binary Trees," *Proc. Thirteenth Annual ACM Symp. Theory Comput.*, pp. 293-299, May 1981. - 8. Sheldon Ross, *A First Course in Probability*, Macmillan Publishing Company, New York, 1984. - K. C. Saraswat and F. Mohammadi, "Effect of scaling of interconnections on the time delay of VLSI circuits," *IEEE J. Solid State Circuits*, vol. SC-17, no. 2, April 1982. - 10. J. P. Singh, "Synchronization of Large VLSI Systems," Unpublished Manuscript, Dept. Electrical Engineering, Princeton University.