A Sub-Picosecond Hybrid DLL for Large-Scale Phased Array Synchronization

Matan Gal-Katziri and Ali Hajimiri
Department of Electrical Engineering
California Institute of Technology
Pasadena, CA 91125, USA
Email: mgal@caltech.edu

Abstract—A large-scale timing synchronization scheme for scalable phased arrays is presented. This approach utilizes a DLL co-designed with a subsequent 2.5GHz PLL. The DLL employs a low noise, fine/coarse delay tuning to reduce the in-band rms jitter to 323fs, an order of magnitude improvement over previous works at similar frequencies. The DLL was fabricated in a 65nm bulk CMOS process and was characterized from 27MHz to 270MHz. It consumes up to 3.3mW from a 1V power supply and has a small footprint of 0.036mm$^2$.

Keywords—CMOS integrated circuits, phased-arrays, radio frequency, tracking loops, delay-lines, phase locked loops, phase noise.

I. INTRODUCTION

Phased arrays are extensively used in radar, sensing, and communication systems due to their electronic beam steering capabilities combined with the added directivity and enhanced SNR/SIR, which scale with the number of array elements [1]. Example applications are 5G networks, which are currently targeting hundreds to thousands of elements, and very large-scale arrays, sometimes referred to as million-element arrays [2]. The practical implementation of such systems necessitates a broad range of architectural and technological innovations, such as scalable structures and highly-integrated silicon-based RFICs [2-5]. In a scalable array transceiver architecture, a single low-frequency reference clock is distributed to identical blocks (tiles), where high-frequency signals are synthesized (often using an integrated PLL-based frequency synthesizer) and used for coherent RF signal generation and reception in concert with the other array elements, as shown in Fig. 1.

Fig. 1. Clock distribution to CMOS driven phased array

One major challenge with this architecture is maintaining the timing accuracy of the reference signal in the distribution process. A central star or H-tree distribution is impractical in the case of a large-scale array, as the number of traces and the electrical load of all the driven elements become prohibitively large. On the other hand, sequential buffering of the reference suffers from large accumulated timing deviations due to variations in the supply, temperature, and the driven load. These challenges can be mitigated by using a delay-locked loop (DLL) in the repeater buffer. While fundamentally sound, this approach presents new challenges since the low reference frequency, usually a few tens of MHz, necessitates a relatively large delay, which can lead to unacceptable timing jitter. We propose a hybrid DLL architecture that utilizes several noise reduction techniques as well as a novel semi-digital loop control scheme with a single phase detection path. Moreover, the co-design of the DLL with the subsequent PLL-based synthesizer is exploited to further reduce the overall timing jitter by proper alignment of the two phase noise transfer functions; one loop provides rejection over the frequencies where the other has a large noise contribution. This approach opens the design space, leading to superior overall performance.

II. HYBRID DLL

In a DLL, the output signal must practically be delayed by at least half a clock period compared to the reference in order to correct both negative and positive timing errors. A standard implementation does so with a single continuous delay line, which is usually the main noise contributor due to the large delay range it needs to cover. A hybrid DLL can solve this problem by using two different sets of delay elements, as shown in Fig. 2. A digitally controlled delay line (DCDL) composed of low-noise fixed-delay elements is used for coarse delay tuning, while a short, continuously variable delay line (VDL) is used to fine tune within the digital segments.

In order to achieve delay lock, we use an analog DLL architecture and continuously monitor its charge pump (CP) output control voltage (Vc) to adjust the required DCDL value.

This work was sponsored by Caltech’s Space Solar Power Project (SSPP).

Initially, the up/down counter of Fig. 3b is set to fix the DCDL state, and the DLL loop of Fig. 2 continuously controls the VDL. If an unattainable VDL tuning value is required, the control voltage Vc will rail, crossing a lower or upper threshold along the way. This activates the overflow detector of Fig. 3a to pause the continuous control loop, initiate a single increase/decrease of a DCDL cell, and restart VDL tracking. Unlike [6], we are not changing the continuous delay range by flipping a state machine to set discrete phase states but are instead adding or removing a fixed amount of low-noise delay as required. This significantly improves the noise performance. In addition, we are tracking the same edge in a monotonic, continuous, and overlapping manner, which, when combined with the fact that the DLL is a first-order control loop, guarantees its stability. The reset circuitry in Fig. 3c is crucial: it temporarily disables the phase detector and forces Vc to mid-supply when a DCDL shift occurs, and it is synchronized such that the phase detector starts at a consistent state once the VDL tracking restarts. The noise-optimized, pseudo-differential delay elements of Fig. 4 also allow tracking of the falling edge of the output clock, which effectively reduces the minimum delay required by $T/2$ and enables usage of the same delay line at lower reference frequencies.
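The lock sequence described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the fabricated circuit: the class name `HybridDLL`, the cell delay, VDL slope, thresholds, and loop gain are all hypothetical values chosen so that the VDL range between the thresholds exceeds one DCDL step (the overlap condition), and the analog loop is reduced to a discrete-time first-order integrator.

```python
from dataclasses import dataclass


@dataclass
class HybridDLL:
    """Behavioral sketch of the coarse/fine lock loop (all values assumed)."""
    target_delay: float         # total delay required for lock [s]
    dcdl_step: float = 0.5e-9   # delay of one DCDL cell [s] (hypothetical)
    vdl_offset: float = 0.2e-9  # VDL delay at Vc = 0 [s] (hypothetical)
    vdl_gain: float = 2.0e-9    # VDL tuning slope [s/V] (hypothetical)
    v_low: float = 0.3          # lower overflow threshold [V] (hypothetical)
    v_high: float = 0.7         # upper overflow threshold [V] (hypothetical)
    v_mid: float = 0.5          # Vc value forced by the reset (mid-supply) [V]
    loop_gain: float = 1.0e8    # Vc change per second of timing error [V/s]
    count: int = 0              # up/down counter state (active DCDL cells)
    vc: float = 0.5             # control voltage [V]

    def delay(self) -> float:
        """Total delay: coarse DCDL part plus fine VDL part."""
        return (self.count * self.dcdl_step
                + self.vdl_offset + self.vdl_gain * self.vc)

    def step(self) -> None:
        """One loop update: integrate the timing error, then check overflow."""
        self.vc += self.loop_gain * (self.target_delay - self.delay())
        if self.vc > self.v_high:
            # More delay needed than the VDL can give: add one cell, reset Vc.
            self.count += 1
            self.vc = self.v_mid
        elif self.vc < self.v_low:
            # Less delay needed: remove one cell, reset Vc.
            self.count -= 1
            self.vc = self.v_mid


def lock(dll: HybridDLL, n_updates: int = 300) -> HybridDLL:
    """Run the loop for a fixed number of updates and return the model."""
    for _ in range(n_updates):
        dll.step()
    return dll


# Example: lock to half a period of a 50MHz reference (10 ns).
dll = lock(HybridDLL(target_delay=10e-9))
print(dll.count, round(dll.vc, 3), dll.delay())
```

In this model the counter only ever moves by one cell per overflow event, and after each reset the fine loop resumes from mid-supply, which mirrors the single-step behavior described in the text.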

This architecture offers enhanced robustness because (1) it necessitates neither lock detect indication nor dual phase detection circuitry as in [7][8], (2) the small signal gain is identical for all DCDL values, and (3) the DCDL state changes in single up/down steps. The latter indicates that subsequent VDL tracking starts from a well-defined, nearby position, unlike a digital controller with automatic delay step adjustment. Our implementation favors clock distribution applications where lock time is not a major consideration. If necessary, fast lock is achievable with an a priori estimate of the DCDL delay step values and external programming of the up/down counter state.

III. PLL DESIGN

The hybrid DLL was co-designed with its intended load PLL (Fig. 5) for a 50MHz clock distribution of an existing RF phased array application. The PLL itself is fully integrated and operates at an output frequency of 2.5GHz with a loop bandwidth of 1MHz. It contains a mechanism similar to [9] to reduce its reference spurs, which, when present at the output of a large-scale transmitter array, might become a significant spectral disturbance. In order to minimize the DLL in-band noise, its loop filter bandwidth was optimized to be around 1MHz, which sufficiently rejects the delay line noise while maintaining a relatively flat noise shape around the PLL loop filter knee frequency.
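The interplay of the two loops can be sketched numerically. The snippet below is a simplified illustration, not the measured transfer functions: it assumes the DLL shapes its delay-line noise with a first-order high-pass response, that the PLL passes reference-path noise through a first-order low-pass response (normalized, with the frequency multiplication factor omitted), and that both corner frequencies sit at 1MHz. The function names are hypothetical.

```python
import math

F_DLL = 1.0e6  # DLL loop bandwidth [Hz], "around 1MHz" in the text
F_PLL = 1.0e6  # PLL loop bandwidth [Hz], from the text


def dll_delay_line_gain(f: float, f_dll: float = F_DLL) -> float:
    """Assumed first-order high-pass: the DLL tracks out slow delay-line noise."""
    x = f / f_dll
    return x / math.sqrt(1.0 + x * x)


def pll_reference_gain(f: float, f_pll: float = F_PLL) -> float:
    """Assumed first-order low-pass: the PLL filters fast reference-path noise."""
    x = f / f_pll
    return 1.0 / math.sqrt(1.0 + x * x)


def cascade_gain(f: float) -> float:
    """Delay-line noise seen at the PLL output under the two assumptions above."""
    return dll_delay_line_gain(f) * pll_reference_gain(f)


for f in (1e4, 1e5, 1e6, 1e7, 1e8):
    print(f"{f:10.0f} Hz  |H| = {cascade_gain(f):.4f}")
```

Under these assumptions the cascade is a band-pass response that peaks where the two corners meet, so the delay-line noise is attenuated both well below and well above the loop bandwidths.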

IV. MEASUREMENT RESULTS

Both the DLL and the PLL were fabricated in a 65nm bulk CMOS process (Fig. 6). They occupy 0.036mm$^2$ and 0.4mm$^2$ of active area, respectively, and their joint operation was characterized at an output frequency of 2.5GHz with the input reference ranging from 27MHz to 270MHz.

Fig. 6. Die micrographs of (a) the DLL and (b) the driven PLL.

Fig. 7 shows the delay locking mechanism while the DLL drives either 50Ω or 10pF loads. The control voltage $V_c$ in Fig. 7a overflows and resets until it reaches the necessary DCDL value, while fine-tuning persists indefinitely. The delay between the output and reference signals (Fig. 7b) was calculated from the waveforms’ zero-crossing points, emphasizing how appropriate sizing of the overlapping DCDL step size and VDL range allows for proper operation of the circuit.

![Fig. 7](image)

Fig. 7. Hybrid lock process at different time scales. (a) Loop filter control voltage, (b) time delay between reference and output clocks, and (c) time domain waveforms (adjusted)

In our phased-array application, the expected temperature fluctuation is less than 10°C in steady state, and the measured closed-loop control voltage tracks the temperature at a rate of 2.4mV/°C. The nominal control voltages for locking are 340mV and 660mV when counting up and down, respectively, and the overflow detector has a nominal hysteresis of 30mV. Therefore, temperature variations are not expected to toggle the digital counter and add unaccounted noise. In our clock distribution scheme, static buffer phase offset is programmatically removed when the array is calibrated and is therefore not a major concern.
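As a quick check of this margin, one reading of the numbers above (assuming the full 10°C swing occurs after lock and that the control voltage drifts linearly with temperature) gives a worst-case drift of

\[ \Delta V_c \approx 2.4\,\mathrm{mV/^{\circ}C} \times 10\,^{\circ}\mathrm{C} = 24\,\mathrm{mV} < 30\,\mathrm{mV} \]

so the temperature-induced drift stays below the nominal hysteresis of the overflow detector.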

![Fig. 8](image)

Fig. 8. (a) Phase noise test setup. Blocks are taken off when not measured. (b) 50MHz DLL phase noise and rms jitter, and (c) 2.5GHz post-PLL phase noise and rms jitter. The red curves are the rms measurement uncertainty.

Figs. 8b and 8c show how the PLL loop filter rejects most of the DLL noise, reducing its contribution to as little as 323fs rms jitter in the relevant frequency band.

![Fig. 9](image)

Fig. 9. Performance vs. frequency. (a) Participating DCDL cell count, (b) power consumption of participating DLL blocks, and (c) rms jitter within the 1kHz - 10MHz band.

These measurements were repeated at different frequencies between 27MHz and 270MHz, and a summary is shown in Fig. 9. The lower and upper ends of the frequency range are limited by the maximum DCDL delay and overflow actuation timing accuracy, respectively. Fig. 9b demonstrates how this DLL is advantageous in that an increase in the frequency of operation decreases the number of DCDL elements that participate in the delay chain, and thus the power consumption remains roughly constant. Fig. 9c emphasizes how the system is optimized for 50MHz operation. At lower frequencies, the high DCDL count adds more noise to the output, while at higher frequencies the subsequent PLL loop filter has little effect on noise rejection.
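A first-order argument for the roughly constant power can be written down under a few assumptions that are not taken from the measurements: the dynamic power of the active DCDL cells dominates, each cell has a fixed delay \(t_{cell}\) and switched capacitance \(C_{cell}\), and the chain must cover a fixed fraction \(\alpha\) of the reference period \(T = 1/f\). All four symbols are introduced here for illustration only.

\[ N_{cell} \approx \frac{\alpha T}{t_{cell}} = \frac{\alpha}{f\, t_{cell}}, \qquad P \approx N_{cell}\, C_{cell}\, V_{DD}^2\, f \approx \frac{\alpha\, C_{cell}\, V_{DD}^2}{t_{cell}} \]

Under these assumptions the frequency dependence cancels: doubling the reference frequency doubles the switching activity per cell but halves the number of participating cells.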

Because the end goal is the phased array reference distribution scheme of Fig. 1, noise performance was characterized for several cascaded DLLs. If the noise of each stage is uncorrelated with the others, the total noise measured at the output of a cascade of N DLLs is expected to be:

\[ n_{total}^2 = n_{ref}^2 + n_{meas}^2 + N n_{DLL}^2 \tag{1} \]

where \(n_{meas}\) is the measuring instrument noise, \(n_{ref}\) is the reference noise, and the single-device noise \(n_{DLL}\) can be estimated from the slope of a linear fit of \(n_{total}^2\) versus \(N\). Fig. 10 shows the linear behavior of the DLL cascade jitter variance at different frequencies, and the resulting rms jitter is summarized in Table I, showing good agreement with the single device measurements of Figs. 8b and 9c.
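The slope-based extraction implied by (1) can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, and the input data are synthetic values generated from (1) with an assumed 0.5ps per-stage jitter and an assumed 0.3ps combined reference and instrument floor, not measurements from Fig. 10.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def decompose_cascade_jitter(stage_counts, total_rms_ps):
    """Fit jitter variance vs. stage count, following equation (1).

    Returns (n_dll, n_floor) in ps, where n_floor lumps together the
    reference and instrument contributions (the intercept of the fit).
    """
    variances = [j ** 2 for j in total_rms_ps]
    slope, intercept = fit_line(stage_counts, variances)
    # Clamp at zero so that noisy data cannot produce a negative variance.
    n_dll = math.sqrt(max(slope, 0.0))
    n_floor = math.sqrt(max(intercept, 0.0))
    return n_dll, n_floor


# Synthetic example data (assumed values, NOT measurements).
stage_counts = [1, 2, 3, 4]
total_rms_ps = [math.sqrt(0.3 ** 2 + n * 0.5 ** 2) for n in stage_counts]

n_dll, n_floor = decompose_cascade_jitter(stage_counts, total_rms_ps)
print(f"per-DLL jitter ~ {n_dll:.3f} ps, floor ~ {n_floor:.3f} ps")
```

Fitting the variance rather than the rms value is what makes the relationship linear in \(N\); the intercept cannot separate \(n_{ref}\) from \(n_{meas}\) without an independent measurement of one of them.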

![Diagram of cascaded DLL setup](image)

Fig. 10. Cascaded DLL jitter (a) test setup, (b) 1kHz - 10MHz measurement, and (c) 1kHz-ref/2 measurement. Red and blue curves indicate locking to inverted/non-inverted output, respectively.

TABLE II

PERFORMANCE SUMMARY AND COMPARISON TO PRIOR WORK

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Comparison frequency</td>
<td>50MHz</td>
<td>27-270MHz</td>
<td>250MHz</td>
<td>200MHz</td>
<td>50MHz</td>
</tr>
<tr>
<td>Comparison frequency</td>
<td>27-270MHz</td>
<td>27-270MHz</td>
<td>270MHz</td>
<td>200MHz</td>
<td>180MHz</td>
</tr>
<tr>
<td>RMS jitter [ps]</td>
<td>0.685</td>
<td>0.25-2GHz</td>
<td>5.25</td>
<td>4.44</td>
<td>7 (approx.)</td>
</tr>
<tr>
<td>RMS jitter [ps]</td>
<td>0.55</td>
<td>0.25-2GHz</td>
<td>250MHz</td>
<td>200MHz</td>
<td>2.3</td>
</tr>
<tr>
<td>In band RMS jitter</td>
<td>0.33ps</td>
<td>NA</td>
<td>NA</td>
<td>NA</td>
<td>NA</td>
</tr>
<tr>
<td>Power consumption [mW]</td>
<td>2.25</td>
<td>0.26ps</td>
<td>NA</td>
<td>NA</td>
<td>26</td>
</tr>
<tr>
<td>Supply voltage [V]</td>
<td>1</td>
<td>1.8</td>
<td>1.2</td>
<td>15 (320MHz)</td>
<td>0.4-3.6</td>
</tr>
<tr>
<td>Technology process [CMOS]</td>
<td>65nm</td>
<td>130nm</td>
<td>90nm</td>
<td>250nm</td>
<td>26</td>
</tr>
<tr>
<td>Die area [mm²]</td>
<td>0.036</td>
<td>0.046</td>
<td>0.07</td>
<td>0.005</td>
<td>0.08</td>
</tr>
</tbody>
</table>

V. CONCLUSIONS

The task of distributing a low noise reference to very large-scale phased arrays is challenging because it does not enjoy the shorter period times of GHz range clocks. Table II shows a performance comparison of the hybrid DLL/PLL scheme with prior art at similar frequency ranges, and demonstrates how combining new circuit architectures with application-aware design can result in an order-of-magnitude improvement over the state-of-the-art.

REFERENCES