Foreword

In the previous post we had a look at write levelling. Write levelling is required on DDR3 because control signals to different memory chips are daisy chained. This means that the control signals travel different distances to different memory chips, which at high frequencies can lead to significant clock skew.

In DDR3 we try to compensate for this clock skew by delaying the data strobe signal for each byte lane by a specific amount during reading and writing.

The process of determining the amount of delay required for the strobe signals is called write levelling and read levelling.

In this post we will be covering read levelling, still basing our design heavily on the Elphel Memory controller of the 10393.

Simplifying the plot

The more I read about read and write levelling, the more I am convinced that it is a bit of overkill to run these processes every time you power up an Arty A7.

As I mentioned previously, these levelling processes account for the difference in distance the control signals need to travel to the different memory chips.

However, the Arty A7 contains just one memory chip, not several. With that fact alone, one doesn't need to worry about clock skew among different memory chips.

One might also argue that, even though we only have one memory chip, it could be far away from the FPGA, which would also cause annoying clock skew that we need to correct with the levelling processes.

Looking at the layout of the Arty A7, we see that this is again not the case:

I got this image from Digilent's web site. Item number 19 is the FPGA and item number 20 is the RAM chip. The distance between the two components measures about 1 centimetre. This distance is almost nothing compared to the distance between CPUs and RAM on other motherboards.

Also, I am clocking the RAM at 333MHz, so if you strobe the data a quarter of a clock cycle after the data was asserted, this should be more than enough to counter any clock skew due to the distance between the FPGA and the RAM chip.
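To put some rough numbers to this argument, here is a small Python sketch. The 333MHz clock and the 1 centimetre distance come from this post. The propagation speed is purely an assumption on my side (a common rule of thumb of about half the speed of light for a PCB trace), and the function names are made up for illustration.

```python
# Rough timing budget for the quarter-cycle strobe delay.
CLOCK_HZ = 333e6        # RAM clock used in this post
TRACE_LENGTH_M = 0.01   # about 1 cm between FPGA and RAM chip

# ASSUMPTION: signals on a typical PCB trace travel at roughly
# half the speed of light. This is a rule of thumb, not a
# measured value for the Arty A7.
PROPAGATION_M_PER_S = 1.5e8


def clock_period_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def quarter_cycle_ns(clock_hz):
    """A quarter of a clock cycle in nanoseconds."""
    return clock_period_ns(clock_hz) / 4


def trace_delay_ns(length_m, speed_m_per_s=PROPAGATION_M_PER_S):
    """Time a signal needs to travel the given trace length."""
    return length_m / speed_m_per_s * 1e9


if __name__ == "__main__":
    print("clock period  (ns):", round(clock_period_ns(CLOCK_HZ), 3))
    print("quarter cycle (ns):", round(quarter_cycle_ns(CLOCK_HZ), 3))
    print("trace delay   (ns):", round(trace_delay_ns(TRACE_LENGTH_M), 3))
```

Under these assumptions the flight time over the trace is more than ten times smaller than the quarter cycle delay, which supports the idea that a fixed delay is good enough on this board.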

From all my measurements so far, I also found that I could reliably read data from the RAM chip when I used a quarter cycle delay before doing the strobe.

If this assumption remains valid throughout my project, it will simplify my design greatly, because I won't need to add logic for running levelling at power-up.

Having said that, putting the RAM chip in write levelling or read levelling mode is still useful for research, as in this post.

Using the ISERDESE2 and OSERDESE2 blocks

I mentioned in the previous section that the RAM chip on the Arty A7 is clocked at 333MHz. However, the FPGA on the Arty A7 is not able to function properly at such high frequencies, so how do we solve this?

Most of you will know that this kind of problem is not a new one in the computer world. Just think of the reason why most disk drives produced for the VIC-20 and Commodore 64 were so slow compared to disk drives for other computers of the era.

Starting with the VIC-20, Commodore replaced the parallel disk interface used in the PETs with a serial one, meaning that this new port theoretically had to be clocked at 8 times the clock speed to get the same performance as a PET disk drive. They intended to use the serial shift register on the 6522 to reduce the load on the CPU.

We all know the rest of the story. They couldn't use the shift register of the 6522 because it had a bug, so they had to resort to bit banging, which is the reason for the slow disk load speeds. The CPU is simply not powerful enough to match the throughput of a parallel port when bit banging a serial one.

Coming back to our Arty A7, which cannot operate at 333MHz: we can also solve this issue with a serial shift register, but we will need a shift register for every signal between the memory chip and the FPGA. This will become clear in a moment.

There are indeed blocks in the FPGA that you can use as shift registers for this purpose. They are called ISERDESE2 and OSERDESE2, which stand for Input Serializer/Deserializer and Output Serializer/Deserializer. Many of the FPGA pins have an OSERDESE2 and an ISERDESE2 in close proximity. The serial side of these blocks can happily operate at 333MHz, while the parallel data is handed over and received at a much lower clock frequency, which the rest of the FPGA can handle.

Let us take the address pins, A0-A11, as an example. Each of these pins will have an OSERDESE2 block associated with it, with a shift register that is 4 bits wide. So, the block receives 4 bits at a time and serialises them one bit at a time. We can therefore feed the parallel data at a fourth of the output frequency, which is 83.25MHz. The rest of the FPGA can happily operate at that frequency.
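The idea of a 4-bit wide serialiser on one pin can be modelled in a few lines of Python. This is only a behavioural sketch and not how the OSERDESE2 is instantiated. The function names are made up, and I assume here that the first element of each 4-bit word is the first bit to leave the pin.

```python
FAST_CLOCK_MHZ = 333.0   # serial side, clock towards the RAM
SERIALISE_RATIO = 4      # bits taken in per slow clock cycle


def slow_clock_mhz(fast_mhz=FAST_CLOCK_MHZ, ratio=SERIALISE_RATIO):
    """Clock frequency at which the parallel words must be supplied."""
    return fast_mhz / ratio


def serialise_4to1(words):
    """Yield one bit per fast clock tick from a list of 4-bit words.

    Each word is a sequence of 4 bits. ASSUMPTION: the first
    element of a word is the first bit shifted out.
    """
    for word in words:
        if len(word) != SERIALISE_RATIO:
            raise ValueError("each word must contain exactly 4 bits")
        for bit in word:
            yield bit


if __name__ == "__main__":
    # Two slow clock cycles worth of data for a single pin.
    words = [(1, 0, 0, 0), (0, 1, 1, 0)]
    print(slow_clock_mhz())             # 83.25
    print(list(serialise_4to1(words)))  # [1, 0, 0, 0, 0, 1, 1, 0]
```

So for every tick of the slow clock, four ticks of the fast clock are needed to push the word out of the pin.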

The situation for the data pins, DQ0-DQ15, is a bit more complex. These pins carry data in both directions, to and from the FPGA, so we need both an ISERDESE2 and an OSERDESE2 block for each pin.

Another complexity of the data pins is that they run at double data rate, i.e. data is transferred on both the rising edge and the falling edge of the clock. For these pins we can still supply data at 83MHz, but we need to supply 8 bits at a time.

Let us do some quick math. Usually with DDR3, data arrives in bursts of 8 consecutive transfers. On the Arty A7 the RAM data bus is 16 bits wide. This means that on the 83MHz side, the OSERDESE2/ISERDESE2 blocks of the data pins together deal with words of 16*8 = 128 bits.
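To make this arithmetic a bit more concrete, the following Python sketch shows how one burst of 8 transfers of 16 bits each maps onto 16 per-pin words of 8 bits, which is what the serialiser blocks of the individual DQ pins deal with. The function names are made up, and I assume that DQ0 is the least significant bit of every 16-bit transfer.

```python
BURST_LENGTH = 8   # transfers per burst
BUS_WIDTH = 16     # DQ0-DQ15


def bits_per_burst():
    """Total number of bits handled per burst on the 83MHz side."""
    return BURST_LENGTH * BUS_WIDTH


def beats_to_pin_words(beats):
    """Transpose 8 transfers of 16 bits into 16 per-pin words of 8 bits.

    ASSUMPTION: bit 0 of each transfer belongs to DQ0, bit 15 to DQ15.
    The result is indexed as result[pin][transfer].
    """
    if len(beats) != BURST_LENGTH:
        raise ValueError("a burst must contain exactly 8 transfers")
    return [[(beat >> pin) & 1 for beat in beats]
            for pin in range(BUS_WIDTH)]


if __name__ == "__main__":
    # Only DQ0 is high, and only during the first transfer.
    burst = [0x0001, 0, 0, 0, 0, 0, 0, 0]
    pin_words = beats_to_pin_words(burst)
    print(bits_per_burst())   # 128
    print(pin_words[0])       # [1, 0, 0, 0, 0, 0, 0, 0]
    print(pin_words[1])       # [0, 0, 0, 0, 0, 0, 0, 0]
```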

Playing with read levelling

Let us now play a bit with read levelling.

Read levelling is enabled by setting a bit in one of the registers in the DDR3 RAM. When this mode is enabled, any read command will yield the bit pattern 01010101 on every DQ pin. So, on the Arty A7, where the RAM chip has a 16-bit data bus, you will see the following data burst when doing a read in read levelling mode:

0000
FFFF
0000
FFFF
0000
FFFF
0000
FFFF
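The link between the per-pin pattern 01010101 and the 16-bit values above can be shown with a small Python sketch. Every pin carries the same bit during a given transfer, so each transfer is either all zeros or all ones. The function name is made up for illustration.

```python
PIN_PATTERN = [0, 1, 0, 1, 0, 1, 0, 1]   # pattern seen on every DQ pin
BUS_WIDTH = 16


def levelling_beats(pin_pattern=PIN_PATTERN, bus_width=BUS_WIDTH):
    """Build the 16-bit transfers seen when every pin carries pin_pattern."""
    beats = []
    for bit in pin_pattern:
        beat = 0
        for pin in range(bus_width):
            beat |= bit << pin   # same bit on every pin
        beats.append(beat)
    return beats


if __name__ == "__main__":
    for beat in levelling_beats():
        print(f"{beat:04X}")
```

Comparing a captured burst against the output of this function is an easy way to spot which transfers or which pins went wrong.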

Indeed, putting the RAM chip in read levelling mode is a quick way to check whether we got the clock constraints and the connectivity to the RAM correct.

One thing you might notice when capturing the test pattern is that the alignment of the captured data is out by two or more bits. Let me explain why this can happen.

For any command we want to issue to the RAM chip, we need to supply 4 bits of data for every signal in the 83MHz clock domain, which will be clocked out by the OSERDESE2 in the 333MHz domain.

For a command, typically only 1 of these 4 bits will be the actual command. The rest of these bits will just be padded out either with ones or zeros.

For simplicity one might decide to pad out the first three bits and have the fourth bit as the command bit.

Now, as we all know, with the DDR family of chips, you don't get the requested data right away, but need to wait for a couple of clock cycles before you get the data.

For the RAM on the Arty A7 at 333MHz, this waiting period is 5 clock cycles. Since 5 is not a multiple of 4, the burst of bits might not start on a 4-bit boundary.

To counter this, you would need to play with the position of the command bit within the four bits.
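The effect of the command bit position can be illustrated with a very simplified Python model. It only counts fast clock cycles modulo 4. The parameter extra_delay_cycles is a hypothetical stand-in for any pipeline delay inside the serialiser blocks, which I do not model here, so the slot that works on real hardware may well differ from what this sketch suggests.

```python
SLOTS_PER_WORD = 4   # fast clock cycles per 83MHz word


def burst_offset_cycles(command_slot, latency_cycles, extra_delay_cycles=0):
    """Offset, in fast clock cycles, of the first data bit in a word.

    command_slot is the position (0-3) of the command bit within
    the 4-bit word. An offset of 0 means the burst starts on a
    word boundary. Because the data pins run at double data rate,
    an offset of one clock cycle corresponds to two data bits.
    """
    total = command_slot + latency_cycles + extra_delay_cycles
    return total % SLOTS_PER_WORD


def aligned_slots(latency_cycles, extra_delay_cycles=0):
    """All command slots for which the burst starts on a word boundary."""
    return [slot for slot in range(SLOTS_PER_WORD)
            if burst_offset_cycles(slot, latency_cycles,
                                   extra_delay_cycles) == 0]


if __name__ == "__main__":
    # Try every command position for a latency of 5 clock cycles.
    for slot in range(SLOTS_PER_WORD):
        print(slot, burst_offset_cycles(slot, 5))
```

In practice it is easiest to just try the four positions and look at the captured test pattern until it lines up.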

About the DQS signal during reads

During reads from DDR memory, we rely on the memory to provide us with a DQS signal, telling when we should strobe the data in the FPGA.

However, in my experiments on the Arty A7, I found it basically impossible to constrain this externally supplied signal so that it can reliably clock an ISERDESE2.

Nevertheless, not all hope is lost. At the beginning of this post I mentioned that we can simplify a RAM controller on the Arty quite dramatically, because there is no real need for doing levelling at startup.

The read DQS signal is no exception to this. I found that we can use the same signal that drives the write DQS to strobe the ISERDESE2 blocks for the data as well.

So, all in all, I managed to capture the data supplied by the RAM during read levelling.

In Summary

In this post I talked a bit about read levelling. I also managed to enable read levelling mode on the Arty A7, and to read the data pattern provided by this mode successfully.

In the next post I will try to write some random data value to RAM and read the same value back.

Till next time!

C64 on an FPGA

Thursday, 24 March 2022

Read Levelling on the Arty A7