Foreword
In the previous post we had a look at write levelling. Write levelling is required on DDR3 because control signals to different memory chips are daisy chained. This means that the control signals travel different distances to different memory chips, which at high frequencies can lead to significant clock skew.
In DDR3 we try to compensate for this clock skew by delaying the data strobe signal for each byte lane by a specific amount during reading and writing.
The process of determining the amount of delay required for the strobe signals is called write levelling and read levelling.
In this post we will be covering read levelling, still basing our design heavily on the Elphel Memory controller of the 10393.
Simplifying the plot
The more I read about read and write levelling, the more I am convinced that it is overkill to run these processes every time you power up an Arty A7.
As I mentioned previously, these levelling processes account for the differences in the distances the control signals need to travel to different memory chips.
However, the Arty A7 contains just one memory chip, and not several. Given that fact alone, one doesn't need to worry about clock skew among different memory chips.
One might also argue that, even though we only have one memory chip, it could be far away from the FPGA, causing annoying clock skew that we would still need to correct with the levelling processes.
Looking at the layout of the Arty A7, we see that this is again not the case:
I got this image from Digilent's web site. Item number 19 is the FPGA and item number 20 is the RAM chip. The distance between the two components is about 1 centimetre, which is almost nothing compared to the distance between CPUs and RAM on other motherboards.
Also, I am clocking the RAM at 333MHz, so strobing the data a quarter of a clock cycle after it was asserted should be more than enough to counter any clock skew due to the distance between the FPGA and the RAM chip.
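The quarter-cycle figure can be sanity checked with some rough arithmetic. The sketch below is only an estimate: the propagation delay of about 70 ps per centimetre is my assumption for a typical FR4 trace, not a measured value for the Arty A7, and the 1 cm length is the eyeballed distance from the image.

```python
# Rough timing budget for a quarter-cycle strobe delay.
CLOCK_HZ = 333e6          # DDR3 clock frequency used in this post
TRACE_LENGTH_CM = 1.0     # eyeballed FPGA-to-RAM distance
PS_PER_CM = 70.0          # ASSUMPTION: typical FR4 trace delay, not measured

period_ps = 1e12 / CLOCK_HZ            # one full clock cycle
bit_time_ps = period_ps / 2            # DDR: one data bit per half cycle
quarter_cycle_ps = period_ps / 4       # strobe lands mid-bit
trace_delay_ps = TRACE_LENGTH_CM * PS_PER_CM

print(f"clock period  : {period_ps:7.1f} ps")
print(f"bit time      : {bit_time_ps:7.1f} ps")
print(f"quarter cycle : {quarter_cycle_ps:7.1f} ps")
print(f"trace delay   : {trace_delay_ps:7.1f} ps (assumed)")
print(f"margin        : {quarter_cycle_ps - trace_delay_ps:7.1f} ps")
```

Under these assumptions the trace delay is roughly a tenth of the quarter cycle, so the strobe still lands comfortably inside the data bit.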
From all my measurements so far, I also found that I could reliably read data from the RAM chip when I use a quarter cycle delay before doing the strobe.
If this assumption remains valid throughout my project, it will simplify my design greatly, in the sense that I don't need to add logic for running levelling upon power-up.
Having said that, putting the RAM chip in write levelling or read levelling mode is still useful for doing research, like this post.
Using the ISERDESE2 and OSERDESE2 blocks
I mentioned in the previous section that the RAM chip on the Arty A7 is clocked at 333MHz. However, the FPGA on the Arty A7 is not able to function properly at such high frequencies, so how do we solve this?
Most of you will know that this kind of problem is not a new one in the computer world. Just think of the reason why most disk drives produced for the VIC-20 and Commodore 64 were so slow compared to disk drives for other computers of the era.
Starting with the VIC-20, Commodore replaced the parallel disk interface used in the PETs with a serial one, meaning that this new port theoretically had to be clocked at 8 times the clock speed to get the same performance as a disk drive of the PET. They intended to use the serial shift register on the 6522 to reduce the load on the CPU.
We all know the rest of the story. They couldn't use the shift register of the 6522, because it had a bug, so they had to resort to bit banging, which is the reason for the slow disk load speeds. The CPU is simply not powerful enough to match the throughput of a parallel port when bit banging a serial line.
Coming back to our Arty A7, which cannot operate at 333MHz: we can also solve this issue with a serial shift register, but we will need a shift register for every bit of information passing between the memory chip and the FPGA. This will become clear in a moment.
There are indeed blocks in the FPGA that you can use as shift registers for this purpose. They are called ISERDESE2 and OSERDESE2, which stand for Input Serializer/Deserializer and Output Serializer/Deserializer. Many of the pins of the FPGA have an OSERDESE2 and an ISERDESE2 in close proximity. The serial input or output of these blocks can happily operate at 333MHz, while the parallel data is handed over or received at a much lower clock frequency, which the rest of the FPGA can handle.
Let us take the address pins, A0-A11, as an example. Each of these pins will have an OSERDESE2 block associated with it, where the shift register is 4 bits wide. So, the block receives 4 bits at a time and serialises them one bit at a time. We can therefore feed the parallel data at a quarter of the output frequency, which is 83.25MHz. The rest of the FPGA can happily operate at that frequency.
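To make the 4:1 idea concrete, here is a small behavioural model in Python. It is only a sketch of the concept, not of the real primitive: the function name `serialize_4to1` is made up for illustration, and the choice to send the least significant bit first is an assumption of this model.

```python
def serialize_4to1(words):
    """Yield one bit per fast-clock tick from a stream of 4-bit words.

    Each word is loaded once per slow-clock cycle and shifted out over
    four fast-clock ticks. ASSUMPTION: bit 0 of each word goes out first.
    """
    for word in words:
        for i in range(4):
            yield (word >> i) & 1


FAST_CLOCK_MHZ = 333.0                  # serial side, towards the RAM pin
slow_clock_mhz = FAST_CLOCK_MHZ / 4     # parallel side, inside the FPGA

# Two slow-clock cycles' worth of data for a single address pin.
bits = list(serialize_4to1([0b0101, 0b0011]))

print(f"parallel clock: {slow_clock_mhz} MHz")   # 83.25 MHz
print(bits)                                      # [1, 0, 1, 0, 1, 1, 0, 0]
```

Every address pin would get its own copy of this shift register, all loaded on the same slow clock edge.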
The situation for the data pins, DQ0-DQ15, is a bit more complex. These pins carry data in both directions, from the FPGA to the RAM and back, so we need both an ISERDESE2 and an OSERDESE2 block for each pin.
Another complexity of the data pins is that they run at double data rate, i.e. they transfer data on both the rising edge and the falling edge of the clock. For these pins we can still supply data at 83MHz, but we need to supply 8 bits at a time.
Let us do some quick math. Usually with DDR3, we receive data in bursts of 8 consecutive transfers. On the Arty A7 the RAM data bus is 16 bits wide. This means that on the 83MHz side, the OSERDESE2/ISERDESE2 blocks of the data pins together deal with words of 16*8 = 128 bits.
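The 128-bit figure can be illustrated with a short Python sketch that packs one burst into a single wide word and slices out the 8 bits belonging to one DQ pin. The helper names and the bit ordering (first transfer in the lowest 16 bits) are assumptions made for illustration only; the real ordering depends on how the SERDES blocks are wired up.

```python
DQ_WIDTH = 16        # data bus width on the Arty A7
BURST_LENGTH = 8     # transfers per DDR3 burst


def pack_burst(beats):
    """Pack 8 transfers of 16 bits into one 128-bit integer.

    ASSUMPTION: the first transfer occupies the lowest 16 bits.
    """
    if len(beats) != BURST_LENGTH:
        raise ValueError("a burst must contain exactly 8 transfers")
    word = 0
    for i, beat in enumerate(beats):
        word |= (beat & 0xFFFF) << (i * DQ_WIDTH)
    return word


def unpack_burst(word):
    """Split a 128-bit integer back into its 8 transfers."""
    return [(word >> (i * DQ_WIDTH)) & 0xFFFF for i in range(BURST_LENGTH)]


def pin_bits(beats, pin):
    """Return the 8 bits one DQ pin's SERDES handles during the burst."""
    return [(beat >> pin) & 1 for beat in beats]


burst = [0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666, 0x7777, 0x8888]
wide = pack_burst(burst)

print(f"bits per slow-clock word: {DQ_WIDTH * BURST_LENGTH}")
print(f"packed: {wide:032X}")
print(f"DQ0 sees: {pin_bits(burst, 0)}")
```

Seen per pin, each of the 16 SERDES blocks handles 8 bits per slow clock cycle, which is where the 16*8 comes from.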
Playing with read levelling
Let us now play a bit with read levelling.
Read levelling is enabled by setting a bit in one of the registers in the DDR3 RAM. When this mode is enabled, any read command will yield the bit pattern 01010101 on every DQ pin. So, on the Arty A7, where the RAM chip has a 16-bit data bus, you will see the following sequence of 16-bit words in the burst when doing a read in read levelling mode:
- 0000
- FFFF
- 0000
- FFFF
- 0000
- FFFF
- 0000
- FFFF
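The list above follows directly from every DQ pin carrying the same 01010101 pattern. A minimal Python sketch of that reasoning, with `read_levelling_burst` being a made-up helper name:

```python
def read_levelling_burst(dq_width=16, pattern="01010101"):
    """Return the words seen on the data bus, one per transfer.

    Every DQ pin carries the same bit in a given transfer, so each word
    is either all zeros or all ones across the bus width.
    """
    all_ones = (1 << dq_width) - 1
    return [all_ones if bit == "1" else 0 for bit in pattern]


for word in read_levelling_burst():
    print(f"{word:04X}")
# Prints 0000, FFFF, 0000, FFFF, 0000, FFFF, 0000, FFFF (one per line)
```

Because the expected words are known in advance, a captured burst that differs from this list suggests the strobe is sampling at the wrong moment.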