Foreword
In the previous post I gave a quick rundown of my initial impressions of the Arty A7 board while getting memory access to work.
The Memory Interface Generator (MIG) provided by Vivado is very easy to use to create blocks for interfacing with the memory on the Arty board. However, no matter how much I tweaked the MIG generated code, I couldn't get to a design with minimal latency.
I started looking into alternative open-source memory controller designs, in the hope of getting around this latency issue.
One of the open-source designs I looked into in the previous post was a DDR3 design by Ultraembedded on GitHub. It is a very simple design, but it also has latency issues.
In this post I will look into another open-source memory controller, which is provided on GitHub by the Elphel camera project.
The Elphel camera project is quite a notable project, of which the DDR3 memory controller forms just one component. So, in order to understand the surrounding context of their memory controller, I will start at a very high level, explaining what the Elphel camera project is about, and then gradually zoom into the memory controller itself.
About Elphel Inc.
Elphel Inc. is a company that produces open-source digital cameras. It uses lenses and image sensors from other manufacturers. These cameras process the images from the image sensors and produce JPEGs using an FPGA and a CPU.
One of the notable customers for Elphel cameras was Google Street View. I am sure many people have seen a vehicle like this in their neighbourhood over the past decade or so, which was used to capture the footage for Google Street View:
The spherical ball on top contains a couple of cameras pointing in different directions. Many of these balls contained Elphel cameras in the past. The interesting thing is that the FPGA/CPU hosted within the Elphel setup can handle a couple of image sensors simultaneously, where each sensor can be 5MP or higher. These cameras can capture a number of frames a second, convert them to JPEGs and store them on a SATA hard drive.
Through the years Elphel produced different camera models, with their most recent models using a SoC that contains both a CPU and an FPGA: The Xilinx Zynq 7030 SoC. This is the big brother of the Zynq 7010, which is used in the Zybo board.
Let us have a quick look at the block diagram for one of these Zynq-based models, the 10393:
You will note that the FPGA and the ARM core each have their own RAM. As you have seen in my C64 series on the Zybo, it is possible for the FPGA to access the RAM used by the ARM CPU as well. However, in the case of the 10393 the memory access patterns are quite unusual, and for that reason they decided to give the FPGA its own RAM.
On Elphel's web site there is a very interesting blog post on the thinking that went into designing the FPGA memory controller:
https://www.elphel.com/www3/node/355
The nice thing here is that the FPGA embedded in the Zynq 7030 and the FPGA on the Arty A7 are from the same FPGA family, so it should be easy to use the 10393 memory controller on our Arty A7 board. However, as you will see later, there are some minor tweaks we need to make to get it working on the Arty.
Focusing on the 10393 memory controller
Let us now focus on the 10393 memory controller. First of all, here is the link to the source code for the 10393:
https://github.com/Elphel/x393
The following diagram, which they provide, gives a quick overview of how the components fit together:
Shared External Memory is basically the memory controller, providing various channels for peripherals that need access to memory. The image sensors use channels 8, 9, 10 and 11 for writing image data to memory.
The blocks creating the JPEGs for each sensor use channels 12, 13, 14 and 15 for reading from memory. Something interesting about these blocks is that they write the resulting JPEGs to the ports SAXIHP1 and SAXIHP2. These ports ensure that this information ends up in the RAM that the ARM CPU has access to, so that the JPEGs can be written to a SATA hard drive.
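To keep the channel assignments straight while reading the code, it can help to jot them down as a small lookup table. The sketch below is only a note-taking aid in Python: the names `CHANNEL_ROLES` and `channel_role` are made up for illustration and do not appear in the x393 sources, and only the channels mentioned above are filled in.

```python
# Channel usage as described above. Names here are hypothetical,
# not taken from the x393 code base.
CHANNEL_ROLES = {}

# Image sensors write their data to memory on channels 8..11.
for ch in (8, 9, 10, 11):
    CHANNEL_ROLES[ch] = ("image sensor", "write")

# JPEG blocks read the image data back on channels 12..15.
for ch in (12, 13, 14, 15):
    CHANNEL_ROLES[ch] = ("JPEG compressor", "read")


def channel_role(ch):
    """Return (peripheral, direction) for a memory controller channel."""
    return CHANNEL_ROLES.get(ch, ("not covered in this post", None))


if __name__ == "__main__":
    for ch in range(16):
        print(ch, channel_role(ch))
```

Which sensor maps to which individual channel is deliberately left out, since the description above does not say.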
Looking at the code, it is quite difficult to figure out at first glance how the memory controller works. However, I have found it helps if you initially only focus on these files:
- memctrl/mcntrl393.v
- memctrl/memctrl16.v
- memctrl/phy/mcontr_sequencer.v
- memctrl/phy/phy_cmd.v
- memctrl/phy/phy_top.v
Here mcntrl393.v is the top-level module of the memory controller, and the rest of the list is ordered by module level, so phy_top.v is the lowest-level module.
Even after narrowing it down to a handful of files, there is still a lot going on. So, I will try not to be pedantic and just focus on what is required.
More on the DDR3 protocol
Before we get to see how we are going to get the 10393 memory controller working on an Arty A7, let us familiarise ourselves a bit with the DDR3 protocol. This background information will help when we need to troubleshoot later on.
On a DDR3 RAM chip, you have the following signals:
- Address
- CAS - column address strobe
- RAS - row address strobe
- WE - write enable
- DQ - data
- DQS - data strobe
- Command clock
There are other signals on DDR3 chips as well, but to keep this discussion simple, I am only showing these.
For anyone who is familiar with the RAM of a retro gaming system, like the C64, many of the above signals will look familiar. For instance, on the C64 we know that in order to read from memory we first need to provide a row address with the RAS signal asserted, and then we need to provide a column address with the CAS line asserted. After some time, while the CAS line is still asserted, the data will be available on the data lines.
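The row/column read sequence described above can be sketched in a few lines of Python. This is only an illustration: it assumes a 16-bit address split into an 8-bit row and an 8-bit column, and it assumes the high byte is used as the row, which may not match how a real C64 wires its address multiplexers. The function names are made up for this example.

```python
def split_address(addr, row_bits=8, col_bits=8):
    """Split a flat address into (row, column) for a multiplexed DRAM bus.

    Assumption for illustration: the upper bits form the row address.
    """
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    col = addr & ((1 << col_bits) - 1)
    return row, col


def read_sequence(addr):
    """List the steps of a classic DRAM read, in order, without timing."""
    row, col = split_address(addr)
    return [
        ("assert RAS with row address", row),
        ("assert CAS with column address", col),
        ("sample data lines while CAS is asserted", None),
    ]


if __name__ == "__main__":
    for step, value in read_sequence(0xD020):
        print(step, "" if value is None else hex(value))
```

Note that the sketch only captures the order of the steps. Nothing in it says how long to wait between them.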
As you can see from my description above, there are no exact times defined for when each of these steps should be performed. Luckily, the manufacturers of these RAM chips provided quite some headroom for timing errors.
However, over the years manufacturers pushed the data rate of RAM chips higher and higher, causing the headroom for timing errors to shrink drastically. To get around this increasingly tight fit, it became necessary to introduce clocking for both the commands and the data. For this reason there is a command clock and a DQS (Data Strobe) signal.
Another limitation when increasing the data rate is impedance and trace length, especially when you have multiple memory modules. The memory modules the furthest away from the memory controller will receive the controller signals (e.g. Cas, Ras, We, Command clock) later than the closer ones, causing clock skew.
In DDR3 we can compensate for this clock skew by delaying the DQS signal for each byte lane, so that it lines up with the command clock. The process of determining the amount of compensation required is called write levelling and read levelling.
Let us have a quick look at how write levelling works. We will look at read levelling in a later post.
With write levelling, the DDR3 memory chip first needs to be placed in write levelling mode. This is done by setting a register within the DDR3 memory chip.
While in this mode, the memory controller needs to toggle the DQS signal. The DDR3 memory chip, however, uses the DQS signal in a totally different way than in normal operation. On each rising edge of the DQS signal, the memory chip samples the command clock and outputs the result on the lowest bit of the DQ bus. The goal is for a '1' to eventually appear on the DQ bus. If this is not the case, the DQS signal needs to be delayed further in small steps, until a '1' is output on the DQ bus.
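To make the delay sweep described above a bit more concrete, here is a minimal Python simulation of it. It is a toy model and not how the 10393 controller implements it: the clock is treated as an ideal square wave, and the clock period, tap size, flight times and function names are all made-up values for illustration. Times are in picoseconds.

```python
def sample_ck_at_dqs(dqs_delay_ps, ck_flight_ps, dqs_flight_ps, period_ps):
    """Return the command clock level (0 or 1) the DRAM sees at a DQS rising edge.

    Toy model: the clock is high during the first half of each period.
    """
    # Arrival time of the DQS rising edge at the DRAM.
    edge_time = dqs_delay_ps + dqs_flight_ps
    # Position within the clock period, as seen at the DRAM, at that moment.
    phase = (edge_time - ck_flight_ps) % period_ps
    return 1 if phase < period_ps // 2 else 0


def write_level(ck_flight_ps, dqs_flight_ps, period_ps=2500, tap_ps=78, max_taps=32):
    """Increase the DQS delay tap by tap until the DRAM reports a '1'.

    Returns the first tap that yields a '1', or None if none does.
    All default values are hypothetical.
    """
    for tap in range(max_taps):
        dq0 = sample_ck_at_dqs(tap * tap_ps, ck_flight_ps, dqs_flight_ps, period_ps)
        if dq0 == 1:
            return tap
    return None


if __name__ == "__main__":
    # The clock arrives 400 ps later than DQS in this made-up byte lane.
    print("chosen tap:", write_level(ck_flight_ps=600, dqs_flight_ps=200))
```

The sweep stops at the first '1', as in the description above. A real controller would typically also make sure it saw a '0' first, so that it locks onto the actual rising edge of the clock rather than somewhere in the middle of the high phase.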
In Summary
In this post we started to look into the memory controller provided by the Elphel camera project, specifically the one from their 10393 camera model.
We also looked very briefly into the DDR3 protocol.
In the next post we will start to try and get this memory controller to work on the Arty A7 board.
As a first step, we will try and get write levelling to work, adjusting the DQS signal until we get a '1' on the DQ bus.
Till next time!