Sunday, 26 November 2017

Running on the FPGA


In the previous post we managed to run the 6502 Test Suite developed by Klaus Dormann on Arlet Ottens's core.

In this post we will modify the top module so that it can run on the FPGA of the Zybo board.

The ultimate goal is to run Klaus Dormann's Test Suite on our FPGA implementation. To accomplish this we need to do two things: create the FPGA implementation, and write an ARM Cortex program that controls our core and verifies the results of the test suite.

In this post we will only tackle the first goal, creating the FPGA implementation.

We will tackle the writing of the ARM Cortex program in the next post.

More on ZYNQ development

As mentioned in my introduction post, the Zybo board is equipped with a Zynq FPGA chip.

Apart from an FPGA fabric, the Zynq also contains an ARM Cortex core with supporting peripherals, like USB, Ethernet and so on.

Most of your FPGA designs for the ZYNQ will co-operate with the ARM Cortex.

Interfacing your design with the ARM Cortex can get quite complex and we can easily get it wrong.

To help with the above-mentioned complexity, Vivado provides a block design tool: a visual tool in which you can drag and drop components and which can auto-connect the components together.

The only caveat with this process is that if you want to add your own cores to the block design, you first need to wrap each of them into an IP so that Vivado can use it as a drag-and-drop component.

We will discuss how to wrap our core in a moment.

The intended design

Before going into details on how to wrap our core into an IP, I will first give some context on what we want to achieve.

The following diagram summarises more or less what we want to achieve:

The 6502 System block is basically the IP we are going to wrap. This block has 4 inputs and 1 output.

The clk_in input is fed by a clock generator which will clock at 8MHz allowing our core to execute at full speed.

The other 3 inputs and one output will be connected to the ARM Cortex via a GPIO (General purpose Input/Output) block.

It should be noted that we will also be writing a program that will run on the ARM Cortex. This program will control the pins linked to the GPIO block as shown in the diagram.

The idea is to run the Klaus Test Suite at full speed on the 6502 System block for about two minutes. This time interval will be more than enough to finish the Test Suite.

After the two minutes we want to drastically slow down the clock speed and monitor the addresses being placed on the address bus (via the addr_out pins) to get an idea of how far we are with the execution of the Test Suite.

The slowdown of the clock speed is performed by driving the debug_mode pin high, which we will be doing via our ARM Cortex program.

With the debug_mode pin high, our 6502 System will not be using the clk_in pin anymore for clocking, but rather the debug_clk pin.

In our ARM Cortex program we will then be toggling the debug_clk pin, reading addr_out after each toggle and outputting the value to the UART via a printf.

We can connect to the UART via a terminal application and get an idea of where we are in the execution of the Test Suite.

This is in a nutshell what we want to achieve.
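
To make the sequence above a bit more concrete, here is a small simulation-only sketch. It is not the real design: core_stub is a hypothetical stand-in that simply counts clock edges, and all the delays are arbitrary. It only shows the order of events: free-run on clk_in, raise debug_mode, then pulse debug_clk and read addr_out after each pulse.

```verilog
`timescale 1ns / 1ps

// Hypothetical stand-in for the 6502 System block. It only counts
// clock edges so that this sketch can be simulated on its own.
module core_stub(
  input wire clk_in,
  input wire reset,
  input wire debug_clk,
  input wire debug_mode,
  output wire [15:0] addr_out
);
  // Same clock selection as described in the text
  wire clk = debug_mode ? debug_clk : clk_in;
  reg [15:0] pc = 16'h0400;

  always @(posedge clk)
    if (reset) pc <= 16'h0400;
    else       pc <= pc + 16'd1;

  assign addr_out = pc;
endmodule

module debug_step_tb;
  reg clk_in = 0;
  reg reset = 1;
  reg debug_clk = 0;
  reg debug_mode = 0;
  wire [15:0] addr_out;
  integer i;

  core_stub dut(
    .clk_in(clk_in),
    .reset(reset),
    .debug_clk(debug_clk),
    .debug_mode(debug_mode),
    .addr_out(addr_out)
  );

  // Free-running clock; the period here is arbitrary
  always #50 clk_in = ~clk_in;

  initial begin
    #525 reset = 0;      // release reset
    #10000;              // full-speed phase (two minutes on the real board)
    debug_mode = 1;      // from here on the core is clocked by debug_clk
    for (i = 0; i < 8; i = i + 1) begin
      #100 debug_clk = 1;
      #100 debug_clk = 0;
      $display("addr_out = %h", addr_out);  // the ARM program would printf this
    end
    $finish;
  end
endmodule
```

On the real board the for loop is the part that the ARM Cortex program will perform through the GPIO block.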

Wrapping our core into an IP

Let us start off by opening up the Vivado project we have created in the previous post.

Currently the Sources panel in the IDE for this project should look as follows:

As you can see, there are currently only simulation sources defined and no design sources.

However, in order to wrap our core into an IP, we do need design sources. The quickest way to do this would be to select all three simulation sources, right click them and select Move to Design Sources. This would, however, mean that our top module file would be shared between simulation and design. This would require us to add some directives within our top.v file so that it can be used for both simulation and design purposes.
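
For the curious, a shared file could look roughly like the sketch below. It assumes a macro called SIMULATION that you define yourself in the simulation settings; the macro name, the module name and the port names are all made up for illustration.

```verilog
// Sketch of one top module shared between simulation and synthesis.
// SIMULATION is a hypothetical macro that must be defined by hand
// for simulation runs; it is not something the tools define for us.
module top_shared(
  input wire clk_in,    // only used when synthesising
  input wire reset_in   // only used when synthesising
);
  wire clk;
  wire reset;

`ifdef SIMULATION
  // Simulation: generate the clock and reset ourselves
  reg sim_clk = 0;
  reg sim_reset = 1;
  always #5 sim_clk = ~sim_clk;
  initial #100 sim_reset = 0;
  assign clk = sim_clk;
  assign reset = sim_reset;
`else
  // Synthesis: clock and reset arrive from outside the module
  assign clk = clk_in;
  assign reset = reset_in;
`endif

  // The CPU and RAM would be instantiated here, using clk and reset

endmodule
```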

For purposes of this post we will try to keep things simple, and just make a copy of top.v for synthesis purposes.

Click on the plus sign to add new sources. On the window that pops up ensure that Add or create design sources is selected and click next.

On the next screen click Create File. For the file name specify c64_core.v and click OK. For the remaining popups, click Yes/OK until they are all gone.

Now copy and paste the contents of top.v into c64_core.v and also change the module name to c64_core.

Since we are creating an IP block, we need to provide some ports on our module. Modify your module header so that it looks as follows:

module c64_core(
  input wire clk_in,
  input wire reset,
  input wire debug_clk,
  input wire debug_mode,
  output wire [15:0] addr_out
);

We have discussed the purpose of these ports in a previous section.

We can now do a bit of cleanup in this module. For instance, we can remove the always block that generates the clk signal, since we will be getting it externally.

Similarly we can remove the initial block where we change the state of the reset pin.

At this point we might be tempted to also remove the initial block where we populate the contents of the RAM, since initial blocks in general don't synthesise to anything at all. This initial block is, however, an exception to the rule: many FPGA synthesis tools, including Vivado, will synthesise a core which also initialises the applicable block RAM with the contents of the given hex file.

I will prove the above statement in a later section.
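
As a reminder of the pattern we are talking about, here is a minimal sketch of a RAM that is initialised from a hex file. The module name, the port names and the file name testsuite.hex are placeholders and not necessarily what our top module uses.

```verilog
// Sketch of a 64KB RAM initialised from a hex file.
// All names here, including the file name, are hypothetical.
module ram_init_sketch(
  input wire clk,
  input wire we,
  input wire [15:0] addr,
  input wire [7:0] din,
  output reg [7:0] dout
);
  reg [7:0] ram [0:65535];

  // The initial block that we should NOT remove
  initial $readmemh("testsuite.hex", ram);

  // A synchronous read and write is the usual shape that lets the
  // synthesis tool map the array onto block RAM
  always @(posedge clk) begin
    if (we)
      ram[addr] <= din;
    dout <= ram[addr];
  end
endmodule
```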

There is one remaining change we need to do with our top module code. As discussed earlier, our core will not always use clk_in for clocking. When the pin debug_mode is set to high, we need to use debug_clk as the system clock. To cater for this requirement we need to add the following assignment:

assign clk = debug_mode ? debug_clk : clk_in;

With this change it is also important to change the type of clk from a register to a wire.
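
Taken on its own, the clock selection looks like the sketch below. The wrapper module and the clk_out port are only there so that the fragment can stand alone; in c64_core.v you only need the wire declaration and the assign.

```verilog
// Stand-alone sketch of the clock selection. The module name and
// clk_out are hypothetical and only wrap the two relevant lines.
module clk_select_sketch(
  input wire clk_in,
  input wire debug_clk,
  input wire debug_mode,
  output wire clk_out
);
  // clk was declared as "reg clk;" while an always block drove it.
  // Now that an assign drives it, it must be a wire.
  wire clk;
  assign clk = debug_mode ? debug_clk : clk_in;

  assign clk_out = clk;
endmodule
```

Keep in mind that a plain multiplexer like this can produce a short pulse on clk at the moment debug_mode changes. For our purpose of stepping through a test this should be good enough.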

A final thing we should do is to make cpu.v and alu.v part of the design sources. To do this, select these two files in the Simulation Sources section within the Sources panel. Right click the selection and then click Move to Design Sources.

At this point ensure everything is saved within the project.

We are now ready to do a synthesis run to see if there are any errors in our module. In the Project Manager panel, click Run Synthesis and follow the prompts.

Once we have verified that the synthesis is successful, we can continue.

We will be referencing this project from a new project, so first close down this project and then create a new project.

With your new project also ensure that you select your ZYBO board as the target device.

Within your new project select Tools from the main menu and then Create and Package New IP. You will be presented with the following Wizard page:

Click Next.

On the next screen, ensure that Package a specific directory is selected and click next.

On the next screen, select the path to your project that you have closed a moment ago. Click next.

Give a sensible name for your IP project and click next again.

You will finally be brought to the last screen. Clicking Finish will create yet another new project, used specifically for defining additional properties of your IP.

The screen that opens up will look something like the following:

On the left panel you will see Packaging Steps. Clicking on each of these steps will show a different set of information regarding your IP. For most of these steps, the default settings will do.

Click on the Review and Package step and then the Package IP button. The IP project will close down and you will be presented with your new empty project again.

At this point your new IP is added to your new project and ready for use.

In the next section we will start our block design for the whole system.

Doing the Block Design

Let us do the block design.

Start by clicking on Create Block Design within the project manager panel. Leave defaults as is in the popup and click ok.

You will see an empty Design Panel opening containing the message: The design is empty. Press the + button to add IP.

Click the + button as hinted by the design panel. You will be presented with a search dialogue.

The first IP we are going to add will be the ZYNQ processing system. So, within the search box enter zynq. The ZYNQ7 Processing System will be the first item in the search results. Double click this item and you will see the block being added to your block design:

You will see a green line at the top providing you the option to Run Block Automation. This option basically means it will do some auto connections for you.

Please select this option. Click OK on the popup. You will see that a couple of details have been added:

The next IP we are going to add is the IP we have created in this post. Search with the name you have given this IP and add it.

Next it will be necessary to link up our core with the ZYNQ processing system. As mentioned earlier we will be using a GPIO block for this. So, please add an AXI GPIO core to the design.

This time we will be provided a Connection Automation option. Once again we will be using this option. This time around we will be more picky with the options and the defaults will not do:

We will only be selecting S_AXI. The GPIO side will be connecting to our IP block, and we are not ready for any connections yet. The result of auto connections will look as follows:

As you can see a couple of new blocks have been added. However, the GPIO remains unconnected as well as our own core.

We basically want to connect four ports from our core to the GPIO port of the axi_gpio component. The block design tool, however, only allows you to connect two ports to each other, and not a couple of ports to one port.

We will need to introduce a core that will aggregate our ports into one port, which we will in turn connect to the GPIO port.

Here is the Verilog code for the aggregator:

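module gpio_manipulator (
  input wire [19:0] gpio_output,
  output wire [19:0] gpio_input,
  output wire clk_gen_rst,
  output wire rst,
  output wire debug_mode,
  output wire debug_clk,
  input wire [15:0] address
);

// Bits 16 to 19 written by the ARM Cortex drive the control signals
assign clk_gen_rst = gpio_output[16];
assign rst = gpio_output[17];
assign debug_mode = gpio_output[18];
assign debug_clk = gpio_output[19];
// Bits 0 to 15 read by the ARM Cortex carry the 6502 address bus
assign gpio_input[15:0] = address;
// The upper four input bits are unused, so tie them low
assign gpio_input[19:16] = 4'b0000;

endmodule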


You should wrap this module as an IP in the same way as we did for c64_core. Once wrapped, we should also add it to our block design.

You will notice that our gpio_manipulator only makes provision for 20 gpio pins, whereas an AXI-GPIO has 32 pins by default. It is important that we also reduce the input/output pins on the axi component to 20.

We do this reduction by first double clicking on the AXI GPIO component. When the screen opens, go to the IP Configuration tab and change GPIO width from 32 to 20.

We can now start to connect our components to the rest of the system.

At this stage you might be puzzled, as the gpio_manipulator has a gpio input port and a gpio output port, whereas the axi gpio component only has one gpio port. How do we connect our two gpio ports to the one port?

The problem can be solved by just clicking on the + next to GPIO on the AXI GPIO component. You will see more ports become available.

gpio_io_i and gpio_io_o will look familiar, but not gpio_io_t. gpio_io_t defines for each of the 20 pins whether it is an input or an output. For now we will not worry about gpio_io_t because we are going to directly connect to gpio_io_i and gpio_io_o.
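
For reference, the sketch below shows how the three signals would typically be combined for one real bidirectional pin. It assumes the usual convention where t high means the pin is released and acts as an input. The module and port names are made up, and we do not need such a buffer in our design.

```verilog
// Sketch of what the _i, _o and _t signals mean for one physical pin.
// Assumption: t = 1 releases the pin (input), t = 0 drives it (output).
module tristate_pin_sketch(
  input wire o,     // value to drive, like gpio_io_o
  input wire t,     // direction control, like gpio_io_t
  output wire i,    // value seen on the pin, like gpio_io_i
  inout wire pad    // the physical pin
);
  assign pad = t ? 1'bz : o;
  assign i = pad;
endmodule
```

Since all our signals stay inside the FPGA, there is no pin to share, and we can simply use the _i and _o signals as two separate one-directional buses.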

We can now continue and connect some wires.

Move the mouse cursor to the gpio_io_i pin of the axi_gpio block. You will see the mouse cursor will change into a pencil. While keeping the left mouse button depressed, move the mouse to the gpio_input pin of the gpio_manipulator block. You will see that a couple of pins that can potentially be connected will light up with a green tick, as shown below:

While the mouse cursor is over gpio_input of gpio_manipulator, release the left button of the mouse.

The connection you have just made, will look as follows:

Now, make a similar connection from the pin gpio_io_o of the axi gpio component to the gpio_output pin of gpio manipulator.

This concludes the connections between the gpio manipulator and the axi gpio.

The rest of the pins of the gpio manipulator we can now connect to our other custom core. The result will look as follows:

At this point all the pins of c64_core are connected except for clk_in. A good connection candidate for this pin is FCLK_CLK0 of the Processing System block. The clock frequency of this pin is, however, 100MHz.

Well, I am sure that not many people would mind a C64 clocking at 100MHz 😃 However, such a C64 would be too fast to use, so let us try to keep to the real speed as much as possible.

We will use a clock generator to bring the clock speed down. Add a new IP and within the search box type clock. In the search results click on Clocking Wizard. The resulting block that is added to the design will look as follows:

clk_out1 will be the pin to connect to clk_in of our c64 core. Obviously clk_in1 we will connect to FCLK_CLK0 which is 100MHz.

The reset pin we will connect to clk_gen_rst of our gpio manipulator. The reset of our clock generator will therefore also form part of the responsibility of our ARM Cortex program.

One final thing we should do is to configure the parameters of our clock generator. Double click on the clock component and go to Clocking Options. Scroll right to the bottom and ensure that the input clock frequency is at Auto, which will yield 100MHz:

Next, go to the Output Clocks tab. In this tab we would like to bring the clock frequency down to 1MHz, close to the speed of a real C64. However, the wizard won't allow us to go below 4.6MHz, so for now we will leave it at 5MHz. We will work around this in future posts.

We are now done with the block design. Let us see if it can synthesise. In the Sources panel, right click on design and select create HDL wrapper:

With the HDL wrapper created we can now synthesise. On the left click Run Synthesis. Verify that there are no errors.

Synthesizable Block RAM Initialisation

In a previous section I mentioned that an initial block with a $readmemh will indeed synthesise into block RAM elements with their contents initialised from a given hex file.

In this section I will prove this statement with the synthesised design that we ended with in the previous section.

Assuming the final synthesis in the previous section was successful, click on Open Synthesised design in the left panel. The resulting diagram will look something like the following:

We will now keep drilling down into the design until we reach the block RAMs containing the Test Suite code.

We drill down by clicking on the + on the top left of the block. Drilling into this block will yield quite a sea of wires as shown below:

Every grey block corresponds more or less to a block you have added to the block design. If you zoom into the top three blocks you will see that they are related to the ps7 processing system.

The blocks we are really interested in are the four blocks at the bottom right. So let us zoom into this region:

It is not so clear in the picture, but the block second from the left is c64_core. This is the block we are after, so let us drill into it. After drilling twice into this block, we actually get to the block RAMs (I have marked them in red):

In this particular scenario the block RAMs are arranged into eight rows containing two block RAMs each. As you might have guessed, each row corresponds to one bit, and the eight rows together form an addressable byte.

The two block RAMs per row form a configuration called the cascade configuration. Each block RAM can store 32Kbit of information, that is one bit for each of 32K addresses, so the left block RAM contains the data for the first 32KB of memory and the one on the right contains the data for the last 32KB of memory.

The schematic view also allows us to view the init values that will be assigned to a block RAM. To view them, click on a block RAM and then within the Cell Properties panel click on Properties. If you now scroll down, you will eventually see lots of hex values:

You will see that each row of hex numbers is preceded by 256'h, which is typical Verilog syntax meaning the row contains 256 bits in total. It should be noted though that the order of the bits in each row is reversed, meaning the last bit is shown first and we work our way down to the first bit.

Let us see if we can match up some of the data shown in this view to the actual binary data in the Klaus Test Suite binary.

The values that we will try to match will be the last four bytes of memory (i.e. the reset and IRQ vectors).

To retrieve the last four bytes of memory we need to look at the second block RAM in each row. For each of these block RAMs we need to get the contents of INIT_7F (7F is the last init row for each block RAM, since each block RAM has 80 hex rows, numbered from INIT_00 to INIT_7F).
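
To see why INIT_7F is the row we need, the arithmetic can be sketched in a few lines of simulation-only Verilog. This assumes that each block RAM holds one bit per address and that the bits are stored in address order, 256 per INIT row. That is consistent with what the schematic shows, but it is an assumption and not something I have seen documented for this exact configuration. The module name is made up.

```verilog
// Sketch: in which INIT row and bit position do the last four
// addresses land? Assumes a plain address-ordered bit layout.
module bram_init_index;
  localparam BITS_PER_ROW  = 256;   // each INIT_xx row is 256'h...
  localparam ROWS_PER_BRAM = 'h80;  // INIT_00 to INIT_7F
  localparam BITS_PER_BRAM = BITS_PER_ROW * ROWS_PER_BRAM;

  reg [15:0] addr;
  reg [7:0]  init_row;
  integer    offset;
  integer    k;

  initial begin
    $display("bits per block RAM = %0d", BITS_PER_BRAM);
    for (k = 0; k < 4; k = k + 1) begin
      addr     = 16'hFFFF - k;
      offset   = addr % BITS_PER_BRAM;   // position within the second block RAM
      init_row = offset / BITS_PER_ROW;
      $display("address %h: INIT_%h, bit %0d",
               addr, init_row, offset % BITS_PER_ROW);
    end
  end
endmodule
```

The four addresses end up in the top four bits of the last row, which is exactly one hex digit when the row is written with the highest bit first.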

Here is the data in question, from top to down:

Since we want the last four bytes, and each hex digit holds four bits (one bit for each of four addresses), we take the first hex digit of each line:


We now convert each of these values to binary:

4 = 0100
C = 1100
A = 1010
4 = 0100
8 = 1000
8 = 1000
4 = 0100
4 = 0100

Now, each column of binary digits forms a byte (NB!! Most significant bit at the bottom). Since the bits are shown in reverse order, the first column belongs to the last address and we work our way down from there. The mapping is thus:

  • Column 1 = address $FFFF, value = 00110110b = 36
  • Column 2 = address $FFFE, value = 11001011b = CB
  • Column 3 = address $FFFD, value = 00000100b = 04
  • Column 4 = address $FFFC, value = 00000000b = 00

These values match up to the values we see when opening up the Klaus Test Suite binary with a hex editor.
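
If you don't feel like doing the column reading by hand, the same decoding can be sketched as a small simulation-only module. The eight hex digits are the ones listed above, from top to bottom; the module and variable names are made up.

```verilog
// Sketch: rebuild the last four bytes from the first hex digit of
// INIT_7F of the second block RAM in each of the eight rows.
module decode_init_digits;
  reg [3:0]  digit [0:7];  // digit[0] is the top row, which holds bit 0
  reg [7:0]  value;
  reg [15:0] addr;
  integer    col;
  integer    row;

  initial begin
    digit[0] = 4'h4;
    digit[1] = 4'hC;
    digit[2] = 4'hA;
    digit[3] = 4'h4;
    digit[4] = 4'h8;
    digit[5] = 4'h8;
    digit[6] = 4'h4;
    digit[7] = 4'h4;

    // Column 1 is the leftmost bit of each digit and maps to $FFFF
    for (col = 0; col < 4; col = col + 1) begin
      for (row = 0; row < 8; row = row + 1)
        value[row] = digit[row][3 - col];
      addr = 16'hFFFF - col;
      $display("address %h = %h", addr, value);
    end
  end
endmodule
```

Running this in the Vivado simulator prints one line per address, which you can compare against the hex editor.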

In Summary

In this post we developed an FPGA core that can execute the 6502 Test Suite developed by Klaus Dormann.

With this done, however, we are only halfway there.

What still needs to be done is to write an ARM Cortex program that will reset our FPGA core and monitor the execution of the Test Suite.

Till next time!
