Friday, 1 December 2017

Programming the ARM Cortex


In the previous post we developed the FPGA implementation for running the 6502 Test Suite written by Klaus Dormann on the Zybo board.

In this post we will write an ARM Cortex program that controls our FPGA implementation, that is, starts it up and monitors the status of the Test Suite execution.

Opening the Xilinx SDK

We will be developing our ARM Cortex program within the Xilinx SDK.

The Xilinx SDK gets installed as part of the Vivado installation process.

The Xilinx SDK can be launched from Vivado, but there are a couple of steps we need to do beforehand.

As you will remember, we ended off by running Synthesis on our FPGA implementation and verifying that there were no errors.

The next step is to generate a bitstream. This is done by clicking on Generate Bitstream in the left panel. Follow the prompts and wait for the process to complete.

We can now start preparing for the launch of the Xilinx SDK.

First export the hardware by selecting File/Export/Export Hardware:

On the resulting screen, ensure that the Include Bitstream option is selected and click OK:

We are now ready to launch the Xilinx SDK. Under File select Launch SDK. On the resulting dialogue click OK.

Xilinx SDK will now start up:

Xilinx SDK is based on Eclipse, so similar concepts apply. For example, you can have several projects within the same Workspace.

As you can see, our Workspace already has one project called design_1_wrapper_hw_platform_0. This project contains some code for initialising our hardware platform at startup.

Our application will live in another project in the same Workspace. So select File/New/Application Project.

Give your project a meaningful name and click Next. On the next page we need to select a template for our new project. The Hello World template, selected by default, will do. Click Finish.

You will see two new nodes created in the Project Explorer Panel:

The first Node, Test_Suite_Run, is your new project.

The folder ending with _bsp is a Board Support Package. This folder contains the include files and libraries that your program needs to access the hardware-specific features of the core you are using.

The helloworld program itself is in helloworld.c, in the src folder of Test_Suite_Run. This is the file we will use to add our code for controlling our custom core.

Getting all the info together

Let us now get all the information together that is needed to write our ARM Cortex program.

As you know, we will be communicating with our core via a GPIO block. The important pieces of information we need here are the pin assignments. We can get this information by looking at gpio_manipulator.v. In summary, here is the required information:

  • GPIO Inputs (Bits 15:0): Address input
  • GPIO Output (Bit 16): clk_gen_reset
  • GPIO Output (Bit 17): rst
  • GPIO Output (Bit 18): debug_mode
  • GPIO Output (Bit 19): debug_clk
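
As a small sketch, the pin assignments above could be captured as C bit masks. The macro names and the gpio_address helper are illustrative only; they do not come from the original project.

```c
#include <stdint.h>

/* Bit layout of the 32-bit GPIO channel, taken from the list above.
   All names here are hypothetical and chosen for illustration. */
#define GPIO_ADDR_MASK      0x0000FFFFu  /* bits 15:0, address from the core */
#define GPIO_CLK_GEN_RESET  (1u << 16)   /* bit 16 */
#define GPIO_RST            (1u << 17)   /* bit 17 */
#define GPIO_DEBUG_MODE     (1u << 18)   /* bit 18 */
#define GPIO_DEBUG_CLK      (1u << 19)   /* bit 19 */

/* Extract the 16-bit address field from a raw 32-bit GPIO read. */
static inline uint16_t gpio_address(uint32_t raw)
{
    return (uint16_t)(raw & GPIO_ADDR_MASK);
}
```

With these definitions, a raw read of 0x000F339A would yield the address 0x339A, since the control bits above bit 15 are masked away.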
The next piece of information we need is: How do we communicate with the GPIO from our ARM Cortex program?

Like many other peripherals in an ARM system, the GPIO is a set of registers mapped into the memory space. So, firstly, we need to know the memory address of our GPIO peripheral.

We get this info by opening our Block Design in Vivado. Next to the Design tab you will see an Address Editor tab:

Click on the Address Editor tab and you will see the required info:

As you see our gpio block is mapped to address 0x4120_0000 in memory space.

At this point you may be wondering how to use these registers. Xilinx provides this information in a Product Guide that is publicly available on their web site. To get to this guide, do an Internet search with the search terms Product Guide Xilinx GPIO. One of the first hits will be something like AXI GPIO v2.0 LogiCORE IP Product Guide (PG144). This is the guide we are after. Open it and scroll down to the following section:

Our GPIO instance only has a single channel, so only address offsets 0x0 and 0x4 are applicable. In our design we didn't connect up the tristate register, so this leaves us with only register 0x0 to use.
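
To make the address arithmetic concrete, here is a minimal sketch that combines the base address from the Address Editor with the two channel 1 offsets. The macro and function names are my own and are not taken from the Xilinx headers.

```c
#include <stdint.h>

/* Base address as shown in the Address Editor (0x4120_0000). */
#define GPIO_BASE_ADDR    0x41200000u

/* Channel 1 register offsets; names are hypothetical. */
#define GPIO_DATA_OFFSET  0x0u  /* data register, the only one we use */
#define GPIO_TRI_OFFSET   0x4u  /* tristate register, unused in this design */

/* Absolute address of a GPIO register given its offset. */
static inline uint32_t gpio_reg_addr(uint32_t offset)
{
    return GPIO_BASE_ADDR + offset;
}
```

Since the data register sits at offset 0x0, its absolute address is simply the base address, which is why 0x41200000 is the only address our program needs.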

The access type column indicates that register 0x0 accepts reads and writes. So, using the pin assignments from the previous section, we only need to read or write the applicable bits in this register to have the desired effect.
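
Changing a single pin without disturbing the others comes down to setting or clearing one bit in a local copy of the output value and then writing the whole value to the register. A minimal sketch of that bit manipulation, with hypothetical helper names, could look like this:

```c
#include <stdint.h>

/* Return value with every bit in mask set; other bits are unchanged. */
static inline uint32_t set_bits(uint32_t value, uint32_t mask)
{
    return value | mask;
}

/* Return value with every bit in mask cleared; other bits are unchanged. */
static inline uint32_t clear_bits(uint32_t value, uint32_t mask)
{
    return value & ~mask;
}
```

For example, starting from both reset bits set (0x00030000), clearing bit 16 leaves 0x00020000, so rst stays asserted while the clock generator reset is released.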

Writing our ARM Cortex program

We finally have enough information to start writing our ARM Cortex program.

I will start by outlining what we want to achieve in pseudo code:

  1. Initialise both reset pins (i.e. rst and clk_gen_rst) as asserted
  2. Wait one second
  3. Pull clk_gen_rst pin down
  4. Wait one second
  5. Pull rst pin low
  6. Wait two minutes
  7. Assert debug_mode pin
  8. Repeat 20 times 
    1. Toggle debug_clk
    2. Read address output pins
    3. Output the value of the address pins to the UART
Just a quick explanation of the pseudo code.

In step 3, with the clk_gen_rst pin pulled down, the clock generator will start oscillating. It will, however, take a short time for the clock generator to reach a stable state. Strictly speaking, we should look at the lock output of the clock generator to know when it is in a stable state.

To keep things simple we haven't connected the lock pin. Instead, we will just wait a second which is more than enough time for our clock generator to reach a stable state.

Once our clock generator is in an assumed stable state, we can pull the reset pin of our custom core low. This will initiate the execution of the test suite.

We then wait two minutes, which should be more than enough time for our core to finish the Test Suite.

After two minutes we assert the debug_mode pin. This will shift the clock source used by our core from the clock generator to debug_clk, which we will manually clock in our code.

In step eight we enter a short loop, where we toggle the clock, read the address output of our core and output it to the UART.

Next we will implement this algorithm. Open up helloworld.c and modify it so that it looks as follows:

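#include <stdio.h>
#include "platform.h"
#include "xil_printf.h"
#include "xil_io.h"
#include <unistd.h>

#define GPIO_BASE 0x41200000  /* GPIO base address from the Address Editor */

int main()
{
    print("Hello World\n\r");

    /* Steps 1-2: assert clk_gen_reset (bit 16) and rst (bit 17), wait 1 s.
       The delays follow the pseudo code above. */
    u32 regval = (1 << 16) | (1 << 17);
    Xil_Out32(GPIO_BASE, regval);
    usleep(1000000);

    /* Steps 3-4: release the clock generator reset, wait 1 s to stabilise */
    regval = ~(1 << 16) & regval;
    Xil_Out32(GPIO_BASE, regval);
    usleep(1000000);

    /* Steps 5-6: release rst so the Test Suite starts, wait two minutes */
    regval = ~(1 << 17) & regval;
    Xil_Out32(GPIO_BASE, regval);
    usleep(120000000);

    /* Step 7: assert debug_mode (bit 18) so the core runs from debug_clk */
    regval = regval | (1 << 18);
    Xil_Out32(GPIO_BASE, regval);

    /* Step 8: print the address pins, then pulse debug_clk (bit 19) */
    for (int i = 0; i < 20; i++) {
        u32 in = Xil_In32(GPIO_BASE);
        in = in & 0xffff;
        printf("in %x\n\r", (unsigned int)in);
        regval = regval | (1 << 19);
        Xil_Out32(GPIO_BASE, regval);
        regval = ~(1 << 19) & regval;
        Xil_Out32(GPIO_BASE, regval);
    }

    return 0;
}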

We include two additional headers:

  • unistd.h: Header file containing usleep (microsleep).
  • xil_io.h: Header file containing functions for reading and writing memory-mapped registers, such as those of our GPIO.

As you can see, we use Xil_Out32 to write data to the GPIO and Xil_In32 to read data from the GPIO.
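
Conceptually, a 32-bit memory-mapped write or read is just a store or load through a volatile pointer. The sketch below illustrates the idea with stand-in functions; mmio_write32 and mmio_read32 are hypothetical names and are not the actual Xilinx implementation of Xil_Out32 and Xil_In32.

```c
#include <stdint.h>

/* Store a 32-bit value at the given address. The volatile qualifier
   stops the compiler from caching or removing the access. */
static inline void mmio_write32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Load a 32-bit value from the given address. */
static inline uint32_t mmio_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}
```

On the board the address would be that of the GPIO data register. On a PC the same functions can be tried out by passing the address of an ordinary uint32_t variable.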

We are now ready to run our program on the ZYBO board. Ensure that the ZYBO board is plugged into your PC via the USB port and switch it on.

Next we should program the FPGA with our implementation. Do this by clicking on Program FPGA:

With the FPGA programmed, click on the Debug button and select Debug/Launch on Hardware (System Debugger):

After a couple of seconds, the debugger will stop at a breakpoint on the first line of your main function:

At this point we need to start a terminal session with the UART on the ZYBO. Do this by issuing the following command:

screen /dev/ttyUSB1 115200

Now let the program run to completion. This will take about two minutes. The terminal output will look more or less like the following:

In this instance our core reached the loop at address 339a. If you have a look at the source code for Klaus Dormann's Test Suite, you will see that the Test Suite was successful if this loop was reached.

So, we know we have done the FPGA implementation correctly and the Arlet core is correct.
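
This check can also be expressed in code. The sketch below assumes the 2013 listing of the Test Suite, in which the success trap is a three-byte jmp * instruction at 3399, and assumes that the address pins only ever show the three bytes of that instruction while the core is trapped. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: success trap "jmp *" at 0x3399 (2013 listing). */
#define SUCCESS_LOOP_START  0x3399u
#define JMP_ABS_LENGTH      3u      /* opcode plus two address bytes */

/* True if a sampled address lies inside the success trap instruction. */
static inline bool in_success_loop(uint16_t addr)
{
    return addr >= SUCCESS_LOOP_START &&
           addr < SUCCESS_LOOP_START + JMP_ABS_LENGTH;
}
```

Under these assumptions, the sampled address 339a falls inside the trap, while an address such as 339c does not. For a different build of the Test Suite, the start address would have to be taken from that build's list file.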

In Summary

In this post we wrote the ARM Cortex program for controlling our core and monitoring the execution of the Test Suite.

We confirmed that our implementation was correct.

In the next post we will try to boot our FPGA implementation with the C64 ROMs.

Till next time!


  1. Hmmm... I reached "in 100" =)
    Time for some more debugging.

    1. Better, I fixed an issue I had with my clock wizard rst that I made active low last blog in order to clear an error, that was the wrong approach. Vivado 2019 makes the clk_gen_rst pin of the gpio_manipulator active low in the block diagram which trips an error when connected to the clock wizard default active high rst. The solution is to double click on the clk_gen_rst pin in the block diagram and force the polarity to active high so that it avoids the error. I like how vivado just assumes that if it says rst in the name, it must be active low and gives you an error, not a warning....

      Now my result is:
      "in 346a"

      I need to figure out what that is in the test code and revisit my block RAM initialization hex file to make sure this all makes sense and agrees.

      I hope you don't mind these comments I'm posting. I just wanted to document the little issues I come across as I work with the newer versions of the tool and retrace your steps.


  2. It looks like Klaus Dormann's Functional Test Suite has changed since this post was originally written. To see the relationship between addresses and the assembly for the test, there is a list file in a folder named bin_files in the Klaus GitHub repo.
    You can download the lst file, change the extension to htm and open it in a browser. It looks like the new success end loop for the test is as follows:

    ; S U C C E S S ************************************************
    ; -------------
    success ;if you get here everything went well
    3469 : 4c6934 > jmp * ;test passed, no errors

    ; -------------
    ; S U C C E S S ************************************************
    346c : 4c0004 jmp start ;run again

    I reached 346c which seems like the right place to be. However at the time of the blog writing, Johan Steenkamp used the older 2013 Klaus test list which shows as follows:

    ; S U C C E S S ************************************************
    ; -------------
    success ;if you get here everything went well
    3399 : 4c9933 > jmp * ;test passed, no errors

    ; -------------
    ; S U C C E S S ************************************************
    339c : 4c0004 jmp start ;run again

    Johan reached 339A which is the correct place to be for his version of the test.