Sunday 26 November 2017

Running on the FPGA

Foreword

In the previous post we managed to run the 6502 Test Suite developed by Klaus Dormann on Arlet Ottens's core.

In this post we will modify the top module so that it can run on the FPGA of the Zybo board.

The ultimate goal is to run Klaus Dormann's Test Suite on our FPGA implementation. To accomplish this we need to do two things: create the FPGA implementation, and write an ARM Cortex program that controls our core and verifies the results of the Test Suite.

In this post we will only tackle the first goal, creating the FPGA implementation.

We will tackle the ARM Cortex program in the next post.

More on ZYNQ development

As mentioned in my introduction post, the Zybo board is equipped with a Zynq FPGA chip.

Apart from the FPGA fabric, the Zynq also contains an ARM Cortex processor with supporting peripherals, such as USB, Ethernet and so on.

Most of your FPGA designs for the ZYNQ will co-operate with the ARM Cortex.

Interfacing your design with the ARM Cortex can get quite complex and we can easily get it wrong.

To help with this complexity, Vivado provides a block design tool: a visual tool in which you drag and drop components, and which can automatically connect the components together.

The only caveat is that if you want to add your own cores to the block design, you first need to wrap them into an IP, so that Vivado can offer them as a drag-and-drop component.

We will discuss how to wrap our core in a moment.

The intended design

Before going into details on how to wrap our core into an IP, I will first give some context on what we want to achieve.

The following diagram summarises more or less what we want to achieve:


The 6502 System block is basically the IP we are going to wrap. This block has 4 inputs and 1 output.

The clk_in input is fed by a clock generator which will clock at 8MHz allowing our core to execute at full speed.

The other 3 inputs and one output will be connected to the ARM Cortex via a GPIO (General purpose Input/Output) block.

It should be noted that we will also be writing a program that will run on the ARM Cortex. This program will control the pins linked to the GPIO block as shown in the diagram.

The idea is to run the Klaus Test Suite at full speed on the 6502 System block for about two minutes. This time interval will be more than enough to finish the Test Suite.

After the two minutes we want to drastically slow down the clock and monitor the addresses being placed on the address bus (via the addr_out pin) to get an idea of how far the execution of the Test Suite has progressed.

The clock is slowed down by driving the debug_mode pin high, which we will do from our ARM Cortex program.

With the debug_mode pin high, our 6502 System will not be using the clk_in pin anymore for clocking, but rather the debug_clk pin.

In our ARM Cortex program we will then toggle the debug_clk pin, read addr_out after each toggle and output the value to the UART via a printf.

We can connect to the UART with a terminal application and get an idea of where we are in the execution of the Test Suite.

This is in a nutshell what we want to achieve.

Wrapping our core into an IP

Let us start off by opening up the Vivado project we have created in the previous post.

Currently the Sources panel in the IDE for this project should look as follows:


As you can see, there are currently only simulation sources and no design sources.

However, in order to wrap our core into an IP, we do need design sources. The quickest way to get them would be to select all three simulation sources, right-click them and select Move to Design Sources. This would, however, mean that our top module file is shared between simulation and design, which would require us to add some directives within top.v so that it can serve both purposes.

For purposes of this post we will try to keep things simple, and just make a copy of top.v for synthesis purposes.

Click on the plus sign to add new sources. On the window that pops up ensure that Add or create design sources is selected and click next.

On the next screen click Create File. For the file name specify c64_core.v and click OK. For the remaining popups, click Yes/OK until they are all dismissed.

Now copy and paste the contents of top.v into c64_core.v and change the module name to c64_core.

Since we are creating an IP block, our module needs some ports through which the rest of the system can drive it. Modify your module header so that it looks as follows:

module c64_core(
  input wire clk_in,
  input wire reset,
  input wire debug_clk,
  input wire debug_mode,
  output wire [15:0] addr_out
    );


We discussed the purpose of these ports in a previous section.

We can now do a bit of cleanup in this module. For instance, we can remove the always block that generates the clk signal, since we will be getting it externally.

Similarly we can remove the initial block where we change the state of the reset pin.

At this point we might be tempted to also remove the initial block where we populate the contents of the RAM, since initial blocks in general don't synthesise to anything at all. This initial block is, however, an exception to the rule: many FPGA synthesis tools, including Vivado, will synthesise a core that also initialises the applicable block RAM with the contents of the given hex file.

I will prove this statement in a later section.

There is one remaining change we need to make to our top module code. As discussed earlier, our core will not always use clk_in for clocking: when the debug_mode pin is high, we need to use debug_clk as the system clock. To cater for this requirement we add the following assignment:

assign clk = debug_mode ? debug_clk : clk_in;

With this change it is also important to change the declaration of clk from reg to wire (and drop its initial value), since an assign statement can only drive a wire.
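To see the clock selection on its own, here is a minimal sketch that isolates the assignment in a small module. The module name clk_select is hypothetical; in our design the assignment simply lives inside c64_core. Be aware that switching a clock through an ordinary multiplexer like this can glitch at the moment debug_mode changes. On 7-series devices such as the Zynq, a dedicated BUFGMUX primitive is the usual alternative, so treat this as a simple sketch rather than a timing-clean design.

```verilog
// Hypothetical stand-alone version of the clock selection in c64_core.
module clk_select (
  input  wire clk_in,      // fast clock from the clock generator
  input  wire debug_clk,   // slow clock toggled by the ARM program
  input  wire debug_mode,  // 1 = clock from debug_clk, 0 = clock from clk_in
  output wire clk          // clock that drives the CPU and the RAM
);
  // Combinational 2:1 mux; clk must be a wire because assign drives it.
  assign clk = debug_mode ? debug_clk : clk_in;
endmodule
```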

A final thing we should do is to make cpu.v and alu.v part of the design sources. To do this, select these two files in the Simulation Sources section of the Sources panel, right-click the selection and then click Move to Design Sources.

At this point ensure everything is saved within the project.

We are now ready to do a synthesis run to see if there are any errors in our module. In the Project Manager panel, click Run Synthesis and follow the prompts.

Once we have verified that the synthesis is successful, we can continue.

We will be referencing this project from a new project, so first close down this project and then create a new project.

With your new project also ensure that you select your ZYBO board as the target device.

Within your new project select Tools from the main menu and then Create and Package New IP. You will be presented with the following Wizard page:

Click Next.

On the next screen, ensure that Package a specific directory is selected and click next.

On the next screen, select the path to your project that you have closed a moment ago. Click next.

Give a sensible name for your IP project and click next again.

Finally, you will reach the last screen. Clicking Finish will create yet another new project, used specifically for defining additional properties of your IP.

The screen that opens up will look something like the following:


On the left panel you will see Packaging Steps. Clicking on each of these steps will show a different set of information regarding your IP. For most of these steps, the default settings will do.

Click on the Review and Package step and then the Package IP button. The IP project will close down and you will be presented again with your new, empty project.

At this point your new IP is added to your new project and ready for use.

In the next section we will start our block design for the whole system.

Doing the Block Design

Let us do the block design.

Start by clicking on Create Block Design within the project manager panel. Leave defaults as is in the popup and click ok.

You will see an empty Design Panel opening containing the message: The design is empty. Press the + button to add IP.

Click the + button as hinted by the design panel. You will be presented with a search dialogue.

The first IP we are going to add is the ZYNQ processing system, so within the search box enter zynq. ZYNQ7 Processing System will be the first item in the search results. Double-click this item and you will see the block being added to your block design:

You will see a green line at the top providing you the option to Run Block Automation. This option basically means it will do some auto connections for you.

Please select this option and click OK on the popup. You will see that a couple of details have been added:


Next IP we are going to add, is the IP we have created in this post. Search with the name you have given this IP and add it.

Next we need to link our core with the ZYNQ processing system. As mentioned earlier, we will be using a GPIO block for this, so please add an AXI GPIO core to the design.

This time we will be offered a Connection Automation option, which we will use once again. This time around, however, we need to be more picky with the options, since the defaults will not do:



We will only be selecting S_AXI. The GPIO side will be connecting to our IP block, and we are not ready for any connections yet. The result of auto connections will look as follows:


As you can see a couple of new blocks have been added. However, the GPIO remains unconnected as well as our own core.

We basically want to connect four ports from our core to the GPIO port of the axi_gpio component. The Block design tool, however, only allows you to connect two ports to each other, and not a couple of ports to one port.

We therefore need to introduce a core that aggregates our ports into one port, which we in turn connect to the GPIO port.

Here is the Verilog code for the aggregator:

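module gpio_manipulator (
  input  wire [19:0] gpio_output,  // from gpio_io_o of the AXI GPIO (written by the ARM)
  output wire [19:0] gpio_input,   // to gpio_io_i of the AXI GPIO (read by the ARM)
  output wire clk_gen_rst,         // reset of the clock generator
  output wire rst,                 // reset of c64_core
  output wire debug_mode,          // 1 = clock c64_core from debug_clk
  output wire debug_clk,           // single-step clock driven by the ARM
  input  wire [15:0] address       // addr_out of c64_core
); 

assign clk_gen_rst = gpio_output[16];
assign rst = gpio_output[17];
assign debug_mode = gpio_output[18];
assign debug_clk = gpio_output[19];
assign gpio_input[15:0] = address;
assign gpio_input[19:16] = 4'b0000; // unused read bits, tied low instead of left undriven

endmodule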


Wrap this module as an IP as well, in the same way as our core, and add it to our block design.

You will notice that our gpio_manipulator only makes provision for 20 gpio pins, whereas an AXI-GPIO has 32 pins by default. It is important that we also reduce the input/output pins on the axi component to 20.

We do this reduction by first double clicking on the AXI GPIO component. When the screen opens, go to the IP Configuration tab and change GPIO width from 32 to 20.
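Since the ARM Cortex program will drive these pins by writing a 20-bit word to the AXI GPIO, it helps to spell out which bit does what. The following sketch is simply the inverse of gpio_manipulator: it packs the four control signals into the word the ARM side would write. The module name gpio_pack is hypothetical and only serves to make the bit positions explicit.

```verilog
// Hypothetical inverse of gpio_manipulator: builds the 20-bit word that
// the ARM program would write to the AXI GPIO data register.
module gpio_pack (
  input  wire        clk_gen_rst, // bit 16: reset of the clock generator
  input  wire        rst,         // bit 17: reset of c64_core
  input  wire        debug_mode,  // bit 18: select debug_clk as system clock
  input  wire        debug_clk,   // bit 19: single-step clock
  output wire [19:0] word
);
  // Bits 15:0 carry addr_out in the other direction, so they are zero here.
  assign word = {debug_clk, debug_mode, rst, clk_gen_rst, 16'h0000};
endmodule
```

For example, asserting only rst gives 0x20000, while debug_mode and debug_clk together give 0xC0000.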

We can now start to connect our components to the rest of the system.

At this stage you might be puzzled, since the gpio_manipulator has two GPIO ports (gpio_input and gpio_output), whereas the AXI GPIO component only has one GPIO port. How do we connect our two ports to that one port?

The problem can be solved by just clicking on the + next to GPIO on the AXI GPIO component. You will see more ports become available.


gpio_io_i and gpio_io_o will look familiar, but not gpio_io_t. gpio_io_t defines for each of the 20 pins whether it is an input or an output. For now we will not worry about gpio_io_t because we are going to directly connect to gpio_io_i and gpio_io_o.

We can now continue and connect some wires.

Move the mouse cursor to the gpio_io_i pin of the axi_gpio block. You will see the mouse cursor will change into a pencil. While keeping the left mouse button depressed, move the mouse to the gpio_input pin of the gpio_manipulator block. You will see that a couple of pins that can potentially be connected will light up with a green tick, as shown below:


While the mouse cursor is over gpio_input of gpio_manipulator, release the left button of the mouse.

The connection you have just made will look as follows:

Now, make a similar connection from the pin gpio_io_o of the axi gpio component to the gpio_output pin of gpio manipulator.

This concludes the connections between the gpio manipulator and the axi gpio.

The rest of the pins of the gpio manipulator we can now connect to our other custom core. The result will look as follows:


At this point all the pins of c64_core are connected except for clk_in. A good candidate for this pin is FCLK_CLK0 of the Processing System block. The clock frequency of this pin is, however, 100MHz.

Well, I am sure that not many people would mind a C64 clocking at 100MHz 😃 However, such a C64 would be too fast to use, so let us try to keep as close to the real speed as possible.

We will use a clock generator to bring the clock speed down. Add a new IP and type clock in the search box. In the search results, click on Clocking Wizard. The resulting block being added to the design will look as follows:

clk_out1 will be the pin to connect to clk_in of our c64 core. Obviously clk_in1 we will connect to FCLK_CLK0 which is 100MHz.

The reset pin we will connect to clk_gen_rst of our gpio manipulator. The reset of our clock generator will therefore also form part of the responsibility of our ARM Cortex program.

One final thing we should do is to configure the parameters of our clock generator. Double-click the clock component and go to Clocking Options. Scroll right to the bottom and ensure that the input clock frequency is set to Auto, which will yield 100MHz:


Next, go to the Output Clocks tab. In this tab we want to push the clock frequency down to 1MHz, close to the speed of a real C64. However, the wizard won't allow us to go below 4.6MHz, so for now we will leave it at 5MHz. We will work around this in future posts.


We are now done with the block design. Let us see if it can synthesise. In the Sources panel, right click on design and select create HDL wrapper:


With the HDL wrapper created we can now synthesise. On the left click Run Synthesis. Verify that there are no errors.

Synthesizable Block RAM Initialisation

In a previous section I mentioned that an initial block with a $readmemh will indeed synthesise into block RAM elements with their contents initialised from a given hex file.

In this section I will prove this statement with the synthesised design that we ended with in the previous section.

Assuming the final synthesis in the previous section was successful, click on Open Synthesised design in the left panel. The resulting diagram will look something like the following:


We will now keep drilling down into the design until we reach the block RAMs containing the Test Suite code.

We drill down by clicking on the + on the top left of the block. Drilling into this block will yield quite a sea of wires as shown below:


Every grey block corresponds more or less to a block you have added to the block design. If you zoom into the top three blocks you will see that they are related to the ps7 processing system.

The blocks we are really interested in are the four blocks at the bottom right, so let us zoom into this region:


It is not so clear in the picture, but the block second from the left is c64_core. This is the block we are after, so let us drill into it. After drilling twice into this block, we are actually getting to the block RAM (I have marked them in red):


In this particular scenario the block RAMs are arranged into eight rows of two block RAMs each. As you might have guessed, each row stores one bit of the data, and the eight rows together form an addressable byte.

The two block RAMs per row form what is called a cascade configuration. Each block RAM stores 32Kbit of data, so in each row the left block RAM holds that row's bit for the first 32K addresses ($0000-$7FFF) and the right one holds it for the last 32K addresses ($8000-$FFFF).

The schematic view also allows us to view the init values that will be assigned to a block RAM. To view them, click on a block RAM and then, within the Cell Properties panel, click on Properties. If you now scroll down, you will eventually see lots of hex values:


You will see that each row of hex digits is preceded by 256'h, which is Verilog syntax for a 256-bit hexadecimal value. Note, though, that the bits in each row are shown in reverse order: the bit for the highest address in the row is shown first, working down to the bit for the lowest address.

Let us see if we can match up some of the data shown in this view to the actual binary data in the Klaus Test Suite binary.

The values we will try to match are the last four bytes of memory, i.e. the reset vector ($FFFC-$FFFD) and the IRQ vector ($FFFE-$FFFF).

To retrieve the last four bytes of memory we need to look at the second block RAM in each row. For each of these block RAMs we need the contents of INIT_7F. INIT_7F is the last init row of each block RAM: each block RAM has $80 (128) rows of 256 bits, numbered INIT_00 to INIT_7F, which gives 128 x 256 = 32768 bits.

Here is the data in question, from top to down:


  • 256'h47FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  • 256'hCFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  • 256'hABFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  • 256'h43FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  • 256'h8BFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  • 256'h8BFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  • 256'h47FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  • 256'h47FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Since we want the last four bytes, we take the first hex digit of each line:

4
C
A
4
8
8
4
4

We now convert each of these values to binary:

4 = 0100
C = 1100
A = 1010
4 = 0100
8 = 1000
8 = 1000
4 = 0100
4 = 0100

Now each column of binary digits forms a byte (NB!! the most significant bit is at the bottom). Because the bit order is reversed, the first column holds the byte at the last address, $FFFF, and the following columns work their way down from there. The mapping is thus:


  • Column 1 = address $FFFF, value = 00110110b = 36
  • Column 2 = address $FFFE, value = 11001011b = CB
  • Column 3 = address $FFFD, value = 00000100b = 04
  • Column 4 = address $FFFC, value = 00000000b = 00
These values match up to the values we see when opening up the Klaus Test Suite Binary with a Hex editor.
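The bit-shuffling above is easy to get wrong by hand, so here is a small sketch that redoes it in Verilog. It assumes that row 0 is the top row in the schematic and holds data bit 0, that each block RAM is organised as 32K x 1 so that INIT_7F of the second block RAM covers addresses $FF00-$FFFF, and that the bit index inside INIT_7F is therefore the low byte of the address. Only the first two hex digits of each INIT_7F value are reproduced; the remaining digits, all F in the listing above, are filled with ones. The module name init_decode is hypothetical.

```verilog
// Hypothetical helper: rebuild bytes from the eight INIT_7F values of the
// second block RAM in each row, as copied from the Cell Properties panel.
module init_decode (
  input  wire [15:0] addr,  // expected range $FF00-$FFFF
  output reg  [7:0]  data   // byte rebuilt from the eight rows
);
  // One 256-bit INIT_7F value per row; index 0 = top row = data bit 0.
  reg [255:0] init_7f [0:7];
  integer b;

  initial begin
    // First two hex digits as listed above; the remaining 62 hex digits
    // (248 bits) are all F, i.e. all ones.
    init_7f[0] = {8'h47, {248{1'b1}}};
    init_7f[1] = {8'hCF, {248{1'b1}}};
    init_7f[2] = {8'hAB, {248{1'b1}}};
    init_7f[3] = {8'h43, {248{1'b1}}};
    init_7f[4] = {8'h8B, {248{1'b1}}};
    init_7f[5] = {8'h8B, {248{1'b1}}};
    init_7f[6] = {8'h47, {248{1'b1}}};
    init_7f[7] = {8'h47, {248{1'b1}}};
  end

  // Bit index inside INIT_7F = low byte of the address ($FFFF -> bit 255).
  // Row b supplies data bit b, so the bottom row ends up as the MSB.
  always @* begin
    for (b = 0; b < 8; b = b + 1)
      data[b] = init_7f[b][addr[7:0]];
  end
endmodule
```

Feeding it $FFFF, $FFFE, $FFFD and $FFFC gives back 36, CB, 04 and 00, in line with the mapping above.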

In Summary

In this post we developed an FPGA core that can execute the 6502 Test Suite developed by Klaus Dormann.

With this done, however, we are only halfway there.

What still needs to be done is to write an ARM Cortex program that will reset our FPGA core and monitor the execution of the Test Suite.

Till next time!

Tuesday 21 November 2017

Simulating a complete 6502 system

Foreword

In the previous post we covered some Verilog basics.

In this post we will be developing the top module surrounding the 6502 core written by Arlet Ottens.

In the top module we will basically interface the 6502 core with a 64KB memory image of Klaus Dormann's Test Suite.

We will then run Klaus Dormann's Test Suite and see if all tests succeed.

In this post, however, we will not be running our implementation directly on the FPGA, but rather in a simulation.

We will be using the simulator included in Vivado, the FPGA development IDE from the vendor, Xilinx.

Installing Vivado

This blog assumes you have installed Vivado.

If you don't have Vivado, you can start by downloading the installer via this URL: https://www.xilinx.com/support/download.html

On the web page, ensure that you are on the Vivado tab:



Scroll down to Full Product Installation and select the applicable download for your OS:


A web installer will download, which allows you to select the components you want to install. Some components are free, while others are evaluation only.

For the purpose of this blog series the free components will do.

As part of the install ensure that Xilinx SDK is selected. This will be used to develop software running on the ARM Cortex processor.

Just a final note. As part of the installation you might be required to have a Xilinx user account. You can register on the Xilinx website and it is free!

Configuring Vivado for ZYBO

Next, let us configure the installed Vivado for use with the ZYBO board. Strictly speaking it is not necessary for this post, but since we are at the point of configuring Vivado, let us do all configuration in one go.

Firstly, we need to get the board file for the ZYBO board. This can be downloaded via the following link:

https://github.com/Digilent/vivado-boards/archive/master.zip

When you open the downloaded zip file, you will see a folder vivado-boards-master. Go into this folder.

You will see two folders, called new and old.

The old folder is for pre-2015.1 Vivado versions. The folder that we will be using is new, which is for Vivado versions 2015.1 and above.

Within the new folder you will see another folder called board_files. This folder corresponds to the following folder within your Vivado installation:

{Vivado installation folder}/2017.1/data/boards/board_files

The part in your path 2017.1 might be different, depending on the version of Vivado you have.

Now, copy the contents of the folder board_files in the zip file to the board_files folder in your Vivado installation.

If you now restart Vivado, the ZYBO board configurations will be available.

Creating a new Project in Vivado

Time to create a Vivado project for our 6502 simulation.

On the Vivado home screen select Create Project.

You will be presented with a multi-page wizard.

On the first page click next.

Give your new project a name and click next.

On the next wizard page, Project Type, leave defaults as is and click next.

The next wizard page is where things start to get interesting. When you click on the Boards button, you will see a list of known boards your Vivado installation supports. Among them you will see a list of Zybo boards:



The Zybo boards in this list are the extra boards that appeared as a result of the steps you did in the previous section.

For simulation this is not that important, but when running your design on the FPGA it is crucial that you select the ZYBO board corresponding to your board revision. The listed Zybo boards are very similar to each other; however, there are subtle differences in their DDR RAM parameters.

These subtle differences in DDR parameters can be just enough to send you on a wild goose chase! I was one of those who ended up on such a chase for a week :-) I started off a project selecting the Zybo Z7-10 (B.2) instead of the Zybo B.3.

With the incorrect board selected, the ARM processor managed to start up, but after executing a couple of machine code instructions some memory locations would be modified out of the blue, causing chaos.

Anyway, back to the plot. With your board selected, click next. You will reach the summary page, where you need to click finish. In a minute or so a new project will be created for you.

In the sources panel select the big '+' button to add some sources to your project:

On the first wizard page ensure that Add or create simulation sources is selected and click next.

On the next page we first need to add the two files from Arlet Ottens's core, called cpu.v and alu.v. So, click the Add Files button, browse to these two files on your hard drive, and add them.

Finally, it is necessary to create a new file in which we will write our top module. For this, click the Create File button. For the filename, specify top.v. Now hit OK and then Finish.

The requested files will now be added and created. Once finished, your Sources panel will look similar to the following:



For top.v, Vivado has created an empty skeleton for you:


We will populate this skeleton in the next section.

Writing the top module

Let us now populate our top module with some code.

Firstly, let us instantiate an instance of the cpu module:

cpu  mycpu ( .clk (), 
            .reset (), 
            .AB (), 
            .DI (), 
            .DO (), 
            .WE (), 
            .IRQ (), 
            .NMI (), 
            .RDY() );

For now, all the signals are unconnected. We will connect them as we go along.

The specifics for the clock signal is defined as follows:

...
reg clk = 0;
...
always #10 
clk <= ~clk;
...

In the always block we create the pulsing clock for simulation purposes. Each time the always block executes, it first waits 10 time units, indicated by #10, and then assigns the inverse of the current clock value to clk. One full clock period is therefore 20 time units.

Next, let us tie up the reset line:

...
reg reset = 1;
...
initial begin
  #50 reset <= 0; 
  #10000000 $stop;
end    
...

We start the simulation with the reset line asserted.

The value of the reset line is changed within an initial block. An initial block executes just once, at the start of the simulation. It should also be noted that initial blocks are mainly intended for simulation, and most of the time they do not synthesise to anything at all on the FPGA.

Let us look a bit closer at the contents of this initial block. It waits 50 time units (two and a half clock periods, since one clock period is 20 time units) before setting the reset line to zero.

Our initial block has one more purpose: stopping the simulation. After releasing the reset, it waits a further 10000000 time units and then suspends the simulation with the $stop system task.
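To double-check the timing arithmetic, here is a sketch that pulls the clock and reset stimulus of top.v into a small module of its own. The name sim_clock_reset is hypothetical, and the $stop is left out so that a test bench can sample clk and reset at chosen times.

```verilog
// Hypothetical module reproducing the clock and reset stimulus of top.v.
module sim_clock_reset (
  output reg clk,
  output reg reset
);
  initial begin
    clk   = 1'b0;  // same start value as in top.v
    reset = 1'b1;  // start with reset asserted
  end

  // Toggle every 10 time units: one full clock period is 20 time units.
  always #10 clk <= ~clk;

  // Release reset after 50 time units (two and a half clock periods).
  initial #50 reset <= 1'b0;
endmodule
```

Note that the rising clock edges fall at 10, 30 and 50 time units, so the reset is released in the same time step as the third rising edge. Since both signals are assigned with nonblocking assignments, which of the two a posedge-triggered block sees first may depend on the simulator's scheduling.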

The next few ports of the cpu core, AB, DI, DO and WE, are all part of a memory interface, so let us spend some time designing this memory interface.

First, we declare our 64KB memory array:

    reg [7:0] ram[65535:0];


This register declaration is a bit different from our previous ones. It states that 65536 registers should be created (indicated by [65535:0]) and that each register should have eight bits (indicated by [7:0]).

Now, let us define the necessary memory interface wires and an always block for the assignments:

...
wire [15:0] addr;
wire [7:0] ram_in;
reg [7:0] ram_out;
...
 always @ (posedge clk)
 begin
  if (WE) 
  begin
   ram[addr] <= ram_in;
   ram_out <= ram_in;
  end
  else 
  begin
   ram_out <= ram[addr];
  end 
 end 
...

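The always block closely models how a block RAM element works: a read delivers its data on the clock edge after the address is presented, and a write also passes the written value through to ram_out. When doing synthesis, the tool recognises from this always block that we want a block RAM and synthesises it accordingly.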
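The memory interface can also be pulled out into its own module, which makes the one-clock read latency easy to check in isolation. The sketch below is a hypothetical stand-alone version of the always block above (the module name sync_ram and the parameter AW are assumptions), with the same write-first behaviour.

```verilog
// Hypothetical stand-alone version of the RAM always block in top.v.
module sync_ram #(parameter AW = 16) (
  input  wire          clk,
  input  wire          we,    // write enable (WE from the cpu)
  input  wire [AW-1:0] addr,  // address bus (AB from the cpu)
  input  wire [7:0]    din,   // data written by the cpu (DO)
  output reg  [7:0]    dout   // data read by the cpu (DI)
);
  reg [7:0] mem [0:(1<<AW)-1];

  always @(posedge clk) begin
    if (we) begin
      mem[addr] <= din;
      dout      <= din;        // write-first: the written byte appears on dout
    end else begin
      dout      <= mem[addr];  // registered read: valid one clock later
    end
  end
endmodule
```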

With everything defined, we can now connect all the signals within our 6502 instance:

cpu  mycpu ( .clk (clk), 
            .reset (reset), 
            .AB (addr), 
            .DI (ram_out), 
            .DO (ram_in), 
            .WE (WE), 
            .IRQ (1'b0), 
            .NMI (1'b0), 
            .RDY(1'b1) );


For now we are not interested in the IRQ, NMI and RDY signals. For each of these signals we just choose a constant value that won't impede the operation of the CPU.

We are almost done coding our top module. What remains to be done is to populate our ram array with an image of Klaus Dormann's Test Suite.

There are two Verilog system tasks that help us out with populating the ram array:

  • $readmemb: reads a text file containing binary numbers (strings of 0s and 1s) and populates the ram array with the values
  • $readmemh: reads a text file containing hexadecimal numbers and populates the ram array with the values.
$readmemb might look like the first choice, because it seems as if you could use the test suite binary as is. Note, however, that $readmemb expects a text file of binary digits rather than a raw binary file. When I tried the 6502 simulation with $readmemb, my Vivado IDE ended up in an endless loop. Using $readmemh, I didn't experience such issues.

The $readmemh task expects a text file of hexadecimal numbers separated by white space; one number per line is the simplest layout. Such a file is quite easy to create with the aid of a hex editor.

With the binary file open in a hex editor, copy the contents of the left panel (i.e. the hexadecimal view) and paste them into a text editor:



With the hex data in a text editor, you can replace each space with a newline.


There is one change you need to make to your hex file: the reset vector should be modified to point to $0400. Do this by changing the contents of memory locations $FFFC and $FFFD to 00 and 04 respectively (the 6502 stores vectors low byte first).


With the hex file created, you can now add the following initial block within your top module:



initial begin
  $readmemh("{path to your hex file}", ram) ;
end


You need to replace {path to your hex file} with the applicable path to your hex file on your local drive.

Running the simulation

We are now ready to run the simulation.

On the left panel of the Vivado IDE, click run simulation and on the popup click Run behavioural simulation.

You will see a progress box running for a short while and then a wave window will open:


You will see that only a small time period of your simulation has run. This is due to a default Vivado setting. You can override this default if you want to, but most of the time it is useful to see a quick snapshot of your simulation before running the full one.

You can resume the full simulation by clicking the play button on the top bar (labelled Run All).

You will see the display of the wave window updating while the simulation is running.

Running the Klaus Test Suite within a Verilog simulation can take ages to run. So, for now we just want some kind of idea that our 6502 environment is running more or less fine, and leave the complete test suite for the FPGA itself.

Here is an example of some sanity checks you can do:


On the wave output you can do some spot checks on whether the RAM gives out the correct data for given addresses. Remember that for block RAMs the output only becomes available in the next clock cycle, as indicated by the red lines in the diagram above.

This concludes our simulation exercise for this post.

In Summary

In this post we built the top module surrounding Arlet Ottens's 6502 core.

The top module contained an interface to a RAM array populated with Klaus Dormann's Test Suite code.

We ended off by running a simulation in Vivado and doing some spot checks on whether the RAM returns the correct data for given addresses.

In the next post we will start with the physical FPGA implementation of the top module we developed in this post.

Till next time!

Friday 17 November 2017

Verilog Basics

Foreword

In the previous post I gave a brief introduction about what this series of Blog Posts is about, which is creating a C64 system on an FPGA.

Originally I planned for this post to create a 6502 system for the FPGA that can run Klaus Dormann's 6502 Test Suite.

However, after some thought, I decided to dedicate this post to the principles of Verilog.

Verilog is the language we will be using to create our C64 FPGA implementation. At first glance, Verilog looks very similar to what you would expect from a conventional programming language.

However, Verilog has some subtle principles that work differently from conventional programming languages. Hence this post, which is aimed especially at newcomers with experience of conventional programming languages.

Since I will be using Arlet Ottens's 6502 core as the starting point for the C64 FPGA implementation, I will be using pieces from this core to explain some of the principles of Verilog.

Unpacking the Arlet Core


Let us start off by having a look at the source code of Arlet Ottens's 6502 core. This will give newcomers a better idea of what FPGA programming looks like.

If you have a look at source code for Arlet's core on github, you see two important files: cpu.v and ALU.v.

The *.v file extension denotes a Verilog file. More on the syntax used in these files in a moment.

Let us open up cpu.v for starters. You will see that it starts with the following lines

module cpu( clk, reset, AB, DI, DO, WE, IRQ, NMI, RDY );

input clk;              // CPU clock 
input reset;            // reset signal
output reg [15:0] AB;   // address bus
input [7:0] DI;         // data in, read bus
output [7:0] DO;        // data out, write bus
output WE;              // write enable
input IRQ;              // interrupt request
input NMI;              // non-maskable interrupt request
input RDY;              // Ready signal. Pauses CPU when RDY=0 

The first line is a module declaration, followed by its list of ports. Ports are the signals through which the module connects to the outside world.

The following lines declare, for each port, whether it is an input or an output.

You might have noticed that some ports have numbers in square brackets preceding their names.

Port DI is one such example. [7:0] means that DI consists of 8 wires (or eight bits), numbered from 7 down to 0.

A port declared without the square bracket notation is only a single bit wide.
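The sketch below makes the bracket notation concrete. The module bus_split and its port names are hypothetical and not part of Arlet's core. It uses the Verilog-2001 style, which declares direction and width directly in the port list rather than in separate lines as cpu.v does. It also shows how a single bit or a range of bits can be picked out of a wider signal.

```verilog
// Hypothetical example module, not part of Arlet Ottens's core.
// Verilog-2001 "ANSI" style: direction and width are declared in the port list.
module bus_split (
    input  [15:0] addr,   // 16 wires, numbered 15 down to 0
    output [7:0]  lo,     // 8 wires: low byte of addr
    output [7:0]  hi,     // 8 wires: high byte of addr
    output        msb     // no brackets: a single wire (1 bit)
);
    assign lo  = addr[7:0];   // part-select: bits 7 down to 0
    assign hi  = addr[15:8];  // part-select: bits 15 down to 8
    assign msb = addr[15];    // bit-select: bit 15 only
endmodule
```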

Let us look further down:

reg  [15:0] PC;         // Program Counter 
reg  [7:0] ABL;         // Address Bus Register LSB
reg  [7:0] ABH;         // Address Bus Register MSB
wire [7:0] ADD;         // Adder Hold Register (registered in ALU)

reg  [7:0] DIHOLD;      // Hold for Data In
reg  DIHOLD_valid;      //
wire [7:0] DIMUX;       //

reg  [7:0] IRHOLD;      // Hold for Instruction register 
reg  IRHOLD_valid;      // Valid instruction in IRHOLD

This looks almost like a set of variable declarations. Note that I use the term "almost". A variable is supposed to store a value, and the declarations above that use the word 'reg' typically do store one. However, declarations with the word 'wire' don't store a value at all; a wire is literally a wire connecting two components.

Let us now see how values get assigned to these components.

Firstly an example of a wire assignment:

assign DIMUX = ~RDY ? DIHOLD : DI;

In this case the DIMUX wire is driven by combinational logic: a multiplexer controlled by the ready signal. When RDY is low, DIMUX carries DIHOLD; otherwise it carries DI.

Registers get assigned values via always blocks. More on this in the next section.
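The hypothetical module below contrasts a wire with a reg, in the spirit of the DIMUX assignment above. The module name dimux_demo and the rule for capturing the hold register are assumptions for illustration and are not copied from cpu.v. The clocked always block it uses is explained in the next section.

```verilog
// Hypothetical sketch in the spirit of the DIMUX assignment in cpu.v.
// How "hold" is captured below is an assumption made for illustration.
module dimux_demo (
    input        clk,
    input        RDY,    // ready: when low, the CPU is paused
    input  [7:0] DI,     // data in from the bus
    output [7:0] DIMUX   // a wire: stores nothing, just routes a value
);
    reg [7:0] hold;      // a reg: keeps its value between clock edges

    // Remember the last bus value seen while the CPU was ready.
    always @(posedge clk)
        if (RDY)
            hold <= DI;

    // Combinational multiplexer: while paused, present the held value;
    // otherwise pass DI straight through.
    assign DIMUX = ~RDY ? hold : DI;
endmodule
```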

You might be wondering where the file ALU.v is being used. Scrolling down in cpu.v to the following snippet will reveal the answer:

ALU ALU( .clk(clk),
         .op(alu_op),
         .right(alu_shift_right),
         .AI(AI),
         .BI(BI),
         .CI(CI),
         .BCD(adc_bcd & (state == FETCH)),
         .CO(CO),
         .OUT(ADD),
         .V(AV),
         .Z(AZ),
         .N(AN),
         .HC(HC),
         .RDY(RDY) );

So, what is happening here? The repeated ALU on the first line is a bit confusing, so let us rename the instance for explanatory purposes:

ALU alu_inst( .clk(clk),
         .op(alu_op),
         .right(alu_shift_right),
         .AI(AI),
         .BI(BI),
         .CI(CI),
         .BCD(adc_bcd & (state == FETCH)),
         .CO(CO),
         .OUT(ADD),
         .V(AV),
         .Z(AZ),
         .N(AN),
         .HC(HC),
         .RDY(RDY) );

This statement declares an instance called alu_inst of the module ALU, defined in ALU.v. In the original code the instance simply has the same name as its module, which Verilog allows. This is a bit like object orientation in action :-)

The identifiers preceded by a dot are the ports defined within the ALU module. The identifier in brackets after each one is the signal within the containing module (cpu) that the port connects to.

This gives us a rough idea of how modules get wired together.
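Here is the same named-port notation in a smaller, hypothetical setting. The modules half_adder and full_adder are illustrative only and not part of Arlet's core. The full adder is built from two half-adder instances, wired up with the same .port(signal) notation as the ALU instance.

```verilog
// Hypothetical example (not from Arlet's core) showing module instances
// wired together with named port connections, as in the ALU instance.
module half_adder (
    input  a,
    input  b,
    output sum,
    output carry
);
    assign sum   = a ^ b;
    assign carry = a & b;
endmodule

// A full adder built from two half_adder instances plus an OR gate.
module full_adder (
    input  x,
    input  y,
    input  cin,
    output s,
    output cout
);
    wire s1, c1, c2;   // internal wires linking the two instances

    // .port(signal): connect the instance's port to a signal in full_adder.
    half_adder ha1 ( .a(x),  .b(y),   .sum(s1), .carry(c1) );
    half_adder ha2 ( .a(s1), .b(cin), .sum(s),  .carry(c2) );

    assign cout = c1 | c2;
endmodule
```

The internal wires s1, c1 and c2 play the same role as signals like ADD and CO in cpu.v: they carry values from one instance to another.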

With all this said, you might be wondering what the ports of the cpu module link to. The cpu will be contained within a module called the top module, which we will develop in the next post.

Basically every Verilog project contains a top module, and it is the only module you don't need to create an instance of. In object-oriented terms, you can almost think of the top module as a singleton.

Always Blocks

In the previous section I mentioned that registers get assigned values via always blocks.

Let us now look into always blocks in more detail.

Here is an example of an always block:

always @(posedge clk)
    if( state != PUSH0 && state != PUSH1 && RDY && 
        state != PULL0 && state != PULL1 && state != PULL2 )
    begin
        ABL <= AB[7:0];
        ABH <= AB[15:8];
    end

You will see that for the assignment operator we use <= instead of just =. This is called a non-blocking assignment. The assignment doesn't happen right away. Rather, it happens at the moment the clock signal transitions from low to high, as indicated by posedge clk in the always statement.

Under the hood in the FPGA, this always block translates into a number of D flip-flops (the D stands for data or delay), one for each bit of ABL and ABH. When using Verilog you are largely shielded from the components your code gets translated into. However, for better understanding I am going to pause a while at the D flip-flop.

Firstly, here is the visual representation symbol for a D Type Flip-Flop:

A D-type flip-flop is a storage element, and you can view its contents via the Q output.

This flip-flop receives its input via the D input. As mentioned earlier, changing the input doesn't change the contents of the flip-flop right away. You first need to apply a clock pulse.

The clock input is denoted in the symbol by the small triangle. Each time this input transitions from low to high, the contents of the flip-flop change to whatever value is present on the D input during the clock transition.
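In Verilog, a single D flip-flop can be sketched as follows. The module name dff is a hypothetical choice. The point is that a clocked always block with a non-blocking assignment is the usual way to describe this element.

```verilog
// A minimal D flip-flop, as a sketch of what a clocked always block
// infers. The module name dff is our own choice for illustration.
module dff (
    input      clk,   // clock input (the triangle in the symbol)
    input      d,     // data input
    output reg q      // stored value
);
    // q only changes on a rising (low-to-high) clock edge,
    // taking whatever value d has at that moment.
    always @(posedge clk)
        q <= d;
endmodule
```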

I just want to go back briefly to our always block example. Another important thing about this always block is that during the clock transition, ABL and ABH change their values simultaneously. This is in contrast with conventional programming languages, where you first execute the first statement and then the second one.

If my last statement confused you, hopefully the following diagram will clear it up:

Because of the always statement, the two flip-flops share the same clock line.
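A register swap is one way to convince yourself that both assignments in a clocked always block happen at the same time. The module below is a hypothetical example; swap_demo and its ports are not from Arlet's core. Because <= is non-blocking, both right-hand sides are read before either register is updated. As a result, a and b exchange values on every rising edge while load is low.

```verilog
// Hypothetical example: two registers updated in the same clocked
// always block. Because <= is non-blocking, both right-hand sides are
// read before either register changes, so the values swap on each edge.
module swap_demo (
    input            clk,
    input            load,      // when high, load the initial values
    input      [7:0] init_a,
    input      [7:0] init_b,
    output reg [7:0] a,
    output reg [7:0] b
);
    always @(posedge clk)
        if (load) begin
            a <= init_a;
            b <= init_b;
        end else begin
            a <= b;   // uses the OLD value of b
            b <= a;   // uses the OLD value of a, not the one assigned above
        end
endmodule
```

Had we written a = b; b = a; with blocking assignments instead, both registers would end up holding the old value of b. That is the statement-by-statement behaviour you would expect from a conventional programming language.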

Another interesting always block scenario is the following:

always @(posedge clk)
begin
  a <= in;
  b <= a;
  c <= b;
end



In this always block, c will only output the value of in after three clock cycles. On every rising edge, each register takes the value its predecessor held before the edge. The following diagram illustrates this scenario:
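Here is the three-register example wrapped in a complete module, for those who want to try it in a simulator. The module name shift3 and the port list are our own additions; the body of the always block is unchanged.

```verilog
// The three-register example, wrapped in a complete module.
// The module name shift3 is our own choice for illustration.
module shift3 (
    input      clk,
    input      in,
    output reg c
);
    reg a, b;   // internal stages

    always @(posedge clk)
    begin
        a <= in;   // stage 1 samples the input
        b <= a;    // stage 2 takes stage 1's old value
        c <= b;    // stage 3 takes stage 2's old value
    end
endmodule
```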

There is one final always block scenario I would like to cover before closing off this section:

always @*
    if( state == FETCH || state == REG || state == READ )
        alu_shift_right = shift_right;
    else
        alu_shift_right = 0;

In this always block we didn't explicitly specify a signal to use for clocking. Instead, the asterisk tells Verilog to generate the sensitivity list from the signals read inside the block. The block therefore re-evaluates whenever one of its inputs (here state and shift_right) changes.

The interesting thing is that this always block synthesises into combinational logic, and alu_shift_right will be synthesised as a wire even though it is declared as a reg.

This is a corner case worth being aware of: a register is not always synthesised as a flip-flop.
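Here is a hypothetical, self-contained version of the alu_shift_right block. The module name shift_select and the two-bit state encoding are made up for illustration, since the real state values live inside cpu.v. A combinational always block like this uses the plain = (blocking) assignment, and every branch assigns alu_shift_right.

```verilog
// Hypothetical sketch modelled on the alu_shift_right block in cpu.v.
// The state encoding below is made up for illustration only.
module shift_select (
    input      [1:0] state,
    input            shift_right,
    output reg       alu_shift_right   // declared reg, synthesised as logic
);
    localparam FETCH = 2'd0, REG = 2'd1, READ = 2'd2;

    // No clock: the block re-evaluates whenever state or shift_right
    // changes. Every path assigns alu_shift_right, so no storage is
    // needed and the result is purely combinational.
    always @*
        if (state == FETCH || state == REG || state == READ)
            alu_shift_right = shift_right;
        else
            alu_shift_right = 1'b0;
endmodule
```

If the else branch were left out, alu_shift_right would have to keep its previous value in the other states, and synthesis tools would typically infer a latch. That is one way a reg can end up as storage after all.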

In Summary

In this post we discussed the principles of Verilog by using the source code of Arlet Ottens's core as an example.

The topics we covered were registers, wires and always blocks.

In the next post we will be developing the top module to build a working 6502 system that can execute the 6502 Test Suite developed by Klaus Dormann.

Till next time!

Thursday 16 November 2017

Introduction

Background

The first time I saw a Commodore 64 emulator running on a PC was 17 years ago. As soon as I saw it, I wondered how it worked and wanted to write my own.

This desire slumbered for years, until late 2015 when I scraped together the courage to write one in Java.

The following year I wrote one in JavaScript and then one for Android.

I finished off the Android emulator in January 2017, and months passed after that without me playing with 8-bit emulators.

Seven months later, though, I started contemplating a new idea: a complete C64 system on an FPGA.

Before I continue the story, let us pause for a moment at the term FPGA, for those who are not quite familiar with it.

FPGA is the acronym for Field Programmable Gate Array. It is basically a chip containing a bank of logic gates and/or functional blocks that you can connect together to form the function you desire. Typical functions that you can implement in an FPGA include graphics acceleration and digital signal processing.

The nice thing about an FPGA is that you can reconfigure it.

Anyway, back to the main story.

I had a look on the net to see if there were any existing C64-on-FPGA implementations and couldn't really find any. The closest I could find were 6502-oriented systems connected via a serial interface.

These serial interface implementations are nice if you want to play around with legacy BASIC, but you don't get any fancy graphics!

This was one of the main reasons I wanted to start writing my own C64 implementation on an FPGA.

Which FPGA Development Board?

The big question is which FPGA development board would be best suited for a complete C64 system.

Probably the most important feature for this is having some kind of video output for the graphics.

One FPGA development board that fits this need is the Basys 3 board from Digilent, which features a VGA port:


I had a lot of fun with this board until I read more on how LCD VGA monitors work. That raised the question of how to interface a PAL/NTSC signal with an LCD monitor that has an odd resolution of 1366x768 while maintaining the aspect ratio. It is not an easy question to answer, at least not without some kind of framebuffering.

The only resource the Basys 3 has available for framebuffering is the on-FPGA Block RAM, which will give you enough for buffering a couple of screen lines. You would also need to reserve some Block RAM space for the 64KB of C64 RAM and the three ROMs of the C64 system.

This all adds up to a very tight Block RAM budget, and this board doesn't provide us with the luxury of a couple of megabytes of SDRAM.

I needed a board that would give me some more breathing space.

After some more research, I found the ZYBO board, also from Digilent:


This board features an FPGA with similar capabilities to the Basys 3, plus 512MB of onboard SDRAM.

The added SDRAM should come in handy if we run out of Block RAM resources.

The ZYBO board is indeed the board I will be using throughout this blog series.

More on the ZYBO board

The most interesting part of the ZYBO board is the FPGA chip itself.

The FPGA chip is a ZYNQ. This is basically a dual-core ARM Cortex-A9 processor combined with FPGA fabric, as shown in the following block diagram from the Digilent website:


The FPGA part is represented via the yellow block at the bottom.

As you can see from the diagram, the FPGA also has access to the processing system via AXI ports. AXI is a bus protocol defined by ARM Holdings.

When you write a program to run on the ARM Cortex processor, you also have access to your design in the FPGA fabric via memory-mapped registers. I will discuss this in more detail in a future post.

Because the ARM core can "see" your FPGA design, you can also harness it to help debug your FPGA core. This makes life a bit simpler, since you don't have to add extra debugging cores to your FPGA design.

Approach

I will also approach this blog series in an incremental fashion, starting with something simple and gradually evolving it.

I will be building on the 6502 core written by Arlet Ottens, available on GitHub.

Arlet Ottens has done some excellent work on writing a 6502 core that can run on an FPGA with Block RAMs.

Block RAMs work a bit differently from the RAM used in a C64. In the C64, when you asserted the read line and the address on the memory bus, there was a short period during which the data on the data bus was in an uncertain state.

Granted, this uncertain period was a disadvantage. On the other hand, if you keep your clock period longer than this uncertain period, it can actually simplify your CPU design! I mean, if you assert your lines now, you can have your data at the next clock transition!

With Block RAM, however, reads are synchronous: you need to wait an extra full clock cycle after asserting the lines on the bus before you get your data.

This extra cycle means that you cannot use a straightforward 6502 implementation as is.

Arlet had to jump through a couple of hoops to get a 6502 implementation working with Block RAMs.
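The extra Block RAM cycle can be illustrated with a minimal sketch of a RAM with a synchronous read, the coding style that FPGA tools typically map onto Block RAM. The module name bram_sketch, its ports and the 8-bit default address width are assumptions for illustration; this is not the memory module used in this series.

```verilog
// A minimal sketch of a synchronous-read RAM, the coding style that FPGA
// tools typically map onto Block RAM. Module and port names are our own.
module bram_sketch #(
    parameter ADDR_WIDTH = 8
) (
    input                       clk,
    input                       we,     // write enable
    input      [ADDR_WIDTH-1:0] addr,
    input      [7:0]            din,
    output reg [7:0]            dout
);
    reg [7:0] mem [0:(1<<ADDR_WIDTH)-1];

    // The address is only sampled on the rising edge, so read data shows
    // up on dout one clock after the address was presented.
    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        dout <= mem[addr];   // old contents are returned during a write
    end
endmodule
```

By contrast, a RAM read through a continuous assignment such as assign dout = mem[addr]; delivers data within the same cycle, more like the C64's RAM. However, Xilinx tools typically implement that style with LUT-based distributed RAM rather than Block RAM.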

Another 6502 project I will be leveraging is the 6502 Test Suite developed by Klaus Dormann, which I will use to verify that my FPGA implementation is sound.

The Test Suite developed by Klaus Dormann is very comprehensive. Should a 6502 device or emulator pass all the tests, you are almost guaranteed that you have a sound 6502 implementation. 

So, in short: in the next post I will be running Arlet Ottens's core in simulation with the 6502 Test Suite developed by Klaus Dormann.

Once I am convinced that Klaus Dormann's 6502 Test Suite runs successfully, I will be adding parts to the FPGA design on the way to a full C64 implementation.

In Summary

This post was an introduction to my new series of Blog posts regarding the creation of a Complete C64 system on an FPGA.

The FPGA board I will be using throughout the series will be a ZYBO from Digilent.

In the next post I will be running Klaus Dormann's 6502 Test Suite on Arlet Ottens's 6502 core in a Verilog simulation.

Till next time!