Sunday, 25 April 2021

Running an Amiga core on a Zybo Board: Part 4

Foreword

In the previous post we managed to get the source code of the Mini Amiga project to compile in Vivado and ran the resulting bitstream on the Basys 3 board.

With the core on the Basys 3, we did a very basic test, confirming that the address requests to external SDRAM targeted the Kickstart ROM area.

Our next goal is to get the Kickstart ROM to run.

You might have noticed towards the end of the previous post that I am alternating between the term Kickstart ROM and AROS ROM. Many readers are more familiar with the term Kickstart ROM, whereas we are going to run the AROS ROM image in our design. So, just keep that in mind when I interchange between the two terms.

The Kickstart ROM is 512KB in size, more than the amount of Block RAM the Basys 3 can offer. We will therefore move in this post to the Zybo board, which supports external SDRAM.

Our primary focus in this post will be to develop an interface where, given an address, the interface will return one word of data from that location in SDRAM.

Overview

Quite a while ago, while I was still developing a C64 block for an FPGA, I also created a wrapper for accessing SDRAM on the Zybo board, here.

The purpose was to save the frames produced by the VIC-II module and render them to a VGA screen at a different frame rate.

In the design I used the IP Burst block, provided by Xilinx, to shield me from the details of the AXI protocol.

The design I ended up with was a streaming interface that accessed data in a serial fashion. This is a suitable interface for rendering frames to a VGA screen, where the data required is also of a serial nature.

With the streaming interface we can easily predict which data will be required in the future and prefetch it, thereby mitigating the effect of latency.

In our Amiga design, however, the 68000 will be accessing the SDRAM in a much more random fashion, so a streaming interface will not give us any real benefit. So, in this post we will be designing a very simple SDRAM interface, where we provide an address, and the interface will return the relevant word of data from SDRAM.

Obviously, with this interface SDRAM latency will work against us, but for coming posts we will first try to get the system to work and attempt to fix the latency issues later on.

Creating an AXI Block

In a previous post, here, I explained how to create an AXI block in a Zybo block design.

In that post, I also explained how to utilise the IP Burst block in your created AXI block. As mentioned earlier, the IP Burst block shields you from the technicalities of the AXI protocol.

Well, after a couple of years down the line, I tend to disagree with the statement in the previous paragraph. Working with AXI protocols is not that bad and using the IP Burst block is a bit of an overkill.

The thing is, when creating an AXI master block, the wizard does create a template for you that is basically an example of how to use the AXI protocol. A couple of years ago, however, this template just looked like a very complex state machine, and so I decided to use the IP Burst block as an alternative.

Having a look at the state machine that the AXI Block Wizard creates, one can see that it is basically a memory tester. It starts off writing a certain test pattern of data to memory and afterwards reads it back from memory, checking to see if it matches the test pattern.

It is easy enough to alter the path of this state machine to read or write on demand. We will tackle this in the next section.

Modifying the AXI block

Let us modify the AXI block to fit our needs.

First we need to look at the state machine that is implemented in case-statements. The state machine transitions as follows: IDLE > WRITE > READ > COMPARE > IDLE.

In order to implement our read on demand, we need to transition back to IDLE after a read or a write. Also, in the IDLE state we need to transition directly to a READ or a WRITE given a command. We do this as follows:

           if ( init_txn_pulse == 1'b1)                                                      
             begin                                                                                        
               mst_exec_state  <= user_write ? INIT_WRITE : INIT_READ;                                                              
               ERROR <= 1'b0;
               compare_done <= 1'b0;
             end                                                                                          
           else                                                                                            
             begin                                                                                        
               mst_exec_state  <= IDLE;                                                            
             end

Here user_write is an input port we define on the module. If it is 1, the transaction is a write; otherwise it is a read.

Next let us add some ports to the AXI block for supplying commands:

...
// Users to add ports here
        input [31:0] user_address,
        input [31:0] user_data,
        input user_write,
// User ports ends
...

So, we have a port to specify an address for a read/write command. Also, for a write, we have a port to specify the data.

Let us have a look at how these ports are going to be used:

// Next address after AWREADY indicates previous address acceptance    
 always @(posedge M_AXI_ACLK)                                        
 begin                                                                
   if (M_AXI_ARESETN == 0 || init_txn_pulse == 1'b1)                                            
     begin                                                            
       axi_awaddr <= user_address;                                            
     end                                                              
   else if (M_AXI_AWREADY && axi_awvalid)                            
     begin                                                            
       axi_awaddr <= axi_awaddr/* + burst_size_bytes*/;                  
     end                                                              
   else                                                              
     axi_awaddr <= axi_awaddr;                                        
   end          
...
/* Write Data Generator                                                            
The write data is now taken from user_data instead of an incrementing count  */        
 always @(posedge M_AXI_ACLK)                                                      
 begin                                                                            
   if (M_AXI_ARESETN == 0 || init_txn_pulse == 1'b1)                                                        
     axi_wdata <= user_data;                                                            
   //else if (wnext && axi_wlast)                                                  
   //  axi_wdata <= 'b0;                                                          
   else if (wnext)                                                                
     axi_wdata <= axi_wdata/* + 1*/;                                                  
   else                                                                            
     axi_wdata <= axi_wdata;                                                      
   end              
...
// Next address after ARREADY indicates previous address acceptance  
 always @(posedge M_AXI_ACLK)                                      
 begin                                                              
   if (M_AXI_ARESETN == 0 || init_txn_pulse == 1'b1)                                          
     begin                                                          
       axi_araddr <= user_address;                                          
     end                                                            
   else if (M_AXI_ARREADY && axi_arvalid)                          
     begin                                                          
       axi_araddr <= axi_araddr/* + burst_size_bytes*/;                
     end                                                            
   else                                                            
     axi_araddr <= axi_araddr;                                      
 end
...

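In this snippet of code, we have kept the original template code more or less intact. We only changed three things: init_txn_pulse was added to the reset conditions, the registers are now loaded from user_address and user_data, and the address and data increments were commented out.

You will notice that in a couple of places we use init_txn_pulse. This is used to trigger a read or a write transaction.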

One thing we still need to do is to indicate to the outside world when a read/write transaction has completed and also send the data when the transaction was a read. To do this, we just need to watch the signals wnext and rnext:

...
        output reg [31:0] user_data_out,
        output reg user_data_ready,
...
    always @(posedge M_AXI_ACLK)
    begin
      if (wnext || rnext)
      begin
        user_data_ready <= 1;
      end else if(!INIT_AXI_TXN)        
      begin
        user_data_ready <= 0;
      end      
    end
...
    always @(posedge M_AXI_ACLK)
    begin
      if (rnext)
      begin
        user_data_out <= M_AXI_RDATA;
      end
    end
...

Keep in mind that M_AXI_ACLK is clocking at 100MHz, so rnext/wnext might be high for one clock cycle and low at the next one. The Mini Amiga block is clocking considerably slower than 100MHz, and will miss these notifications if we rely directly on rnext/wnext. It is for that reason that we are storing the values in user_data_ready and user_data_out.
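
Since user_data_ready is produced in the 100MHz AXI clock domain and consumed by slower logic, one common way to hand such a level signal over is a two-flop synchroniser. The sketch below only illustrates that idea and is not part of the AXI block described in this post; the module name ready_sync and the ports slow_clk and ready_synced are hypothetical.

```verilog
// Hypothetical helper, not part of the original AXI block:
// brings user_data_ready from the 100MHz AXI domain into a slower clock domain.
module ready_sync(
    input  wire slow_clk,         // assumed: clock of the slower consumer
    input  wire user_data_ready,  // level signal coming from the AXI block
    output wire ready_synced      // same signal, safe to sample on slow_clk
);
    reg [1:0] sync_ff = 2'b00;

    always @(posedge slow_clk)
    begin
      // The first flop may go metastable; the second gives it a cycle to settle
      sync_ff <= {sync_ff[0], user_data_ready};
    end

    assign ready_synced = sync_ff[1];
endmodule
```

Because user_data_out is captured in the same cycle that user_data_ready is set and then held, the data should already be stable by the time ready_synced goes high in the slower domain.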

Testing the AXI block

Let us write a module for testing the AXI block we have created in this post.

With the test we will basically write some test data to SDRAM and then read it back.

We will implement this with a very simple state machine, which will be a counter whose bits serve different purposes.

This counter will be 11 bits wide, of which we will use the bits as follows:

bit 10: Read/Write
bits 9-8: Index for selecting the test data/address
bits 7-0: Lower part of counter

As mentioned above, we will be using bits 9 and 8 as an index to select some arbitrary addresses and data values for reading and writing, which we will do as follows:

always @(*)
begin
  if (counter[9:8] == 0)
  begin
    data_out = 20;
  end else if (counter[9:8] == 1) 
  begin
    data_out = 120;
  end else if (counter[9:8] == 2) 
  begin
    data_out = 30;
  end else if (counter[9:8] == 3) 
  begin
    data_out = 10;
  end else 
  begin
    data_out = 111;
  end
end

always @(*)
begin
  if (counter[9:8] == 0)
  begin
    address = 20;
  end else if (counter[9:8] == 1) 
  begin
    address = 120;
  end else if (counter[9:8] == 2) 
  begin
    address = 30;
  end else if (counter[9:8] == 3) 
  begin
    address = 10;
  end else 
  begin
    address = 111;
  end
end

We use the lower part of the counter, bits 7 - 0, to decide when to trigger the init_txn pulse. We use such a big range to ensure that we leave enough gap for latency:

always @(posedge clk)
begin
  if (counter[7:0] == 20)
  begin
    init_txn <= 1;
  end else if (counter[7:0] == 118) 
  begin
    init_txn <= 0;
  end
end
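
To tie the snippets above together, here is a minimal sketch of what the counter itself could look like. The module name test_counter and the port index are hypothetical, and the polarity of bit 10 (write pass first, read-back pass second) is an assumption, since the bit layout above does not spell it out.

```verilog
// Sketch of the test sequencer counter; names are hypothetical.
module test_counter(
    input  wire        clk,
    output reg  [10:0] counter,
    output wire        user_write,
    output wire [1:0]  index
);
    initial counter = 11'd0;

    // Free-running counter: wraps around after 2048 clock cycles
    always @(posedge clk)
    begin
      counter <= counter + 11'd1;
    end

    // bit 10 selects the pass. Assumption: 0 = write pass, 1 = read-back pass
    assign user_write = ~counter[10];
    // bits 9-8 select one of the four address/data pairs
    assign index = counter[9:8];
endmodule
```

With this layout each address/data pair gets a window of 256 clock cycles, and the init_txn logic above raises the trigger at count 20 within every window.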

This is enough coding for a test. Let us do some testing.

The following logic trace shows some results:

The first two lines contain the requested address. At the bottom the read/write requests are shown, of which the section shown is mostly reads.

The line, user_data_out, shows the data that is read back from memory. This is not very clear in the picture, but the values are 0x14, 0x78, 0x1e. These values converted to decimal are 20, 120 and 30.

This corresponds to the values we used in our test module.

In Summary

In this post we have created an AXI block that will enable us to read and write to SDRAM on the Zybo board.

In the next post we will be integrating this AXI block with the Mini Amiga block, and try to boot the AROS ROM.

Till next time!

Wednesday, 14 April 2021

Running an Amiga core on a Zybo Board: Part 3

Foreword

In the previous post we managed to get the fx68k core to run on a Basys 3 board, and execute a very simple machine code program.

In this post we will continue our journey in getting an Amiga core to run on a Zybo board. Having said that, in this post we will again do the exercise on the Basys 3 board. We will finally be moving to the Zybo board in the next post.

As mentioned in a previous post, we will be scrutinising the Mini Amiga project, here, that will form the basis of our Amiga exercise.

The Mini Amiga project is based on an Altera FPGA, and for that reason we need to scrutinise the code of this project. However, this makes the journey so much more exciting.

A bit of background on the Mini Amiga project

If one reads this article on Wikipedia, one can see that the original Mini Amiga project was based on a physical Motorola 68000 CPU, and the Amiga chipset was implemented within an FPGA.

Since then, the source code of the Mini Amiga project was adjusted so it can be used within the main MiSTer project.

To understand the source code of the Mini Amiga project, a good starting point would be to look at the file rtl/minimig.v. This file used to be the top level module for the original Mini Amiga project. Looking at the inputs/outputs of this module, it is immediately obvious that this module doesn't host a 68000 implementation, and that one needs to be connected externally.

To cater for an on-FPGA implementation for the 68000, a couple of wrappers were created around the minimig module. More on this in the next section.

A deeper look into the source code

In the previous section I mentioned that rtl/minimig.v used to be the top level module for the original implementation for the Mini Amiga project, and that wrappers were written to interface it with an on-FPGA 68000 implementation.

Let us have a closer look at these wrappers. We start by looking into the file Minimig.sv, which is located in the root. This file hosts a module called emu, which in turn instantiates an instance of the minimig module.

Something else that is interesting about the module emu is that it instantiates cpu_wrapper, and it is here that the fx68k core is actually instantiated and its ports linked up to the corresponding minimig instance.

The rest of the wrapper code gets very specific to the features of the DE10-Nano board.

For our goal of implementing an Amiga on a Zybo board, we will be creating our own wrappers around the Minimig module.

Pruning the Minimig module

When porting a complex project from one platform to another, a lot of times it is required to have an in-depth understanding of how the system works.

This is easier said than done, especially if it is not possible to play with the system on the original platform.

In our case, we have a similar scenario. Your preferred FPGA board for implementing the Mini Amiga will probably not be the DE10-Nano board, for which the current project is written. Also, buying the DE10-Nano board to gradually understand the components of the Mini Amiga project, before moving to your actual choice of FPGA board, doesn't sound like an economically viable option either.

To get around the problem of complexity, I will be stripping down the Minimig module to its most basic form, and then re-adding functionality as we go along.

We start by looking at the ports of the Minimig module, which are grouped as follows:

m68k pins
sram pins
system pins
rs232 pins
I/O
host controller interface (SPI)
Video
RTG Framebuffer control
audio
user/io
fifo/track display

In our first round, we will only be using the first three groups of ports, which are m68k pins, sram pins and system pins. The rest of the ports we will remove or comment out.

Next, let us have a look inside the minimig module, to see which instances we can remove. I have found the following to remove:

userio
rtg

Modifying top.v

We are now going to modify the top module we have created in the previous post, so it can include an instance of the minimig module.

One thing you will notice when connecting the ports of the Minimig module is that there are quite a number of clock inputs, like clk7_en, c1, c3, cck and eclk. Luckily, a module is provided in the Mini Amiga project for generating these signals given a clock input. This module is present in the file rtl/amiga_clk.v.

Let us create an instance of this module in our top module:

module top(
...
    );
...
amiga_clk amiga_clk
        (
          .clk_28(clk_28mhz),     // 28MHz output clock ( 28.375160MHz)
          .clk7_en(clk7_en),    // 7MHz output clock enable (on 28MHz clock domain)
          .clk7n_en(clk7n_en),   // 7MHz negedge output clock enable (on 28MHz clock domain)
          .c1(c1),         // clk28m clock domain signal synchronous with clk signal
          .c3(c3),         // clk28m clock domain signal synchronous with clk signal delayed by 90 degrees
          .cck(cck),        // colour clock output (3.54 MHz)
          .eclk(eclk),       // 0.709379 MHz clock enable output (clk domain pulse)
          .reset_n(~(button_reset))
        );
...
endmodule

Here is another mystery. The amiga_clk module wants a 28MHz clock input, whereas in the previous post we have defined a clock of 16MHz for clocking our CPU.

It turns out that in the Mini Amiga project, the CPU is clocked at 28MHz, whereas most of the Amiga components have a resulting clock speed of 7MHz.

We therefore need to adjust the frequency of our generated clock from 16MHz to 28MHz (actually 28.375160MHz, to be exact).

The resulting fx68k and minimig instances look as follows:

...
    fx68k fx68k(
        .clk(clk_28mhz),
        .HALTn(1),                    // Used for single step only. Force high if not used
        // input logic HALTn = 1'b1,            // Not all tools support default port values
        
        // These two signals don't need to be registered. They are not async reset.
        .extReset(reset_cpu_in),            // External sync reset on emulated system
        .pwrUp(reset_cpu_in),            // Asserted together with reset on emulated system coldstart    
        .enPhi1(phi), .enPhi2(~phi),    // Clock enables. Next cycle is PHI1 or PHI2
        .eRWn(read_write),
        .oRESETn(reset_cpu_out),

        .ASn(As), 
        .LDSn(Lds), 
        .UDSn(Uds),
        .DTACKn(data_ack), 
        .VPAn(1),
        .BERRn(1),
        .BRn(1), .BGACKn(1),
        .IPL0n(interrupts[0]), 
        .IPL1n(interrupts[1]), 
        .IPL2n(interrupts[2]),
        .iEdb(data_in),
        .oEdb(data_out),
        .eab(add)
        );
...
minimig minimig
 (
     //m68k pins
     .cpu_address(add), // m68k address bus
     .cpu_data(data_in),    // m68k data bus
     .cpudata_in(data_out),  // m68k data in
     ._cpu_ipl(interrupts),    // m68k interrupt request
     ._cpu_as(As),     // m68k address strobe
     ._cpu_uds(Uds),    // m68k upper data strobe
     ._cpu_lds(Lds),    // m68k lower data strobe
     .cpu_r_w(read_write),     // m68k read / write
     ._cpu_dtack(data_ack),  // m68k data acknowledge
     ._cpu_reset(reset_cpu_in),  // m68k reset
     ._cpu_reset_in(reset_cpu_out),//m68k reset in
     .nmi_addr(0),    // m68k NMI address

     //sram pins
     .ram_data(),    // sram data bus
     .ramdata_in(22),  // sram data bus in
     .ram_address(ram_add), // sram address bus
     ._ram_bhe(),    // sram upper byte select
     ._ram_ble(),    // sram lower byte select
     ._ram_we(),     // sram write enable
     ._ram_oe(),     // sram output enable
     .chip48(),      // big chipram read
 
     //system    pins
     .rst_ext(),     // reset from ctrl block
     .rst_out(),     // minimig reset status
     .clk(clk_28mhz),        // 28.37516 MHz clock
     .clk7_en(clk7_en),     // 7MHz clock enable
     .clk7n_en(clk7n_en),    // 7MHz negedge clock enable
     .c1(c1),          // clock enable signal
     .c3(c3),          // clock enable signal
     .cck(cck),         // colour clock enable
     .eclk(eclk)         // ECLK enable (1/10th of CLK)
 );
...

One of the changes we made here was to feed interrupts from the minimig module to our CPU.

One interesting part of the minimig module is the sram section. This section should be interfaced to external SDRAM or DDRRAM, and the idea is that every time we access a memory location that forms part of chipram/fastram or Kickstart ROM, the memory request will be redirected to these ports.

My plan is to interface the sram pins to the DDRRAM present on the Zybo board via AXI, in future posts.

In this post, however, we will only be looking at the addresses sent on the ram_address port, and verify that the addresses sent upon start-up fall within the Kickstart ROM range. More on this later.

Building and testing

When trying to synthesise the design and create a bitstream, I encountered a couple of errors in Vivado. First of all, Vivado doesn't like the concept of local registers. As an example, take a look at the following snippet from ciaa.v:

// generate a keystrobe which is valid exactly one clk cycle
always @(posedge clk) begin
	reg kms_levelD;
	if (clk7n_en) begin
		kms_levelD <= kms_level;
		keystrobe <= (kms_level ^ kms_levelD) && (kbd_mouse_type == 2);
	end
end

In this case, the register kms_levelD is a local register. To make Vivado happy, you will need to move the register declaration outside of the always block.
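
As a rough illustration of that fix, the snippet could be restructured as shown below. To keep the example self-contained it is wrapped in a stand-alone module; the module name keystrobe_gen, its port list and the width of kbd_mouse_type are assumptions and do not reflect the actual layout of ciaa.v.

```verilog
// Hypothetical stand-alone wrapper around the ciaa.v snippet, for illustration only.
module keystrobe_gen(
    input  wire       clk,
    input  wire       clk7n_en,
    input  wire       kms_level,
    input  wire [1:0] kbd_mouse_type,   // width is an assumption
    output reg        keystrobe
);
    // Declared at module scope instead of inside the always block
    reg kms_levelD;

    // keystrobe is asserted for one clk7n_en period whenever kms_level toggles
    always @(posedge clk) begin
        if (clk7n_en) begin
            kms_levelD <= kms_level;
            keystrobe <= (kms_level ^ kms_levelD) && (kbd_mouse_type == 2);
        end
    end
endmodule
```

The behaviour is unchanged; only the place where the register is declared has moved.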

Keep in mind that there are some cases where the same local register name is used in a couple of always blocks, in which case you will need to give them all unique names, which is quite an undertaking.

I have picked up quite a number of places in the Mini Amiga source where registers/wires are used and only declared later on in the source file. Vivado doesn't like this either.

You will also find that the file denise_colortable_ram_mf.v will fail to compile on Vivado. This file is very specific to Altera FPGAs.

To get around this error, just comment out the instantiation of altsyncram_component. We will revisit this module in a future post.

Eventually everything compiled, and I moved on to testing. As mentioned earlier, the goal was to see if addresses within the range of the Kickstart ROM were output on the ram_address port upon start-up.

Let me spend a moment to explain this. Starting at memory address 0, we have chipram, and ROM only starts at address $F80000.

With a Motorola 68000, the above setup is problematic, because at start-up the 68000 reads the initial value for the program counter from memory location 4, which is in chipram.

On the Amiga this dilemma is solved by mapping the Kickstart ROM also to location 0 upon start-up. Early in its start-up code, Kickstart clears the overlay bit on the CIA, after which this mapping is removed and the chipram starting at location 0 is free to use.
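
To make the overlay idea more concrete, here is a minimal sketch of an address decoder that behaves this way. It is purely illustrative and is not the decoding logic used inside the minimig module; the module name rom_overlay_decode and all of its ports are hypothetical.

```verilog
// Illustrative address decoder; names are hypothetical.
// Only the lowest 512KB of chipram is modelled here.
module rom_overlay_decode(
    input  wire [23:0] cpu_byte_address, // 68000 byte address
    input  wire        ovl,              // 1 = overlay active (state after reset)
    output wire        sel_rom,          // access goes to Kickstart ROM
    output wire        sel_chipram       // access goes to chipram
);
    // Kickstart ROM proper: $F80000-$FFFFFF, i.e. the top five address bits all set
    wire in_rom_range = (cpu_byte_address[23:19] == 5'b11111);
    // Lowest 512KB: $000000-$07FFFF, where the reset vectors are fetched from
    wire in_low_range = (cpu_byte_address[23:19] == 5'b00000);

    assign sel_rom     = in_rom_range | (ovl & in_low_range);
    assign sel_chipram = in_low_range & ~ovl;
endmodule
```

With ovl high, the read of the initial program counter at location 4 lands in ROM; once ovl is low, the same address selects chipram.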

So, in my test I will be checking to see if addresses starting at $F80000 will appear on the ram_address port upon startup.

In my testing, a couple of things went wrong. Firstly, the minimig module didn't create a reset signal on the _cpu_reset pin for resetting the CPU. It turned out that the userio module, that we have commented out, is responsible for generating the reset signal.

To get around this issue, I have linked up our button_reset signal to the minimig module as follows:

module minimig
(
...
 input button_reset,
...
);
...
assign cpurst = button_reset;
...
endmodule

Also, our amiga_clk instance will need to use the _cpu_reset port of the minimig module:

module top(
...
    );
...
amiga_clk amiga_clk
        (
          .clk_28(clk_28mhz),     // 28MHz output clock ( 28.375160MHz)
          .clk7_en(clk7_en),    // 7MHz output clock enable (on 28MHz clock domain)
          .clk7n_en(clk7n_en),   // 7MHz negedge output clock enable (on 28MHz clock domain)
          .c1(c1),         // clk28m clock domain signal synchronous with clk signal
          .c3(c3),         // clk28m clock domain signal synchronous with clk signal delayed by 90 degrees
          .cck(cck),        // colour clock output (3.54 MHz)
          .eclk(eclk),       // 0.709379 MHz clock enable output (clk domain pulse)
          .reset_n(~(_cpu_reset))
        );
...
endmodule

Another signal with a similar issue was the halt signal. This signal is also controlled by the userio module, which can halt the cpu on demand.

Again, for this signal we are going to implement a quick fix: we assign the value 0 to the cpuhlt wire in the minimig module.

With these changes, I was able to proceed and I got the following waveforms as a result:

Looking at the ram_add signal, we can see addresses 0x3c0000 and 0x3c0001 every time just before the As signal transitions to high. Converting these addresses to byte addresses, we get 0x780000 and 0x780002. This misses the Kickstart ROM address range of 0xf80000 by one bit.

Carefully looking at the Mini Amiga source code, it looks like these addresses are expected. Thinking about this, I realised that reducing the start address of the Kickstart ROM to 0x780000 actually makes sense. If we had kept the address 0xf80000, it would just be wasting RAM.
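
The word-to-byte conversion used above is simply a shift left by one bit, since each 16-bit word spans two bytes. A minimal sketch, with a hypothetical module name and an assumed 23-bit word address width:

```verilog
// Hypothetical helper; the 23-bit word address width is an assumption.
module word_to_byte_addr(
    input  wire [22:0] word_address,
    output wire [23:0] byte_address
);
    // Each 16-bit word spans two bytes, so append a zero as the lowest bit
    assign byte_address = {word_address, 1'b0};
endmodule
```

Feeding in 0x3c0000 and 0x3c0001 gives 0x780000 and 0x780002. Setting bit 23 of 0x780000 would give 0xf80000, which is the one-bit difference mentioned above.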

In Summary

In this post we did a quick round of getting the Minimig source code to compile in Vivado, and verified that the ram addresses are as expected.

In the next post we will be moving to the Zybo board, trying to get the AROS ROM to boot, with the ROM image sitting in SDRAM.

We will be using the AROS ROM instead of the official Kickstart ROM, since the AROS ROM doesn't require a license.

Till next time!