Wednesday, 31 March 2021

Running an Amiga core on a Zybo Board: Part 2

Foreword

In the previous post I gave a quick rundown on what I plan for the next couple of posts, which is to get an Amiga core running on a Zybo board.

In this post we will get the fx68k core, an FPGA implementation of the 68000 CPU, running on an FPGA.

Ironically, the FPGA I will be using for this exercise will not be the Zybo board, but one of my lower spec FPGA boards, the Basys 3, which is also Xilinx based.

The reason for this decision is that on the Zybo board one needs to wrap every core into an IP block, which is a bit time consuming if you are after quick R&D. Once we get to a point where many of the components are running, we can move to the Zybo board.

Simulation issues

As with many FPGA projects, one usually starts testing a design with simulation.

In the couple of years I have been playing with FPGAs, I found that the fx68k was one of the cores I wasn't able to run in a simulator. I am referring here to the simulators that are free of charge. I am not sure about commercial ones.

When trying to simulate the fx68k within Vivado, it gets stuck in the Elaboration step for hours. Even after 7 hours, Vivado couldn't get past this step.

When trying to use iVerilog, it didn't understand all the SystemVerilog syntax in the design.

Verilator's support for SystemVerilog is quite good, but in the fx68k design there are some structures where some of the bits are defined as wires and the rest as registers. Verilator doesn't like such structures.

So, to test the fx68k core, I will run it on the FPGA itself. This proved not to be a major issue.

If anyone has had any luck simulating the fx68k with a particular simulator, please let me know.

Setting up the project

As mentioned earlier, I will not be using my Zybo board to test the fx68k, but a Basys 3 board, in which you don't need to package all your cores into IP blocks.

We start by creating a top module block:

module top(
   input clk
    );
...
endmodule

When working with the Basys 3 board, it is important to constrain all the ports of your top level block in an XDC file.

In this case we have an input clock which we constrain as follows:

set_property PACKAGE_PIN W5 [get_ports clk]
set_property IOSTANDARD LVCMOS33 [get_ports clk]
create_clock -period 10.000 -name sys_clk_pin -waveform {0.000 5.000} -add [get_ports clk]

On the Basys 3 board, pin W5 of the FPGA chip is connected to an external clock running at 100MHz, which is why the constraint above specifies a period of 10.000 ns.

Now 100MHz is probably a bit too fast for our design, and I am not sure if all timing requirements will be met if we clock the fx68k core that high. So, I will instantiate a slower clock, running at around 16MHz.
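
As a quick sanity check on the numbers, here is a minimal Python sketch of the frequency to period conversion behind the create_clock constraint. The function names are my own and purely illustrative; they are not part of any Xilinx tool.

```python
def period_ns(freq_mhz: float) -> float:
    """Return the clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def waveform_edges(freq_mhz: float) -> tuple:
    """Return (rise, fall) times in ns for a clock with a 50% duty cycle."""
    period = period_ns(freq_mhz)
    return (0.0, period / 2)


if __name__ == "__main__":
    print(period_ns(100))       # period of the 100MHz board clock
    print(waveform_edges(100))  # rising and falling edge within one period
    print(period_ns(16))        # period of the slower 16MHz clock
```

For 100MHz this gives the period of 10 ns and the falling edge at 5 ns that appear in the XDC constraint, and a 16MHz clock works out to a period of 62.5 ns.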

Vivado provides a Clocking Wizard, which lets you create a template block. Every instance of this template block will output 16MHz.

We create an instance of this clock within our top module as follows:

module top(
   input btnC,
   input clk,
   output [15:0] led
    );
...
wire clk_16mhz;
...
clk_wiz_0 clk_wiz_0 
 (
  // Clock out ports
  .clk_out1(clk_16mhz),
 // Clock in ports
  .clk_in1(clk)
 );
endmodule

clk_16mhz is the clock signal clocking at 16MHz, and we will use this clock in our design.

Next, let us see how to connect the ports of the fx68k core:

    fx68k fx68k(
        .clk(clk_16mhz),
        .HALTn(1),                    // Used for single step only. Force high if not used
        // These two signals don't need to be registered. They are not async reset.
        .extReset(button_reset),            // External sync reset on emulated system
        .pwrUp(button_reset),            // Asserted together with reset on emulated system coldstart    
        .enPhi1(phi), .enPhi2(~phi),    // Clock enables. Next cycle is PHI1 or PHI2
        .eRWn(read_write),
        .ASn(As),
        .LDSn(Lds),
        .UDSn(Uds),
        .DTACKn(0),
        .VPAn(1),
        .BERRn(1),
        .BRn(1), .BGACKn(1),
        .IPL0n(1),
        .IPL1n(1),
        .IPL2n(1),
        .iEdb(data_in),
        .oEdb(data_out),
        .eab(add)
        );

There are a couple of ports that we can leave unconnected for now. Let us quickly go through the purpose of some of these ports:

  • extReset and pwrUp: In my design I have linked these ports to a button on the Basys board. Preferably one should pass the button through a debounce core, so you don't have erratic spikes on the reset port.
  • eRWn: This port indicates to memory whether the fx68k wants to read or write. A read is indicated with a 1 and a write with a 0.
  • ASn, LDSn, UDSn: These signals are asserted during read/write cycles. We don't really need these for our test setup, but it helps to see them on a signal trace to check that everything is working correctly.
  • iEdb and oEdb: This is for data in and data out from memory.
  • eab: Address request to memory
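
To illustrate what a debounce core does with the reset button, here is a small behavioural model in Python. It is only a sketch under my own assumptions (the function name, the sample list and the stable_count threshold are all hypothetical), not the debounce core one would actually synthesise.

```python
def debounce(samples, stable_count=3):
    """Return the debounced output for a list of sampled button values.

    The output only changes once the input has differed from the current
    output for stable_count consecutive samples.
    """
    state = 0   # current debounced output, assumed to start low
    run = 0     # consecutive samples that differ from the output
    out = []
    for sample in samples:
        if sample == state:
            run = 0
        else:
            run += 1
            if run >= stable_count:
                state = sample
                run = 0
        out.append(state)
    return out


if __name__ == "__main__":
    bouncy = [0, 1, 0, 1, 1, 1, 1, 0, 1, 1]
    print(debounce(bouncy))
```

The short spikes at the start and the single dropout near the end are filtered out, and the output only goes high once the button has read 1 for three samples in a row.
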

Just a note on enPhi1 and enPhi2. In my design I toggle these signals on every clock cycle, and it is important that these two signals are always the inverse of each other. So, here is my simple implementation for phi:

    always @(negedge clk_16mhz)
    begin
      phi <= ~phi;
    end

Before ending this section, I would like to mention two ROM files in the fx68k source, microrom.mem and nanorom.mem. It is important to ensure that the contents of these two files get loaded into the synthesised design.

If you open up fx68k.sv, you will see that these files are used by the following modules:

module uRom( input clk, input [UADDR_WIDTH-1:0] microAddr, output logic [UROM_WIDTH-1:0] microOutput);
	reg [UROM_WIDTH-1:0] uRam[ UROM_DEPTH];		
	initial begin
		$readmemb("microrom.mem", uRam);
	end
	
	always_ff @( posedge clk) 
		microOutput <= uRam[ microAddr];
endmodule


module nanoRom( input clk, input [NADDR_WIDTH-1:0] nanoAddr, output logic [NANO_WIDTH-1:0] nanoOutput);
	reg [NANO_WIDTH-1:0] nRam[ NANO_DEPTH];		
	initial begin
		$readmemb("nanorom.mem", nRam);
	end
	
	always_ff @( posedge clk) 
		nanoOutput <= nRam[ nanoAddr];
endmodule

It is important that these files are in a location that is accessible to the synthesis tool. If in doubt, rather use absolute paths.

During synthesis you should also see some messages indicating whether the tool was able to load these files successfully.

Hello World boot

Now it is time to test the fx68k core. For this purpose, we will write a very simple 68000 assembly program:


The assembly language is the column on the far right, and the machine language equivalent is in the middle column.

Before continuing, I would like to mention the tool I used to get this view. This is an online tool you can access via the following url: onlinedisassembler.com/. This tool allows you to disassemble machine code for a variety of CPUs.

Back to the machine language. If one looks at the machine code, one will realise that the 68000 is a big endian architecture, that is, the most significant part of a value is stored first in memory. This is in contrast to architectures like the i386 and (typically) ARM processors, which use little endian.
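
The difference in byte order can be shown with Python's standard struct module. This is just an illustrative sketch; the helper names are my own, and the value used is the 32-bit address 0x00093e86 that appears in the jsr instruction of the test program.

```python
import struct


def big_endian_bytes(value: int) -> bytes:
    """Pack a 32-bit value with the most significant byte first (68000 style)."""
    return struct.pack(">I", value)


def little_endian_bytes(value: int) -> bytes:
    """Pack a 32-bit value with the least significant byte first (i386 style)."""
    return struct.pack("<I", value)


def as_words(data: bytes) -> tuple:
    """Split four big endian bytes into two 16-bit words, high word first."""
    return struct.unpack(">2H", data)


if __name__ == "__main__":
    address = 0x00093E86
    print(big_endian_bytes(address).hex())
    print(little_endian_bytes(address).hex())
    print([hex(word) for word in as_words(big_endian_bytes(address))])
```

In the big endian layout the address splits into the words 0x0009 and 0x3e86, in that order, which is how it shows up in the machine code.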

Now let us look into the test program in more detail:

  • movew #1285,%d0: Here we load the register d0 with the value 1285 (0x0505).
  • movew %d0, 0x00858585: Next, we store the value we loaded into register d0 to memory at address 0x858585.
  • jsr 0x93e86: Jump to subroutine at memory location 0x93e86.

Somewhere in this assembly language program I have introduced something that is not allowed, and we will see later how the 68000 core handles this scenario.
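
To tie the assembly back to the machine code, the following Python sketch pulls the operands out of the eight 16-bit words of the program (the same values that are fed to the CPU later in this post). The variable and function names are my own, and this only extracts operands at fixed positions; it is not a real disassembler.

```python
# The test program as 16-bit words, in memory order.
PROGRAM_WORDS = [
    0x303C, 0x0505,          # movew #1285,%d0
    0x33C0, 0x0085, 0x8585,  # movew %d0,0x00858585
    0x4EB9, 0x0009, 0x3E86,  # jsr 0x93e86
]


def long_from_words(high: int, low: int) -> int:
    """Combine two 16-bit words into a 32-bit value, high word first."""
    return (high << 16) | low


immediate = PROGRAM_WORDS[1]
store_address = long_from_words(PROGRAM_WORDS[3], PROGRAM_WORDS[4])
jsr_target = long_from_words(PROGRAM_WORDS[6], PROGRAM_WORDS[7])

if __name__ == "__main__":
    print(immediate, hex(store_address), hex(jsr_target))
```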

Now, the next question is, how do we get the fx68k core to execute our program? We need to know what location the 68000 will jump to when it comes out of reset. Unfortunately this information is not easy to come by.

After quite a bit of searching, I found this information in the M68000 user manual on page 5-29 in the section Reset Operation. I quote:

When RESET and HALT are driven by an external device, the entire system, including the
processor, is reset. Resetting the processor initializes the internal state. The processor
reads the reset vector table entry (address $00000) and loads the contents into the
supervisor stack pointer (SSP). Next, the processor loads the contents of address $00004
(vector table entry 1) into the program counter.

So, in short, we need to ensure that our implementation returns the initial stack pointer when the CPU requests the contents of memory locations 0, 1, 2 and 3.

Similarly, our implementation should return the initial value for the program counter when the CPU requests the contents of memory locations 4, 5, 6 and 7.
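
Since the CPU fetches these eight bytes as four 16-bit words, the reset vector can be sketched in Python as follows. The function name is hypothetical, and the two example values are the stack pointer and program counter used later in this post.

```python
def reset_vector_words(initial_ssp: int, initial_pc: int) -> list:
    """Return the first four 16-bit words of memory, high word first.

    Words 0 and 1 hold the initial supervisor stack pointer (bytes 0 to 3),
    words 2 and 3 hold the initial program counter (bytes 4 to 7).
    """
    return [
        (initial_ssp >> 16) & 0xFFFF,
        initial_ssp & 0xFFFF,
        (initial_pc >> 16) & 0xFFFF,
        initial_pc & 0xFFFF,
    ]


if __name__ == "__main__":
    words = reset_vector_words(0x00079FE7, 0x0004FE00)
    print([hex(word) for word in words])
```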

It should be noted that the fx68k accesses memory in chunks of 16 bits, so we need to multiply the addresses posted by the CPU on the address bus by 2 to get the equivalent byte addresses. The implications of this will become apparent in a moment.
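
The conversion between the two kinds of address is a single shift. A minimal Python sketch, with helper names of my own choosing:

```python
def word_to_byte(word_address: int) -> int:
    """Convert a word address from the fx68k address bus to a byte address."""
    return word_address << 1


def byte_to_word(byte_address: int) -> int:
    """Convert a byte address to the word address seen on the address bus.

    The lowest address bit is dropped, since the bus only carries word addresses.
    """
    return byte_address >> 1


if __name__ == "__main__":
    print(hex(word_to_byte(0x27F00)))
    print(hex(byte_to_word(0x4FE00)))
    print(byte_to_word(4))  # the program counter vector at byte address 4
```

So byte address 4, where the initial program counter lives, shows up on the address bus as word address 2.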

So, let us write some simple logic that feeds our program in chunks to the CPU:

  
always @(posedge clk_16mhz)
        begin
            if (add == 16'h0)
            begin
              data_in <= 7;
            end else if (add == 16'h1)
            begin
              data_in <= 16'h9fe7;
            end else if (add == 16'h2)
            begin
              data_in <= 4;
            end else if (add == 16'h3)
            begin
              data_in <= 16'hfe00;
            end else if (add == 20'h27f00)
            begin
              data_in <= 16'h303c; //load immediate
            end else if (add == 20'h27f01)
            begin
              data_in <= 16'h0505;
            end else if (add == 20'h27f02)
            begin
              data_in <= 16'h33c0; //store
            end else if (add == 20'h27f03)
            begin
              data_in <= 16'h0085;
            end else if (add == 20'h27f04)
            begin
              data_in <= 16'h8585;
            end else if(add == 20'h27f05)
            begin
              data_in <= 16'h4eb9;
            end else if (add == 20'h27f06)
            begin
              data_in <= 9;
            end else if (add == 20'h27f07)
            begin
              data_in <= 16'h3e86;
            end
            else begin
              data_in <= 16'h33c0;
            end
        end

As mentioned previously, the address we select by is a word address and not a byte address.

Also, as per the M68000 user guide, location 0 contains the initial value for the stack pointer, which in this case is 0x79fe7. All addresses stored in memory are byte addresses.

Word location 2 contains the initial value for our program counter. As this is a word address, we need to multiply by two to get to the byte address, resolving to 4, which corresponds to the address provided in the user guide. Our initial program counter value in this case is 0x4fe00.

Finally, as seen, our snippet of code goes up to word address 0x27f07, and for all other address requests we return the value 0x33c0.
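
The same lookup can be modelled in a few lines of Python, which is handy for checking which word the CPU should see for a given address request. This is only a software model under my own naming (build_rom, load_words and read_word are hypothetical helpers), not a replacement for the Verilog above.

```python
DEFAULT_WORD = 0x33C0  # returned for every address not explicitly defined


def load_words(rom: dict, byte_address: int, words: list) -> None:
    """Store consecutive 16-bit words, keyed by word address."""
    base = byte_address // 2
    for offset, word in enumerate(words):
        rom[base + offset] = word


def build_rom() -> dict:
    """Build the memory contents used in the test: reset vector plus program."""
    rom = {}
    load_words(rom, 0x000000, [0x0007, 0x9FE7, 0x0004, 0xFE00])
    load_words(rom, 0x04FE00, [0x303C, 0x0505, 0x33C0, 0x0085,
                               0x8585, 0x4EB9, 0x0009, 0x3E86])
    return rom


def read_word(rom: dict, word_address: int) -> int:
    """Return the word at a word address, or the default for unmapped ones."""
    return rom.get(word_address, DEFAULT_WORD)


if __name__ == "__main__":
    rom = build_rom()
    for address in (0x0, 0x27F00, 0x27F07, 0x12345):
        print(hex(address), hex(read_word(rom, address)))
```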

Test Results

Let us have a look at the test results. Below are some waveforms I captured from my Basys 3 board from the point where it came out of reset:





As seen, the fx68k core starts off reading word locations 0 to 3, which contain the stack pointer and program counter. The next word location that is read is address 0x27f00. Multiplying this value by 2 gives 0x4fe00, which matches the value we provided for the program counter.

If one follows the rest of the waveform, the read_write signal eventually goes low and the value 0x0505 gets written to memory. This is still expected behaviour from our test program.

After the above-mentioned write, another write kicks off to location 0x3cff2, and then no further reads or writes occur.

Word address 0x3cff2 (i.e. byte address 0x79fe4) is close to the stack pointer we defined, and we could argue that the last instruction executed was a jsr, which pushes an address to the stack. However, whatever the processor tried to push to the stack, it failed to finish what it was trying to do.

I was wondering if there was something wrong with the initial stack pointer we provided, so I started playing with different values. One of the values I used as a stack pointer was 0xc033c0.

This time around, using 0xc033c0 as the initial stack pointer, there appeared to be many more stack operations, judging by the address requests.

Something else that was interesting was the reading of locations 6 and 7, which seemed like some kind of vector. Referring back to the M68000 user guide, I found a table on page 6-7 explaining all the vectors. As it turns out, word address 6, which is byte address 12, holds the address error vector. Further on in the user guide, on page 6-19, an address error is described as:

An address error exception occurs when the processor attempts to access a word or longword operand or an instruction at an odd address. 

And indeed, we were writing the value of register d0 to memory location 0x858585, which is an odd address.
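
Both observations can be reproduced with a bit of arithmetic. The Python sketch below assumes the standard 68000 layout of four bytes per exception vector, with the address error at vector number 3; the function names are my own.

```python
ADDRESS_ERROR_VECTOR = 3  # vector number of the address error exception


def vector_byte_address(vector_number: int) -> int:
    """Each exception vector is a 32-bit address, so four bytes apart."""
    return vector_number * 4


def vector_word_addresses(vector_number: int) -> tuple:
    """Return the two word addresses the CPU reads to fetch a vector."""
    first = vector_byte_address(vector_number) // 2
    return (first, first + 1)


def causes_address_error(byte_address: int) -> bool:
    """A word or longword access at an odd byte address is not allowed."""
    return byte_address % 2 == 1


if __name__ == "__main__":
    print(vector_byte_address(ADDRESS_ERROR_VECTOR))
    print(vector_word_addresses(ADDRESS_ERROR_VECTOR))
    print(causes_address_error(0x858585))
    print(causes_address_error(0x858586))
```

The vector lookup lands on word addresses 6 and 7, matching the reads seen in the waveform. It may also be worth noting that the same check returns True for 0x79fe7, the first initial stack pointer value I tried, so an even stack pointer is probably the safer choice.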

If we change the address to an even number like 0x858586, everything seems to work fine:


The return address gets pushed onto the stack at memory word locations 0x6019de and 0x6019df.

We also see that eventually contents are read from word location 0x049f43 onwards, so it seems that our jsr has executed correctly. For this new location our implementation will just return the value 0x33c0.
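
These addresses can be checked against what a jsr is expected to do. The sketch below is a simplified Python model, assuming the stack pointer still holds its initial value of 0xc033c0 when the jsr executes and that the 32-bit return address is pushed as two words; the function names are my own.

```python
def push_long(stack_pointer: int) -> tuple:
    """Model a 32-bit push: return the new stack pointer and the two
    word addresses that get written."""
    new_sp = stack_pointer - 4
    first_word = new_sp // 2
    return new_sp, (first_word, first_word + 1)


def first_fetch_after_jsr(target_byte_address: int) -> int:
    """Return the word address of the first instruction fetch at the target."""
    return target_byte_address // 2


if __name__ == "__main__":
    new_sp, words = push_long(0xC033C0)
    print(hex(new_sp), [hex(word) for word in words])
    print(hex(first_fetch_after_jsr(0x93E86)))
```

The model gives word locations 0x6019de and 0x6019df for the pushed return address and 0x49f43 for the first fetch at the jsr target, in line with the waveform.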

If we look back at our test program, we see that the value 0x33c0 is actually the opcode for storing the value of the register d0 to memory. This is exactly what is happening towards the end of the waveform.

In Summary

In this post we managed to get the fx68k core to run on a Basys 3 development board.

In the Test Results section, I spent a bit of time unpacking the resulting waveform.

In the next post we will continue our journey to get an Amiga core to run on a Zybo/Basys 3 board.

Till next time!
