Friday 5 January 2024

Extending our Hypothetical Amiga core

Foreword

In the previous post we extended our memory controller so that more sensible commands are issued to its second channel.

We also created a Hypothetical Amiga core that serves our second memory channel with these sensible read/write commands. This Hypothetical Amiga core is a work in progress, to which we will keep adding functionality in coming posts.

In this post we will hook up the Minimig (aka Mini Amiga) core to our still mostly empty Amiga core.

We briefly played with the Minimig core a couple of posts ago, just to get a feel for how it works. In particular, one of the Amiga features we looked at was the memory overlay scheme at bootup, where the RAM starting at address 0 is disabled and a piece of ROM is mapped into that range of memory instead.

However, when we previously played with the Minimig, we barely touched on the technicalities of reading from memory via the Minimig. In this post we will explore some of these technicalities.

DACT Battles

From discussions in previous posts, we know that two signals are important when a Motorola 68k processor accesses memory: ASn and DACT (the 68k's DTACK, or data transfer acknowledge). The CPU asserts ASn to signal that it wants to access memory. When the memory is ready with the data, it signals the CPU in return by asserting DACT.

While playing with the Minimig core, I discovered something that puzzled me a bit from the following waveform:


In this waveform, the Amiga core is still in overlay mode, meaning that the Amiga core translates address 0 to address 3c0000 hex. The Amiga core, however, asserts the DACT signal (the second signal in the waveform) before the address changes to 3c0000. This means that the data could potentially be read from the incorrect address and returned to the CPU.

I was wondering how the Minimig core in the MiSTer project deals with this scenario. Delving a bit into the source code, I found the following snippet in the file rtl/cpu_wrapper.v:


...
fx68k cpu_inst_o
(
...
	.DTACKn(ramsel ? ~ramready : chip_dtack),
...
);
...
Here we see that for RAM accesses we don't use the dact signal of our Minimig core, but rather the ramready signal.

So, what does the ramready signal entail? The answer lies in the following snippet of code from the file rtl/sdram_ctrl.v:

...
cpu_cache_new cpu_cache
(
	.clk              (sysclk),                // clock
	.rst              (!reset || !cache_rst),  // cache reset
	.cpu_cache_ctrl   (cpu_cache_ctrl),        // CPU cache control
	.cache_inhibit    (cache_inhibit),         // cache inhibit
	.cpu_cs           (ramsel),                // cpu activity
	.cpu_adr          (cpuAddr),               // cpu address
	.cpu_bs           ({!cpuU, !cpuL}),        // cpu byte selects
	.cpu_we           (cpustate == 3),         // cpu write
	.cpu_ir           (cpustate == 0),         // cpu instruction read
	.cpu_dr           (cpustate == 2),         // cpu data read
	.cpu_dat_w        (cpuWR),                 // cpu write data
	.cpu_dat_r        (cpuRD),                 // cpu read data
	.cpu_ack          (cache_rd_ack),          // cpu acknowledge
	.wb_en            (cache_wr_ack),          // write enable
	.sdr_dat_r        (sdata_reg),             // sdram read data
	.sdr_read_req     (cache_req),             // sdram read request from cache
	.sdr_read_ack     (cache_fill),            // sdram read acknowledge to cache
	.snoop_act        (chipWE),                // snoop act (write only - just update existing data in cache)
	.snoop_adr        (chipAddr),              // snoop address
	.snoop_dat_w      (chipWR),                // snoop write data
	.snoop_bs         ({!chipU, !chipL})       // snoop byte selects
);
...
assign ramready = cache_rd_ack || write_ena;
...
Here we see that the Amiga core used in the MiSTer project doesn't read directly from SDRAM, but rather via a cache. Looking at the implementation of cpu_cache_new, we see that it is quite an advanced cache, with the same kind of functionality you would find in a cache on a modern-day CPU. This cache will even snoop the writes that peripheral chips make to memory via DMA, so that the CPU will not miss out on these updates.

This cache is 8KB in size. I am not so sure whether I will use a cache in my implementation as well. The MiSTer project targets the DE10-Nano, which has over 500KB of Block RAM. I am using the Arty A7, which has far less Block RAM, so I am not sure I would be able to match the capabilities of the DE10-Nano.

So, for now I will work with an implementation that doesn't have a cache, just to go easy on Block RAM.

More clocking

Now, from the previous post you will remember that we are clocking our amiga_mem_core module with a clock signal called clk_8_2_mhz. This clock signal only pulses once every 10 clock cycles of our 83.3333MHz mclk, which works out to a frequency of 8.3333MHz.

Well, it turns out that our Minimig design wants a clock signal of more or less 28MHz. So, instead of one clock pulse every tenth clock cycle, we will need to enable more clock pulses within every 10 clock cycles.

We could enable every second clock cycle, which would give us 41MHz, but that is maybe a bit too fast. Alternatively, we could enable 4 out of every 10 clock cycles, giving an average frequency of 33MHz, or 3 out of every 10, giving 25MHz. I think these are the closest we can get to 28MHz by just enabling different clock cycles within every 10 of them. Note that the listing below enables three counts (3, 5 and 7), which works out to the 25MHz option.
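
The arithmetic behind these options can be checked with a few lines of Python. This is only a sketch: it assumes edge_count cycles through 0 to 9 (implied by the "every 10 clock cycles" above), and the names gated_mhz, WINDOW and TARGET_MHZ are made up for this illustration.

```python
MCLK_MHZ = 83.3333      # master clock from the post
WINDOW = 10             # assumption: edge_count cycles through 0..9
TARGET_MHZ = 28.37516   # clock the Minimig core would like to see


def gated_mhz(enabled_slots):
    """Average frequency when `enabled_slots` of every WINDOW mclk cycles pass the BUFGCE."""
    return MCLK_MHZ * enabled_slots / WINDOW


for slots in range(1, 6):
    f = gated_mhz(slots)
    print(f"{slots}/{WINDOW} enabled: {f:6.2f} MHz, {abs(f - TARGET_MHZ):5.2f} MHz from target")
```

Keep in mind that these are average rates. The enabled pulses are not evenly spaced, so each individual pulse is still as short as one mclk cycle.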

So, let us fiddle a bit with how clk_8_2_mhz is generated:

...
    BUFGCE BUFGCE_8_2_mhz (
       .O(clk_8_2_mhz),   // 1-bit output: Clock output
       .CE(clk_8_2_enable), // 1-bit input: Clock enable input for I0
       .I(mclk)    // 1-bit input: Primary clock
    );
...
    always @(negedge mclk)
    begin
        clk_8_2_enable <= (edge_count == 7 || edge_count == 5 || edge_count == 3);
    end
...
Having fed our amiga_mem_core module with a clock frequency in the right region, we need to instantiate a module within it that generates the other clock signals the minimig core requires:

   amiga_clk amiga_clk
        (
          .clk_28(clk),     // 28MHz output clock ( 28.375160MHz)
          .clk7_en(clk7_en),    // 7MHz output clock enable (on 28MHz clock domain)
          .clk7n_en(clk7n_en),   // 7MHz negedge output clock enable (on 28MHz clock domain)
          .c1(c1),         // clk28m clock domain signal synchronous with clk signal
          .c3(c3),         // clk28m clock domain signal synchronous with clk signal delayed by 90 degrees
          .cck(cck),        // colour clock output (3.54 MHz)
          .eclk(eclk),       // 0.709379 MHz clock enable output (clk domain pulse)
          .reset_n(~(reset))
        );
This module is also taken straight from the minimig project, from the file rtl/amiga_clk.v.

For completeness' sake, let us add the other module instances:

...
   always @(negedge clk)
    begin
      phi <= ~phi;
    end
...
   minimig minimig(     //m68k pins
     .cpu_address(add), // m68k address bus
     .cpu_data(data_in),    // m68k data bus
     .cpudata_in(data_out),  // m68k data in
     ._cpu_ipl(interrupts),    // m68k interrupt request
     ._cpu_as(As),     // m68k address strobe
     ._cpu_uds(Uds),    // m68k upper data strobe
     .button_reset(reset),
     ._cpu_lds(Lds),    // m68k lower data strobe
     .cpu_r_w(read_write),     // m68k read / write
     ._cpu_dtack(data_ack),  // m68k data acknowledge
     ._cpu_reset(/*reset*/),  // m68k reset
     ._cpu_reset_in(reset_cpu_out),//m68k reset in
     .nmi_addr(0),    // m68k NMI address
     //TODO
     //sram pins
     .ram_data(data),    // sram data bus
     .ramdata_in(ram_data_in),  // sram data bus in
     .ram_address(address), // sram address bus
     ._ram_bhe(),    // sram upper byte select
     ._ram_ble(),    // sram lower byte select
     ._ram_we(write),     // sram write enable
     ._ram_oe(oe),     // sram output enable
     .chip48(),      // big chipram read
 
     //system    pins
     .rst_ext(),     // reset from ctrl block
     .rst_out(),     // minimig reset status
     .clk(clk),         // 28.37516 MHz clock
     .clk7_en(clk7_en),     // 7MHz clock enable
     .clk7n_en(clk7n_en),    // 7MHz negedge clock enable
     .c1(c1),          // clock enable signal
     .c3(c3),          // clock enable signal
     .cck(cck),         // colour clock enable
     .eclk(eclk),        // ECLK enable (1/10th of CLK)
 
     //rs232 pins
     .rxd(),         // rs232 receive
     .txd(),         // rs232 send
     .cts(),         // rs232 clear to send
     .rts(),         // rs232 request to send
     .dtr(),         // rs232 Data Terminal Ready
     .dsr(),         // rs232 Data Set Ready
     .cd(),          // rs232 Carrier Detect
     .ri(),          // rs232 Ring Indicator
 
 
     //host controller interface (SPI)
     .IO_UIO(),
     .IO_FPGA(),
     .IO_STROBE(),
     .IO_WAIT(),
     .IO_DIN(),
     .IO_DOUT()
 

 
     //user i/o
     //output  [1:0] cpucfg,
     //output  [2:0] cachecfg,
     //output  [6:0] memcfg,
     //output        bootrom,     // enable bootrom magic in gary.v
);
...
   fx68k fx68k(        .clk(clk),
        .HALTn(1),                    // Used for single step only. Force high if not used
        // input logic HALTn = 1'b1,            // Not all tools support default port values
        
        // These two signals don't need to be registered. They are not async reset.
        .extReset(reset),            // External sync reset on emulated system
        .pwrUp(reset),            // Asserted together with reset on emulated system coldstart    
        .enPhi1(phi), .enPhi2(~phi),    // Clock enables. Next cycle is PHI1 or PHI2
        .eRWn(read_write),
        .oRESETn(reset_cpu_out),
        //output eRWn, output ASn, output LDSn, output UDSn,
        //output logic E, output VMAn,    
        //output FC0, output FC1, output FC2,
        //output BGn,
        //output oRESETn, output oHALTEDn,
        .ASn(As), 
        .LDSn(Lds), 
        .UDSn(Uds),
        .DTACKn(data_ack), 
        .VPAn(1),
        .BERRn(1),
        .BRn(1), .BGACKn(1),
        .IPL0n(interrupts[0]), 
        .IPL1n(interrupts[1]), 
        .IPL2n(interrupts[2]),
        .iEdb(data_in),
        .oEdb(data_out),
        .eab(add)
);
...
I have described the use of both modules in a previous post. You will also notice that DTACKn still blindly uses the data_ack signal from minimig, which I warned against in the previous section. We will give attention to this in the next section.

Synchronised Memory Access

As mentioned earlier, there is no guarantee that the data will be available to the CPU at the moment the minimig core asserts the dact signal. So, we need to delay the dact signal somehow until the data is ready.

One signal we can use for this is the _ram_oe signal from the minimig core. Once this signal is asserted, we can be sure that the address presented is correct and that we can fetch the correct data. Obviously, we will only assert the DACT signal once the data is really ready.

We will implement all this logic with the following state machine:

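    // Gate the DACT signal: it is only passed on to the CPU once the
    // (simulated) memory access has had time to complete.
    always @(posedge clk)
    begin
       case(dact_state)
         // Wait for a CPU access; oe and data_ack are both active-low
         STATE_IDLE: begin
                       dact_state <= (!oe && !data_ack) ? STATE_OE : STATE_IDLE;
                     end
         // First cycle of the simulated memory access time
         STATE_OE: begin
                       dact_state <= STATE_OE_1;
                     end
         // Second cycle of the simulated memory access time
         STATE_OE_1: begin
                       dact_state <= STATE_DACT;
                     end
         // Data is ready: stay here until minimig releases data_ack
         STATE_DACT: begin
                       if (data_ack)
                       begin
                           dact_state <= STATE_IDLE;
                       end
                     end
       endcase
    end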
We transition from the IDLE state to the next state when both oe and data_ack are asserted (both are active-low), just to ensure we act on a CPU memory access and not on a peripheral DMA access.

Purely for testing, I have added two states to simulate a two-cycle memory access time. When we eventually use our real memory controller, more clock cycles will apply.

We are now ready to supply our CPU with a true DACT signal:

   fx68k fx68k(        .clk(clk),
...
      .DTACKn(dact_state == STATE_DACT ? data_ack : 1), 
...
);
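
To get a feel for what this gating does, here is a small Python model of the state machine together with the DTACKn expression above. It is only a behavioural sketch, not a cycle-accurate simulation: the function names and the stimulus in `samples` are made up for illustration, and oe and data_ack are treated as active-low, as in the Verilog.

```python
IDLE, OE, OE_1, DACT = range(4)


def step(state, oe, data_ack):
    """State update on one rising clock edge (oe and data_ack are active-low)."""
    if state == IDLE:
        return OE if (oe == 0 and data_ack == 0) else IDLE
    if state == OE:
        return OE_1
    if state == OE_1:
        return DACT
    # DACT: stay here until minimig releases its acknowledge
    return IDLE if data_ack == 1 else DACT


def cpu_dtack_n(state, data_ack):
    """Mirror of `.DTACKn(dact_state == STATE_DACT ? data_ack : 1)`."""
    return data_ack if state == DACT else 1


def simulate(samples):
    """Return the DTACKn value the CPU sees in each cycle of `samples`."""
    state = IDLE
    trace = []
    for oe, data_ack in samples:
        trace.append(cpu_dtack_n(state, data_ack))
        state = step(state, oe, data_ack)
    return trace


# Hypothetical stimulus as (oe, data_ack) pairs: minimig acknowledges in
# cycle 1, one cycle before oe goes low, and releases both in cycle 6.
samples = [(1, 1), (1, 0), (0, 0), (0, 0), (0, 0), (0, 0), (1, 1), (1, 1)]
print(simulate(samples))
```

In this model the CPU only sees DTACKn go low in cycle 5, three clock edges after oe went low, even though minimig already asserted its acknowledge in cycle 1.
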
What remains to be done is to link up the memory. For now I will just simulate memory with hardcoded values that form a small program, and see if our CPU acts accordingly:

...
   always @(posedge clk)
   begin
       oe_delayed <= oe;
   end
...   
   assign trigger_read = oe_delayed && !oe;
...   
    always @(negedge clk_8_2_mhz)
        begin
            if (amiga_test_address == 22'h3c0000 && trigger_read)
            begin
              data_in_amiga_test <= 16'hc0;
            end else if (amiga_test_address == 22'h3c0001 && trigger_read)
            begin
              data_in_amiga_test <= 16'h33c3;
            end else if (amiga_test_address == 22'h3c0002 && trigger_read)
            begin
              data_in_amiga_test <= 16'h0;
            end else if (amiga_test_address == 22'h3c0003 && trigger_read)
            begin
              data_in_amiga_test <= 16'h0008;
            end else if (amiga_test_address == 22'h3c0004 && trigger_read)
            begin
              data_in_amiga_test <= 16'h303c; //load immediate
            end else if (amiga_test_address == 22'h3c0005 && trigger_read)
            begin
              data_in_amiga_test <= 16'h0505;
            end else if (amiga_test_address == 22'h3c0006 && trigger_read)
            begin
              data_in_amiga_test <= 16'h33c0; //store
            end else if (amiga_test_address == 22'h3c0007 && trigger_read)
            begin
              data_in_amiga_test <= 16'h0085;
            end else if (amiga_test_address == 22'h3c0008 && trigger_read)
            begin
              data_in_amiga_test <= 16'h8586;
            end else if(amiga_test_address == 22'h3c0009 && trigger_read)
            begin
              data_in_amiga_test <= 16'h4eb9;
            end else if (amiga_test_address == 22'h3c000a && trigger_read)
            begin
              data_in_amiga_test <= 9;
            end else if (amiga_test_address == 22'h3c000b && trigger_read)
            begin
              data_in_amiga_test <= 16'h3e86;
            end
            else if (trigger_read) begin
              data_in_amiga_test <= 16'h33c0;
            end
        end
...
The data only gets assigned once the oe signal transitions from 1 to 0. As we know, when the 68k processor starts to execute, it starts by loading the vectors at address 0, which the minimig core translates to 3c0000.

As you might remember, the address where the 68k starts executing is given by the vector starting at byte address 4, or 16-bit word address 2. In the code above this vector holds the value 8, which means byte address 8, or word address 4. For this reason the program actually starts at word address 3c0004.

Let us have a look at what the waveform looks like, captured from the real FPGA:
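
The same calculation can be written out as a short Python sketch. The word values are copied from the listing above. The names rom_words, reset_vectors and ROM_OVERLAY_WORD_BASE are made up for this illustration, and the overlay is modelled as a plain offset of 3c0000 on the word address, which is all this post relies on.

```python
ROM_OVERLAY_WORD_BASE = 0x3C0000  # word address that address 0 maps to in overlay mode

# First words of the hardcoded test memory (index = word offset from 3c0000)
rom_words = [0x00C0, 0x33C3, 0x0000, 0x0008, 0x303C, 0x0505]


def reset_vectors(words):
    """Combine the first four 16-bit words into the initial SSP and PC."""
    ssp = (words[0] << 16) | words[1]   # byte addresses 0..3
    pc = (words[2] << 16) | words[3]    # byte addresses 4..7
    return ssp, pc


ssp, pc = reset_vectors(rom_words)
first_fetch = ROM_OVERLAY_WORD_BASE + (pc >> 1)  # byte address -> word address
print(hex(pc), hex(first_fetch), hex(rom_words[pc >> 1]))
```

The first opcode fetched is the word at offset 4, which is the 303c marked "load immediate" in the listing.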


The third line is the address sent to RAM for retrieving data. It resolves to a valid address starting with '3c' at the falling edge of cck. The second-last row shows the resulting data, asserted shortly thereafter.

In Summary

In this post I added some more logic to our amiga_mem_core to make it resemble an Amiga more closely.

We also tested whether our 68k core could reliably fetch data and execute code.

In the next post we will try and link up our Amiga controller to our SDRAM memory controller.

Until next time!