Monday 23 October 2023

Adding more functionality to the second channel of the Memory controller

Foreword

In the previous post we started modifying our existing memory controller to become a dual channel memory controller.

A dual channel memory controller would allow us to have two cores both accessing memory at 7MHz, by allocating a different bank of memory within the DDR3 memory for each core.

In the previous post we basically got the timings right to trigger the DDR3 commands of the cores in an interleaved way.

In this post we are going to extend this functionality further and add a core to issue some dummy read/write commands on the second memory channel and see if we can read some sensible data back from DDR RAM via the second memory channel.

Using sensible addresses

In the previous post we didn't really worry about using sensible row/column addresses for the second channel of our memory controller and we just used the same hardcoded address for both the row and the column.

So, let us start this post by seeing if we can create some sensible row and column addresses. Firstly, we will create a block of code for driving our second memory channel:

amiga_mem_core amiga_mem_core(.clk(clk_8_2_mhz),
    .address(channel_address_2),
    .data(channel_data_2),
    .data_in(cap_value_2),
    .write(write_channel_2),
    .reset(reset_retro)
);
amiga_mem_core is our hypothetical Amiga core that will use the second memory channel for its memory needs. We will gradually develop this core in coming sections and future posts.

Let us quickly discuss the different ports of amiga_mem_core:
  • clk_8_2_mhz: This is basically the same kind of clock as the one that drives our main 6502 core. It is the 83.333MHz clock, but we only present every tenth clock pulse, which gives us an effective clock of 8.333MHz. I would like to point out here that, of the 10 available clock pulses, we will use a different one than we use for our 6502 core, because the second memory channel requires the address to be asserted at a different time than the first memory channel.
  • channel_address_2: a 16 bit linear address, giving a 64k address space. We will slice and dice this address to get the row address and the column address.
  • cap_value_2: 16 bit captured data from DDR3 RAM. As we know from previous posts, the ISERDES captures this data from DDR3 RAM, but throws it away after the next 83MHz clock cycle. So, we need to capture this data so it is still available at the next 8.33MHz clock pulse.
  • write_channel_2: The Amiga core indicates whether it wants to write (i.e. set to 0) or read (i.e. set to 1).
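To make the idea of "a different clock pulse of the 10 available" a bit more concrete, here is a minimal sketch of how two one-in-ten pulses with different phases could be derived from the 83.333MHz clock. This is not the actual code of the design: the module name phase_gen, the counter slot_count and the two slot numbers are all assumptions for illustration, and how these pulses are turned into the real gated clocks is left out.

```verilog
// Hypothetical sketch: two one-in-ten pulses taken from different
// counts of the same decade counter. Names and slot values are assumed.
module phase_gen #(
    parameter PHASE_1 = 0,   // assumed slot for the first channel
    parameter PHASE_2 = 5    // assumed slot for the second channel
)(
    input  wire mclk,        // 83.333MHz clock
    input  wire reset,
    output wire en_chan_1,   // high for one mclk cycle in every ten
    output wire en_chan_2    // same rate, but a different phase
);
    reg [3:0] slot_count = 0;

    // Count 0..9 and wrap, so every slot comes around at 8.333MHz.
    always @(posedge mclk)
    begin
        if (reset || slot_count == 9)
            slot_count <= 0;
        else
            slot_count <= slot_count + 1;
    end

    assign en_chan_1 = (slot_count == PHASE_1);
    assign en_chan_2 = (slot_count == PHASE_2);
endmodule
```

Both outputs pulse at the same effective rate, so the two cores run at the same speed but present their addresses at different moments within each group of ten fast clock cycles.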
Let us modify our memory controller state machine a bit to use the values from these ports:

              WAIT_READ_WRITE_2: begin
                  test_cmd <= 32'h000001ff;
                  phy_rcw_pos_2 <= 3;
                  phy_address_2 <= {9'b0,channel_address_2[15:10]};
                  state <= PRECHARGE_AFTER_WRITE;
              end

              PRECHARGE_AFTER_WRITE: begin
                  // CAS command
                  phy_rcw_pos_2 <= {2'b10, write_channel_2};
                  phy_address_2 <= {5'b0,channel_address_2[9:3], map_address_2[2:0]};
                  data_in <= {8{channel_data_2}};
                  dq_tri = write_channel_2 ? 15 : 0;
                  mem_channel <= 1;
                  state <= POST_READ_1;
                  cmd_slot <= 3;
                  test_cmd <= write_channel_2 ? 32'h000029fd : 32'h00002dfd;
              end
If you have a look at my previous post, you will see that I had already modified these two selectors of the state machine to open a row for the second memory channel in the first selector and then do a column read/write in the second. This time I have added some more logic to use the address from our hypothetical Amiga core.

Note that, as with our first channel, we form the row address from bits 10 upwards of the Amiga core address, and the column address from the lower ten bits of the Amiga core address.
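As a worked illustration of this split, here is a small sketch that packages the two concatenations from the state machine into a standalone module. The module name addr_split and its port names are made up for this example, and the 15 bit output width is simply what the concatenations above add up to.

```verilog
// Hypothetical helper: split a 16 bit linear address into a row and a
// column address, using the same bit ranges as the state machine above.
module addr_split(
    input  wire [15:0] linear_address,
    input  wire [2:0]  burst_slot,     // remapped low three bits
    output wire [14:0] row_address,
    output wire [14:0] column_address
);
    // Bits 15:10 select the row, padded with zeros at the top.
    assign row_address    = {9'b0, linear_address[15:10]};

    // Bits 9:3 select the 8-word column group; the lowest three
    // column bits pick the word within that group.
    assign column_address = {5'b0, linear_address[9:3], burst_slot};

    // Example: linear_address = 16'h0C05
    //   bits 15:10 = 6'b000011  -> row 3
    //   bits 9:3   = 7'b0000000 -> column group 0
    //   bits 2:0   = 3'b101     -> word 5 (before remapping)
endmodule
```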

You will notice I am not using the lower three bits as is for the column address, but rather making use of a map. I have used the same technique in the first channel of our memory controller. Let us quickly recap the reason for this.

As you might remember from previous posts, DDR3 memory will never just give you the single 16-bit word you are looking for, but will always return a burst of 4 or 8 words. Catching the data in the correct chunk within the 8 word burst is quite a challenge, and you need to fiddle quite a bit with the code to get it right.

So, I just took the lazy route: I looked at which word arrives for each address of 0-7 and then created a map to get the correct word within the burst. My mapping function looks like this:

    always @*
    begin
        if (channel_address_2[2:0] == 0)
        begin
            map_address_2 = 7;
        end else if (channel_address_2[2:0] == 1)
        begin
            map_address_2 = 0;
        end else if (channel_address_2[2:0] == 2)
        begin
            map_address_2 = 1;
        end else if (channel_address_2[2:0] == 3)
        begin
            map_address_2 = 2;
        end else if (channel_address_2[2:0] == 4)
        begin
            map_address_2 = 3;
        end else if (channel_address_2[2:0] == 5)
        begin
            map_address_2 = 4;
        end else if (channel_address_2[2:0] == 6)
        begin
            map_address_2 = 5;
        end else
        begin
            map_address_2 = 6;
        end
    end
Also, there is a different mapping function for each of the simulation environment and the actual FPGA. I never managed to find the reason why there is a difference between the two, but for now I am just using two different mapping functions for the two environments.

Moving on to the data_in assignment. Here I am just repeating the data I want to write for the full burst, until the write is complete. It is important in this case to assert the Data mask bit at the correct time instant, to ensure the correct word is written in an 8-word column. So, I am just doing another mapping function:
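As a side note, the particular table shown above happens to be the same as subtracting one from the low three bits with wrap-around (0 becomes 7, 1 becomes 0, and so on). The following small testbench is only a sketch to check that observation in a simulator; the module and signal names are made up, and it says nothing about the other mapping function used in the second environment.

```verilog
// Hypothetical check: compare the lookup table from the post with a
// 3 bit "minus one" for all eight possible inputs.
module map_check;
    reg  [2:0] low_bits;
    reg  [2:0] map_table;
    wire [2:0] map_compact = low_bits - 3'd1;  // 0 - 1 wraps to 7
    integer i;
    integer mismatches = 0;

    // Same table as the mapping function above, written as a case.
    always @*
    begin
        case (low_bits)
            3'd0:    map_table = 7;
            3'd1:    map_table = 0;
            3'd2:    map_table = 1;
            3'd3:    map_table = 2;
            3'd4:    map_table = 3;
            3'd5:    map_table = 4;
            3'd6:    map_table = 5;
            default: map_table = 6;
        endcase
    end

    initial
    begin
        for (i = 0; i < 8; i = i + 1)
        begin
            low_bits = i;
            #1;  // let the combinational logic settle
            if (map_compact !== map_table)
                mismatches = mismatches + 1;
        end
        $display("mismatches: %0d", mismatches);
        $finish;
    end
endmodule
```

Keeping the explicit table still has an advantage: when the arriving word order turns out to be different, as it does between simulation and hardware, each entry can be changed independently.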

    always @*
    begin
        if (cmd_offset[2:0] == 0) 
        begin
            dm_slot = ~1;
        end else if (cmd_offset[2:0] == 1)
        begin
            dm_slot = ~2;
        end else if (cmd_offset[2:0] == 2)
        begin
            dm_slot = ~4;
        end else if (cmd_offset[2:0] == 3)
        begin
            dm_slot = ~8;
        end else if (cmd_offset[2:0] == 4)
        begin
            dm_slot = ~16;
        end else if (cmd_offset[2:0] == 5)
        begin
            dm_slot = ~32;
        end else if (cmd_offset[2:0] == 6)
        begin
            dm_slot = ~64;
        end else if (cmd_offset[2:0] == 7)
        begin
            dm_slot = ~128;
        end
    end
The wire cmd_offset is used for both channels, so it is important we have a selector like this:

    assign cmd_offset = mem_channel == 0 ? cmd_address[2:0] : channel_address_2[2:0];
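The data mask mapping above clears exactly one bit, the one at position cmd_offset, and leaves all the others set. A more compact way to express the same pattern is an inverted one-hot shift. The sketch below assumes dm_slot is 8 bits wide, which the code above suggests but does not show, and the module name is made up.

```verilog
// Hypothetical compact form of the data mask mapping: all bits set
// except the one at position cmd_offset.
module dm_slot_sketch(
    input  wire [2:0] cmd_offset,
    output wire [7:0] dm_slot   // assumed width of 8 bits
);
    // Example: cmd_offset = 3 -> 8'b0000_1000 -> inverted 8'b1111_0111,
    // which is the low byte of ~8 in the table above.
    assign dm_slot = ~(8'b0000_0001 << cmd_offset);
endmodule
```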

Implementing the Hypothetical Amiga core

Let us implement the hypothetical Amiga core we have been talking about in this post. This is basically the core where we will do some writes using the second memory channel and see if we can read the same data back. In future posts we will gradually evolve this core into a fully functional Amiga core.

This core will basically be a 6 bit counter, where we use the top bit to indicate read/write, with low indicating write. So, starting with the top bit as zero, we will do a bunch of writes, and when the counter reaches the point where bit 5 (i.e. the top bit) is set, we will do a series of reads.

The resulting core is fairly simple:

module amiga_mem_core(
    input wire clk,
    output wire [15:0] address,
    output wire write,
    input wire reset,
    output wire [15:0] data,
    input wire [15:0] data_in
    );
    
   (* mark_debug = "true" *) reg [5:0] counter = 0;
   (* mark_debug = "true" *) reg [15:0] captured_data;
   
   assign address = {11'b0, counter[4:0]};
   assign write = counter[5];
    
   always @(posedge clk)
   begin
       counter <= reset ? 0 : (counter + 1);
   end
   
   always @(posedge clk)
   begin
       captured_data <= data_in;
   end
   
   assign data = counter + 3;
endmodule
I have marked counter and captured_data to be debugged, so we can view those signals via ILA when running on the actual FPGA.

We also use the counter to generate some test data, adding three to it so that the test data is different from the address.

I mentioned earlier that the data the ISERDES captures is only retained for one 83.33MHz clock cycle, so by the time our Amiga core looks for the data, it will be long gone. So, we will need to capture it outside the Amiga core and feed it to the Amiga core like this:
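To show what this test pattern should produce, here is a small simulation-only sketch. It replaces the whole DDR3 path with a plain 32 entry array that has no latency at all, which is a big simplification and purely an assumption for illustration. It uses the same counter scheme as the core: 32 writes of counter + 3, followed by 32 reads of the same addresses.

```verilog
// Hypothetical model: the counter pattern of the core above, run
// against an ideal zero-latency memory instead of the real DDR3 path.
module readback_model;
    reg        clk = 0;
    reg  [5:0] counter = 0;
    reg [15:0] mem [0:31];            // stand-in for the DDR3 bank
    wire [4:0] address = counter[4:0];
    wire       write   = counter[5];  // low = write, high = read
    integer    errors = 0;

    always #1 clk = ~clk;

    always @(posedge clk)
    begin
        if (!write)
            mem[address] <= counter + 3;            // write phase
        else if (mem[address] !== address + 16'd3)  // read phase
            errors = errors + 1;

        counter <= counter + 1;

        // After the last read, report and stop.
        if (counter == 6'd63)
        begin
            $display("read-back errors: %0d", errors);
            $finish;
        end
    end
endmodule
```

With an ideal memory every address n should read back n + 3, so the model should report zero errors. On the real hardware the same expectation can be checked by looking at captured_data in the ILA.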

    always @(posedge mclk)
    begin
        if (edge_count == 7)
        begin
            cap_value_2 <= {data_out[103:96], data_out[39:32]};
        end
    end
So, we always capture the data at the specific 83MHz clock cycle at which it is available. data_out is basically the output of our ISERDES block, which captures the 8 words of a burst. Bits 63 - 0 contain the low byte of each of the 8 words, and bits 127 - 64 contain the high byte of each of the 8 words. By experimentation I found that the data we need is always at bits 39:32 and bits 103:96.
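Going by the layout described above, and assuming the bytes sit in word order within each 64 bit half, the low byte of word n is at bits 8n+7:8n and the high byte is 64 bits higher. The pair 39:32 and 103:96 then corresponds to n = 4. The sketch below makes that indexing explicit; the module and port names are made up, and the design itself just uses the fixed bit ranges.

```verilog
// Hypothetical helper: pick word number word_index out of the 128 bit
// ISERDES output, assuming low bytes in bits 63:0 and high bytes in
// bits 127:64, both in word order.
module burst_word_select(
    input  wire [127:0] data_out,
    input  wire [2:0]   word_index,
    output wire [15:0]  word
);
    // For word_index = 4 this selects {data_out[103:96], data_out[39:32]}.
    assign word = {data_out[64 + 8*word_index +: 8],
                   data_out[8*word_index +: 8]};
endmodule
```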

In Summary

In this post we added some more meat around the second channel of our memory controller, managed to write some test data to the DDR3 RAM and read the same data back.

In the next post we will start to do some more interesting stuff, and see if we can add an Amiga core that uses the second memory channel for memory storage.

Until next time!