Foreword
In the previous post we reduced the trailing latency of DDR RAM accesses, which gave us the memory throughput an Amiga core requires.
At first sight it seems that an Amiga core cannot really work with DDR3 memory. An Amiga core accesses memory 16 bits at a time, whereas DDR3 RAM transfers data in bursts of 4 or 8 beats, each beat being 16 bits wide. So, in this post we will see whether we can find a way to work with DDR3 memory 16 bits at a time.
Writing 16 bits at a time
On our way to handling 16 bits at a time, let us first have a look at writes.
As mentioned earlier, DDR3 RAM works with either 4 or 8 beats at a time. With an Amiga core in the picture, only one of these 4 or 8 beats will ever carry valid write data, and without further measures the memory locations covered by the remaining beats would be unintentionally overwritten.
Somehow we need to be able to tell the DDR3 RAM which of the beats contain valid write data, and DDR3 memory does indeed provide a way to do this.
DDR3 memory has an input signal called Data Mask, abbreviated DM. During a write burst, the DDR3 RAM examines the DM signal at every data beat. If the signal is 1, i.e. the beat is masked, the beat is ignored. If, however, the signal is 0, i.e. unmasked, the beat is considered valid and the corresponding location in memory is updated with its value.
Therefore, when an Amiga core writes a single 16-bit word in a burst of eight, there will always be exactly one time slot where the DM signal is 0, while the other seven are 1.
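To make the masking rule concrete, here is a simplified behavioural sketch. It is not a DDR3 model: it ignores command timing and burst ordering and assumes one DM bit per 16-bit beat. The module masked_burst_model and its tasks clear and write_burst are hypothetical names chosen for illustration. Beat b is written to address base + b only when dm[b] is 0.

```verilog
// Hypothetical behavioural sketch of DM masking, NOT a DDR3 model:
// no command timing, no burst ordering, one DM bit per 16-bit beat.
module masked_burst_model;
  reg [15:0] mem [0:15];  // 16 words = two 8-word burst blocks

  // Zero the whole memory.
  task clear;
    integer j;
    begin
      for (j = 0; j < 16; j = j + 1)
        mem[j] = 16'h0000;
    end
  endtask

  // Write one 8-beat burst starting at an 8-aligned word address.
  // Beat b is taken from data[16*b +: 16]; dm[b] = 1 masks beat b.
  task write_burst(input [3:0] base, input [127:0] data, input [7:0] dm);
    integer b;
    begin
      for (b = 0; b < 8; b = b + 1)
        if (dm[b] == 1'b0)
          mem[base + b] = data[16*b +: 16];
    end
  endtask
endmodule
```

Calling write_burst(4'd8, {8{16'd25}}, 8'b1101_1111) on a cleared memory updates only word 13 and leaves words 8 to 12, 14 and 15 untouched.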
Let us take an example. Suppose we want to write the value 25 to address 13. Burst writes always start at an address that is a multiple of 8, such as 0, 8, 16, 24 and so on. Address 13 falls within the block 8 to 15, so the DM values for this burst look as follows:
Address: 08 09 10 11 12 13 14 15
DM:       1  1  1  1  1  0  1  1
As for the data, we can simply repeat the value 25 eight times, which makes life easier: the masked beats are ignored anyway, so it does not matter what they carry.
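The arithmetic of this example can be sketched as follows, assuming (as in the example) that each address selects one 16-bit word. The module burst_setup and its port names are hypothetical. The lower three address bits select the beat, the remaining bits give the 8-aligned start of the burst, and the replication operator repeats the word in every beat.

```verilog
// Hypothetical helper: split a 16-bit word address into the start of
// its 8-word burst block and the beat index, and replicate the data.
module burst_setup (
  input  [15:0]  addr,       // word address from the core
  input  [15:0]  data16,     // the single word to write
  output [15:0]  burst_base, // first address of the 8-word block
  output [2:0]   beat,       // which of the 8 beats carries the word
  output [127:0] data128     // data16 repeated in all 8 beats
);
  assign burst_base = {addr[15:3], 3'b000};  // clear the low 3 bits
  assign beat       = addr[2:0];             // position inside the block
  assign data128    = {8{data16}};           // same word in every beat
endmodule
```

For address 13 (binary 1101) this yields a burst start of 8 and beat 5, matching the position of the 0 in the DM values above.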
Now, let us write some code. First we need to reduce the data_in/data_out ports of mem_tester to 16 bits:
module mem_tester(
  input clk,
  // 0 - reset
  // 1 - ready
  input [2:0] cmd_status,
  output reg select = 0,
  output reg refresh = 0,
  output reg write,
  output [15:0] address_out,
  output wire [/*127*/15:0] data_out,
  input [/*127:0*/15:0] data_in
);
...
endmodule
Let us now move on to the file mcontr_sequencer.v, which contains the state machine that breaks up the commands from mem_tester into DDR3 memory commands. One of its case branches, WAIT_CMD, needs to change as follows:
WAIT_CMD:
begin
  if (cmd_valid)
  begin
    if (refresh_out)
    begin
      state <= REFRESH_1;
      cmd_status <= 2;
    end else
    begin
      state <= STATE_PREA;
      test_cmd <= {1'b0, 4'b0, {cmd_address[9:3], map_address[2:0]}, 1'b0, 4'h1, (write_out ? 2'b11 : 2'b00), 10'h1fd};
      cmd_slot <= 1;
      data_in <= {8{cmd_data_out}};
      //column_address <= cmd_address[9:0];
      do_write <= write_out;
      cmd_status <= 2;
    end
  end
end
We are basically duplicating the 16-bit data_out of mem_tester (cmd_data_out in the sequencer) 8 times. The result is fed to the OSERDES module, which outputs the same 16-bit value in each of the 8 beats of a write burst. The map_address term in test_cmd is covered in the section on reads.
Next, the DM pattern is derived from the lower three address bits:
always @*
begin
  if (cmd_address[2:0] == 0)
  begin
    dm_slot = ~1;
  end else if (cmd_address[2:0] == 1)
  begin
    dm_slot = ~2;
  end else if (cmd_address[2:0] == 2)
  begin
    dm_slot = ~4;
  end else if (cmd_address[2:0] == 3)
  begin
    dm_slot = ~8;
  end else if (cmd_address[2:0] == 4)
  begin
    dm_slot = ~16;
  end else if (cmd_address[2:0] == 5)
  begin
    dm_slot = ~32;
  end else if (cmd_address[2:0] == 6)
  begin
    dm_slot = ~64;
  end else if (cmd_address[2:0] == 7)
  begin
    dm_slot = ~128;
  end
end
Here dm_slot is the 8-bit pattern we feed to the OSERDES component driving the DM output. Once mem_tester has asserted an address, the DM OSERDES outputs this pattern continuously until the DDR3 RAM reaches the phase where it receives the data to be written. During this phase the DDR3 RAM uses the slot whose DM bit is zero as its cue to store that beat.
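The if/else chain can also be written as a single shift: start from a one in bit 0, shift it to the beat selected by cmd_address[2:0], and invert. The sketch below wraps this in a hypothetical module dm_gen, whose port beat stands in for cmd_address[2:0]. It should produce the same eight patterns as the chain, since ~(8'b1 << n) clears exactly bit n and sets the other seven.

```verilog
// Hypothetical one-line equivalent of the dm_slot if/else chain.
module dm_gen (
  input  [2:0] beat,    // stands in for cmd_address[2:0]
  output [7:0] dm_slot  // 0 = write this beat, 1 = mask it
);
  // Move a single 1 to position 'beat', then invert:
  // exactly one bit is 0, the other seven are 1.
  assign dm_slot = ~(8'b0000_0001 << beat);
endmodule
```

For address 13 the low bits are 5, giving 8'b1101_1111, which is the 1 1 1 1 1 0 1 1 pattern from the write example read from bit 0 upwards.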
Reading 16 bits at a time
Let us start building this lookup table. Look at the screenshot of the simulation again, taking the first value as an example. In this waveform we requested address c, but got b. We can state this the other way around: to get address b, we need to specify address c. Doing this for each position gives the following lookup table, keyed on the lower three address bits:
//Sim mapping
always @*
begin
  if (cmd_address[2:0] == 0)
  begin
    map_address = 1;
  end else if (cmd_address[2:0] == 1)
  begin
    map_address = 6;
  end else if (cmd_address[2:0] == 2)
  begin
    map_address = 7;
  end else if (cmd_address[2:0] == 3)
  begin
    map_address = 4;
  end else if (cmd_address[2:0] == 4)
  begin
    map_address = 5;
  end else if (cmd_address[2:0] == 5)
  begin
    map_address = 2;
  end else if (cmd_address[2:0] == 6)
  begin
    map_address = 3;
  end else
  begin
    map_address = 0;
  end
end
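The same table can be written as a case statement, which may be easier to compare against a waveform. This is a sketch only; the module name sim_map and its ports want and request are hypothetical stand-ins for cmd_address[2:0] and map_address. A useful sanity check for such a table is that it is a permutation: each of the eight burst positions must appear exactly once on the output, otherwise some word could never be read.

```verilog
// Hypothetical case-statement form of the simulation lookup table.
module sim_map (
  input      [2:0] want,    // stands in for cmd_address[2:0]
  output reg [2:0] request  // stands in for map_address
);
  always @* begin
    case (want)
      3'd0:    request = 3'd1;
      3'd1:    request = 3'd6;
      3'd2:    request = 3'd7;
      3'd3:    request = 3'd4;  // to get b (low bits 3), ask for c (low bits 4)
      3'd4:    request = 3'd5;
      3'd5:    request = 3'd2;
      3'd6:    request = 3'd3;
      default: request = 3'd0;  // want == 7
    endcase
  end
endmodule
```

Entry 3 -> 4 corresponds to the example above: 0xb = 11 and 0xc = 12, whose lower three bits are 3 and 4.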
Wherever we need to specify the new address, we use map_address for the lower three bits:
WAIT_CMD:
begin
  if (cmd_valid)
  begin
    if (refresh_out)
    begin
      state <= REFRESH_1;
      cmd_status <= 2;
    end else
    begin
      state <= STATE_PREA;
      test_cmd <= {1'b0, 4'b0, {cmd_address[9:3], map_address[2:0]}, 1'b0, 4'h1, (write_out ? 2'b11 : 2'b00), 10'h1fd};
      cmd_slot <= 1;
      data_in <= {8{cmd_data_out}};
      //column_address <= cmd_address[9:0];
      do_write <= write_out;
      cmd_status <= 2;
    end
  end
end
Let us now move on to examining the data when running on the actual FPGA. The following waveform shows some of the signals captured while the core was running on the FPGA:
- Address Out
- Clk of mem_tester
- Captured data
- Write/read
Comparing the requested addresses with the captured data gives the following pairs:
- Address 0 -> 5
- Address 1 -> 6
- Address 2 -> 7
- Address 3 -> 0
- Address 4 -> 1
- Address 5 -> 2
Turning these observations into a lookup table, as we did for the simulation, gives the following mapping for the FPGA:
always @*
begin
  if (cmd_address[2:0] == 0)
  begin
    map_address = 3;
  end else if (cmd_address[2:0] == 1)
  begin
    map_address = 4;
  end else if (cmd_address[2:0] == 2)
  begin
    map_address = 5;
  end else if (cmd_address[2:0] == 3)
  begin
    map_address = 6;
  end else if (cmd_address[2:0] == 4)
  begin
    map_address = 7;
  end else if (cmd_address[2:0] == 5)
  begin
    map_address = 0;
  end else if (cmd_address[2:0] == 6)
  begin
    map_address = 1;
  end else
  begin
    map_address = 2;
  end
end
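Unlike the simulation table, the FPGA table is a constant offset. Every entry equals cmd_address[2:0] + 3, wrapping around modulo 8 (for example 5 + 3 = 8, which wraps to 0). If the observed pairs are read as requested address -> returned address, the hardware returns requested + 5 (mod 8), and adding 3 undoes exactly that, since 3 + 5 = 8 wraps to 0. A minimal sketch of the equivalent one-liner is shown below; the module fpga_map and its ports are hypothetical. It relies on 3-bit addition wrapping naturally.

```verilog
// Hypothetical one-line equivalent of the FPGA lookup table.
module fpga_map (
  input  [2:0] want,     // stands in for cmd_address[2:0]
  output [2:0] request   // stands in for map_address
);
  // 3-bit result: 5 + 3 = 0, 6 + 3 = 1, 7 + 3 = 2 (modulo 8)
  assign request = want + 3'd3;
endmodule
```

If the offset seen on the hardware ever changes, only the constant needs updating, whereas the simulation table, having no such regular structure, has to stay a full lookup.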