Foreword
In the previous post we created a very elementary memory tester for the Arty A7 board. Its purpose was to check whether we had more or less got the logic for writing to and reading from memory correct.
When I wrote the memory tester, I added quite a bit of padding between DDR commands, just to avoid violating any DDR timing parameters. The purpose of that exercise was simply to get the memory tester working, not to squeeze out the most efficient timing possible. As the old saying goes, you eat an elephant one bite at a time. 😀
In this post we will revisit the timings of our elementary memory tester and see where we can remove wasted clock cycles. The ultimate goal is to access memory at a rate of at least 7MHz, which will be sufficient to emulate an Amiga core.
Reducing initial latency
Let us see if we can reduce the initial latency. This is the latency from the moment the Memory Tester asserts a command until DDR RAM receives the first command for fulfilling this request. It is illustrated in the following hand-drawn diagram:
In this diagram each division represents 1.5ns, and we show a couple of clock signals. Let us start by looking at the frequencies of these clock signals:
- Memtester has a frequency of 20MHz. The diagram does not show a complete cycle of it.
- Oserdes Out has a period of 2 divisions = 3ns, which equals a frequency of 333MHz. This is the clock driving the commands out to the DDR RAM.
- Oserdes Load has a period of 8 divisions = 12ns, which equals a frequency of 83MHz. This clock is used to load the OSERDES block with 4 bits of data at a time.
- As you can see from the diagram, Mclk has exactly the same frequency as Oserdes Load, but is shifted by 45 degrees.
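The frequencies in the list follow directly from the 1.5ns division size. The following small Python sketch is only a check of that arithmetic, not part of the design. It also shows that one load cycle spans exactly four serial bit times, which is consistent with loading the OSERDES with 4 bits at a time.

```python
DIVISION_NS = 1.5  # one division of the hand-drawn diagram

def period_ns(divisions: int) -> float:
    """Period of a clock whose cycle spans `divisions` diagram divisions."""
    return divisions * DIVISION_NS

def freq_mhz(period: float) -> float:
    """Frequency in MHz of a clock with the given period in ns."""
    return 1000.0 / period

oserdes_out = period_ns(2)   # serial command clock
oserdes_load = period_ns(8)  # parallel load clock (Mclk has the same period)

print(f"Oserdes Out : {oserdes_out} ns -> {freq_mhz(oserdes_out):.1f} MHz")
print(f"Oserdes Load: {oserdes_load} ns -> {freq_mhz(oserdes_load):.1f} MHz")
# Serial bit times per load cycle, i.e. how many bits each OSERDES load carries
print(f"Bits per load: {oserdes_load / oserdes_out:.0f}")
```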
...
assign result_cmd = (state == WAIT_CMD && cmd_valid && !refresh_out) ?
    {1'b0, 8'b0, cmd_address[15:10], 1'b0, 16'h21fd} : test_cmd;
...
phy_cmd #(
    ...
) phy_cmd_i (
    ...
    .phy_cmd_word (result_cmd),
    ...
);
...
So, if we are in state WAIT_CMD, the memory tester indicates that the command is valid, and it is not a refresh command, then we don't use test_cmd. Instead, we build up a row activate command on the fly. With this setup the OSERDES components will load on the first OSERDES load-clock edge following the memory tester clock edge, and will start outputting commands to DDR RAM at point C in the diagram. In effect, we have shaved off a full OSERDES load-clock cycle of latency.
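The on-the-fly row activate word relies on Verilog concatenation, where the first operand lands in the most significant bits. The Python sketch below models that concatenation and the part-select cmd_address[15:10]. The helpers `concat` and `bits` are hypothetical and exist only for illustration. The sketch shows that the word is 32 bits wide, with the row bits ending up at positions 22:17 and the constant 16'h21fd in the low 16 bits. What each field means is defined by phy_cmd and is not modelled here.

```python
def concat(*fields):
    """Model Verilog {a, b, c}: the first field ends up in the most significant bits.

    Each field is a (value, width) pair. Returns (packed_value, total_width).
    """
    value, width = 0, 0
    for v, w in fields:
        value = (value << w) | (v & ((1 << w) - 1))
        width += w
    return value, width

def bits(value, hi, lo):
    """Model the Verilog part-select value[hi:lo]."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def row_activate_word(cmd_address):
    # Mirrors {1'b0, 8'b0, cmd_address[15:10], 1'b0, 16'h21fd}
    return concat((0, 1), (0, 8), (bits(cmd_address, 15, 10), 6), (0, 1), (0x21FD, 16))

word, width = row_activate_word(0x0C00)  # example address with cmd_address[15:10] = 3
print(width, hex(word))
```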
The following clock phases were also changed:
- Oserdes data load: From 45 degrees to 90 degrees.
- Serial data out: From 180 degrees to 0 degrees.
- Mclk: From 90 degrees to 135 degrees.
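The following Python sketch converts these phase changes into time. It assumes that each angle is measured against that clock's own period: 12ns for the load clock and Mclk, and 3ns for the serial clock, which is assumed here to be the Oserdes Out clock from the diagram. Under that assumption, all three clocks move later by the same 1.5ns, which is one diagram division. The serial clock's change from 180 to 0 degrees is equivalent to a +180 degree shift modulo a full turn. The helper `shift_ns` is hypothetical.

```python
def shift_ns(old_deg: float, new_deg: float, period_ns: float) -> float:
    """Forward time shift caused by moving a clock from old_deg to new_deg.

    Assumes the angle is relative to the clock's own period and wraps the
    difference into the range [0, 360) degrees.
    """
    return ((new_deg - old_deg) % 360) / 360 * period_ns

# (clock, old phase, new phase, assumed period in ns)
changes = [
    ("Oserdes data load", 45, 90, 12.0),
    ("Serial data out", 180, 0, 3.0),
    ("Mclk", 90, 135, 12.0),
]
for name, old, new, period in changes:
    print(f"{name}: +{shift_ns(old, new, period)} ns")
```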
Changing command slots
input [1:0] cmd_slot,
This indicates in which slot the given command should be issued. Let us look at the write enable signal as an example of how this input port is used:
// we
cmda_single #(
    .IODELAY_GRP(IODELAY_GRP),
    .IOSTANDARD(IOSTANDARD),
    .SLEW(SLEW),
    .REFCLK_FREQUENCY(REFCLK_FREQUENCY),
    .HIGH_PERFORMANCE_MODE(HIGH_PERFORMANCE_MODE)
) cmda_we_i (
    .dq(ddr3_we),
    .clk(clk),
    .clk_div(clk_div),
    .rst(rst),
    .dly_data(dly_data_r[7:0]),
    .din({cmd_slot == 0 ? {1'b1, 1'b1, 1'b1, in_we_r[1]} :
          cmd_slot == 1 ? {1'b1, 1'b1, in_we_r[1], 1'b1} :
          cmd_slot == 2 ? {1'b1, in_we_r[1], 1'b1, 1'b1} :
                          {in_we_r[1], 1'b1, 1'b1, 1'b1}}),
    .tin(in_tri_r),
    .set_delay(set_r),
    .ld_delay(ld_dly_cmd[3])
);
As can be seen here, at port din we place the signal in a different position for each value of cmd_slot.
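The din multiplexer above can be modelled as "put the command bit at bit position cmd_slot and fill every other position with 1". The 1 is the inactive level of an active-low command pin such as WE#. The Python sketch below uses a hypothetical helper, `slot_din`, and reproduces the four cases of the Verilog ternary chain. Slot 0 maps to din[0] and slot 3 maps to din[3]. Which of those bits leaves the OSERDES first depends on how cmda_single wires din to the serializer, and is not modelled here.

```python
def slot_din(signal_bit: int, cmd_slot: int) -> int:
    """4-bit din word for an active-low command line such as ddr3_we.

    The command bit goes to bit position cmd_slot; all other positions
    carry 1 (inactive), matching the ternary chain on the din port.
    """
    if not 0 <= cmd_slot <= 3:
        raise ValueError("cmd_slot is a 2-bit value")
    idle = 0b1111
    return (idle & ~(1 << cmd_slot)) | ((signal_bit & 1) << cmd_slot)

for slot in range(4):
    # Asserting WE# (0) in each slot in turn
    print(slot, format(slot_din(0, slot), "04b"))
```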
generate
genvar i;
for (i=0; i<ADDRESS_NUMBER; i=i+1) begin: addr_block
// assign decode_addr[i]=(ld_dly_addr[4:0] == i)?1'b1:1'b0;
cmda_single #(
.IODELAY_GRP(IODELAY_GRP),
.IOSTANDARD(IOSTANDARD),
.SLEW(SLEW),
.REFCLK_FREQUENCY(REFCLK_FREQUENCY),
.HIGH_PERFORMANCE_MODE(HIGH_PERFORMANCE_MODE)
) cmda_addr_i (
.dq(ddr3_a[i]), // I/O pad (appears on the output 1/2 clk_div earlier, than DDR data)
.clk(clk), // free-running system clock, same frequency as iclk (shared for R/W)
.clk_div(clk_div), // free-running half clk frequency, front aligned to clk (shared for R/W)
.rst(rst),
.dly_data(dly_data_r[7:0]), // delay value (3 LSB - fine delay)
.din({4{ in_a_r[ADDRESS_NUMBER+i]}}), // parallel data to be sent out
// .tin(in_tri_r[1:0]), // tristate for data out (sent out earlier than data!)
.tin(in_tri_r), // tristate for data out (sent out earlier than data!)
.set_delay(set_r), // clk_div synchronous load odelay value from dly_data
.ld_delay(ld_dly_addr[i]) // clk_div synchronous set odelay value from loaded
);
end
endgenerate
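Unlike the write enable line, each address pin gets its bit replicated with {4{...}}. The address is therefore driven in all four slots of a load cycle, and it is valid whichever slot the command lands in. The small Python model below illustrates that replication. The helper `replicate` is hypothetical and is not part of the design.

```python
def replicate(bit: int, count: int = 4) -> int:
    """Model the Verilog replication {count{bit}} for a 1-bit operand."""
    return ((1 << count) - 1) if bit & 1 else 0

for address_bit in (0, 1):
    print(address_bit, format(replicate(address_bit), "04b"))

# Every slot position carries the same address bit, so cmd_slot does not matter here
slot_independent = all(
    ((replicate(b) >> slot) & 1) == b for b in (0, 1) for slot in range(4)
)
print(slot_independent)
```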
Putting everything together
...
PREPARE_CMD: begin
    test_cmd <= 32'h000001ff;
    cmd_slot <= 0;
    state <= WAIT_CMD;
    ...
end
WAIT_CMD: begin
    if (cmd_valid) begin
        if (refresh_out) begin
            state <= REFRESH_1;
            cmd_status <= 2;
        end else begin
            state <= STATE_PREA;
            test_cmd <= {1'b0, 4'b0, cmd_address[9:0], 1'b0, 4'h1, (write_out ? 2'b11 : 2'b00), 10'h1fd};
            cmd_slot <= 1;
            data_in <= cmd_data_out;
            do_write <= write_out;
            cmd_status <= 2;
        end
    end
end
...
In PREPARE_CMD we ensure that all commands will be issued in the first time slot.
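One way to sanity-check the new slot assignment is to count DDR clocks between commands. The Python sketch below makes the following assumptions:

- The state machine advances one state per OSERDES load-clock cycle, which gives 4 command slots of 3ns each.
- The row activate goes out in the WAIT_CMD cycle at slot 0, because cmd_slot is still 0 from PREPARE_CMD.
- The column command, registered in WAIT_CMD, goes out in the next cycle at slot 1.
- The tRCD value of 13.75ns is a typical DDR3 figure. Check it against the datasheet of the memory chip actually fitted.

`ddr_clocks_between` is a hypothetical helper.

```python
import math

SLOTS_PER_LOAD_CYCLE = 4  # command slots per OSERDES load-clock cycle
TCK_NS = 3.0              # DDR clock period (Oserdes Out, 333 MHz)

def ddr_clocks_between(first, second):
    """DDR clocks between two commands given as (load_cycle, slot) pairs."""
    (c1, s1), (c2, s2) = first, second
    return (c2 - c1) * SLOTS_PER_LOAD_CYCLE + (s2 - s1)

# Assumed reading of the state machine, one state per load-clock cycle
activate = (0, 0)  # WAIT_CMD cycle, slot 0 (cmd_slot set in PREPARE_CMD)
column = (1, 1)    # STATE_PREA cycle, slot 1 (test_cmd/cmd_slot set in WAIT_CMD)

TRCD_NS = 13.75    # assumed tRCD; verify against the actual DDR3 datasheet
gap = ddr_clocks_between(activate, column)
needed = math.ceil(TRCD_NS / TCK_NS)
print(f"ACT -> RD/WR: {gap} clocks = {gap * TCK_NS} ns, tRCD needs {needed}")
```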
STATE_PREA: begin
    state <= WAIT_WRITE_RECOVERY;
    dq_tri <= do_write ? 0 : 15;
    cmd_slot <= 0;
    test_cmd <= do_write ? 32'h000005ff : 32'h000001ff;
end
WAIT_WRITE_RECOVERY: begin
    test_cmd <= 32'h000001ff;
    state <= PRECHARGE_AFTER_WRITE;
end
PRECHARGE_AFTER_WRITE: begin
    do_capture <= 1;
    state <= POST_READ_1;
    cmd_slot <= 3;
    test_cmd <= 32'h000029fd;
end
POST_READ_1: begin
    state <= PREPARE_CMD;
    test_cmd <= 32'h000001ff;
end
Let us quickly go through this code. In STATE_PREA we wait for the read/write cycle to complete. If it is a write, we also keep the ODT signal asserted during this time.
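The same clock counting can be applied to the spacing between a write and the following precharge. Under the same one-state-per-load-cycle assumption, the column command goes out in STATE_PREA at slot 1. The precharge word 32'h000029fd is assigned in PRECHARGE_AFTER_WRITE and therefore goes out one cycle later, in POST_READ_1, at slot 3. The DDR3 minimum between a write command and a precharge is WL + BL/2 + tWR. The values below (CWL = 5, AL = 0, BL8, tWR = 15ns) are assumptions and are not read from this controller's mode register settings. `ddr_clocks_between` is again a hypothetical helper.

```python
import math

SLOTS_PER_LOAD_CYCLE = 4  # command slots per OSERDES load-clock cycle
TCK_NS = 3.0              # DDR clock period (333 MHz)

def ddr_clocks_between(first, second):
    """DDR clocks between two commands given as (load_cycle, slot) pairs."""
    (c1, s1), (c2, s2) = first, second
    return (c2 - c1) * SLOTS_PER_LOAD_CYCLE + (s2 - s1)

column = (1, 1)     # column write issued while in STATE_PREA
precharge = (4, 3)  # 32'h000029fd, assigned in PRECHARGE_AFTER_WRITE, issued in POST_READ_1

CWL, AL = 5, 0        # assumed: DDR3 CWL = 5 for 2.5 ns <= tCK < 3.3 ns, no additive latency
BURST_CLOCKS = 8 // 2 # a BL8 burst occupies 4 clocks
TWR_NS = 15.0         # assumed DDR3 write recovery time

gap = ddr_clocks_between(column, precharge)
min_wr_to_pre = (CWL + AL) + BURST_CLOCKS + math.ceil(TWR_NS / TCK_NS)
print(f"WR -> PRE: {gap} clocks, minimum {min_wr_to_pre}")
```

If these assumptions hold, the two numbers come out equal. The write path would then have no spare cycles left between the column write and the precharge.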
Test Results
- Point A is the clock signal for our Memory Tester. Originally this frequency was 20MHz, but during experimentation I found that 20MHz was a tight fit, so I lowered the frequency to 16.7MHz. Here the Memory Tester issues the command at the first rising edge, and the data is available at the following rising edge.
- At point B we issue a row Activate command.
- At point C we issue a column read command.
- At point D we issue the precharge command.
- Point E indicates when we receive data from the data_out port and when this value is captured by the cap_value port.
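The goal stated at the start was a memory access rate of at least 7MHz. A quick Python check compares that target with the memtester clocks. It only compares clock rates. It assumes the tester completes one access per memtester cycle, as the description of point A suggests, and this was not measured separately.

```python
TARGET_MHZ = 7.0  # memory access rate needed for the Amiga core

def period_ns(mhz: float) -> float:
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / mhz

for label, mhz in (("Original memtester clock", 20.0),
                   ("Lowered memtester clock", 16.7),
                   ("Amiga target", TARGET_MHZ)):
    print(f"{label}: {mhz} MHz -> {period_ns(mhz):.1f} ns")

print(f"Headroom at 16.7 MHz: {16.7 / TARGET_MHZ:.2f}x the target rate")
```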