Foreword
In the previous post we played a bit with read levelling on the Arty A7.
I also came up with the theory that you don't really need to do write or read levelling on the Arty A7. The board contains only a single RAM chip, which sits fairly close to the FPGA, so from the looks of it one would not need to account for variable clock skew.
If this theory proves to be correct throughout this series, our design will remain relatively simple.
In this post we attempt to write a value to the RAM of the Arty, and see if we can read the same value back.
If we can achieve this, it will be a huge step towards creating our custom memory controller.
As with my previous couple of posts, I will be building on the 10393 Elphel Memory controller, where the source code is available here.
Overview of RAM commands for reading and writing
Let us start off by looking at the commands to do a read and a write to RAM.
Before we do anything, the row we want to work on must be activated. This is done by bringing the RAS signal low and specifying the row address. Before issuing the next command, one needs to wait for a couple of clock cycles. On the Arty A7, this waiting period is 5 clock cycles.
With the row activated, read and write commands can now be issued to that row.
To do a write, the CAS signal needs to be brought down, as well as the WE signal. The column address to which we want to write also needs to be supplied at the same time.
Once the write command is supplied to the RAM, the data needs to be provided on the DQ pins. However, the data cannot be provided straight away and one needs to wait for 5 clock cycles before providing it.
In our case, after we have performed the write, we want to see if we can read the same data back. For a read command the CAS needs to be pulled down once again, but we leave WE high. We also need to specify the same address as we did previously for the write.
After waiting again for 5 clock cycles, the data will be available on the DQ pins.
Just a note about the data written to or read from the DQ pins. During a read or write operation, eight 16-bit words are transferred at a time. These eight words are transferred over a period of 4 clock cycles, with one word transferred on each rising and each falling clock transition.
ISERDESE2 and OSERDESE2 revisited
As you might have guessed, we will be using ISERDESE2 and OSERDESE2 components for capturing/writing the eight words on the DQ pins.
From the previous post we have seen that these components are serial-parallel/parallel-serial converters, allowing data on DQ pins to work at a speed of 333MHz, while allowing the rest of the FPGA to operate at a more convenient 83MHz.
Since my last post, I discovered a caveat with the ISERDESE2 and OSERDESE2 components. With OSERDESE2 you can indeed specify 8 bits at a time, which makes our 83MHz/333MHz sum work out.
However, I discovered that ISERDESE2 can only accept 4 bits of data from the DQ pins at a time, so suddenly our 83MHz/333MHz sum doesn't work out anymore. We would need to clock the ISERDESE2 at 167MHz to ensure we offload the captured bits in time.
I don't feel comfortable running everything in the FPGA at 167MHz, so I will add my own shift register to the design, which will shift in 4 bits at a time and output eight bits. The shift register will therefore operate at 167MHz, and the rest of the design can operate at 83MHz.
The Command State Machine
Up to now I have been making use of a state machine for issuing the series of commands for initialising the RAM, and doing read and write levelling. I will be using the same state machine in this post for testing the read and write operation.
I haven't given much detail of this state machine, so I will do so in this section.
I have implemented the state machine in the file memctrl/phy/mcontr_sequencer.v of the Elphel 10393 memory controller.
Each command issued by the state machine is basically a 32-bit number, where the layout of the important bits is as follows:
- bits 30-17: address
- bits 16-14: bank
- bit 13: RAS
- bit 12: CAS
- bit 11: Write Enable
- bit 10: ODT
- bit 9: CKE
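To make the bit layout above a bit more concrete, here is a minimal sketch of a module that splits a command word into its fields. The module name cmd_fields and its port names are my own invention for illustration and are not part of the Elphel source. Judging by the command values used in this post, a 1 in the RAS, CAS or WE bit appears to mean the signal is asserted.

```verilog
// Hypothetical helper (not in the Elphel source): splits a 32-bit
// sequencer command word into the fields listed above.
module cmd_fields (
    input  wire [31:0] cmd,
    output wire [13:0] addr,  // bits 30-17: address
    output wire [2:0]  bank,  // bits 16-14: bank
    output wire        ras,   // bit 13
    output wire        cas,   // bit 12
    output wire        we,    // bit 11: Write Enable
    output wire        odt,   // bit 10
    output wire        cke    // bit 9
);
    assign addr = cmd[30:17];
    assign bank = cmd[16:14];
    assign ras  = cmd[13];
    assign cas  = cmd[12];
    assign we   = cmd[11];
    assign odt  = cmd[10];
    assign cke  = cmd[9];
endmodule
```

Feeding it 32'h000021fd gives ras = 1 with cas and we at 0, 32'h000001ff leaves all five control bits at 0, and 32'h000005ff sets only odt.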
    always @(posedge mclk) begin
        if (start_init) begin
            case (state)
                ...
                ... initialise RAM
                ...
                STATE_ACTIVATE: begin
                    state <= STATE_WAIT_ACTIVATE;
                    //Do activate
                    test_cmd <= 32'h000021fd;
                    dq_tri <= 0;
                    data_in <= 128'h112233005566778899aabbccddeeff44;
                end
                STATE_WAIT_ACTIVATE: begin
                    test_cmd <= 32'h000001ff;
                    state <= STATE_WAIT_STATE_1;
                end
    end

A couple of things are happening here. First we send the command for doing an activate, which is represented by the value 32'h000021fd. A command should only be signalled for one clock cycle, after which the RAS, CAS and WE signals should be de-asserted. It is for this reason that we set the command to 32'h000001ff at the next clock cycle.
Once we have set data_in with this value, the OSERDESE2 components for the DQ pins, will continuously output these 8 values.
    READ_TEST: begin
        state <= WRITE_DELAY_0;
        //Column write
        test_cmd <= 32'h00081dfd;
    end
    WRITE_DELAY_0: begin
        test_cmd <= 32'h000005ff;
        state <= WRITE_DELAY_1;
    end
    ...
    ...Wait 5 clock cycles
    ...
    DO_READ: begin
        state <= PAUSE_AFTER_READ;
        //Column read
        dq_tri <= 15;
        test_cmd <= 32'h000011fd;
    end
    PAUSE_AFTER_READ: begin
        state <= PAUSE_AFTER_READ;
        test_cmd <= 32'h000001ff;
    end

You will notice that after issuing the WRITE command, we issue the value 32'h000005ff instead of 32'h000001ff. This is because the ODT bit needs to be asserted during a write.
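As an aside, the chain of delay states used for the 5 clock cycle waits could also be expressed with a small counter. The following is only a sketch of that alternative and not what my state machine actually does. The module wait_counter and its ports load and done are hypothetical names.

```verilog
// Hypothetical alternative to a chain of delay states: a down-counter
// that reports when WAIT_CYCLES clock edges have passed since load.
module wait_counter #(
    parameter WAIT_CYCLES = 5   // must fit in 4 bits (15 or less)
) (
    input  wire clk,
    input  wire rst,
    input  wire load,   // pulse for one cycle when the command is issued
    output wire done    // high while the counter is idle or has expired
);
    reg [3:0] count;

    always @(posedge clk) begin
        if (rst)
            count <= 0;
        else if (load)
            count <= WAIT_CYCLES;
        else if (count != 0)
            count <= count - 1;
    end

    assign done = (count == 0);
endmodule
```

A state machine using this would pulse load in the state that issues the command, and then sit in a single wait state until done goes high. Note that done is also high before load has been registered, so it should only be sampled from the wait state onwards.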
Latching read data
    ...
    parameter IOBDELAY = "IBUF",
    ...
    ISERDESE2 #(
        .DATA_RATE         ("DDR"),
        .DATA_WIDTH        (4),
        .DYN_CLKDIV_INV_EN (DYN_CLKDIV_INV_EN),
        .DYN_CLK_INV_EN    ("FALSE"),
        .INIT_Q1           (1'b0),
        .INIT_Q2           (1'b0),
        .INIT_Q3           (1'b0),
        .INIT_Q4           (1'b0),
        .INTERFACE_TYPE    ("MEMORY"),
        .NUM_CE            (1),
        .IOBDELAY          (IOBDELAY),
        .OFB_USED          ("FALSE"),
        .SERDES_MODE       ("MASTER"),
        .SRVAL_Q1          (1'b0),
        .SRVAL_Q2          (1'b0),
        .SRVAL_Q3          (1'b0),
        .SRVAL_Q4          (1'b0)
    ) iserdes_i (
        .O            (comb_out),
        .Q1           (iserdes_out[3]),
        .Q2           (iserdes_out[2]),
        .Q3           (iserdes_out[1]),
        .Q4           (iserdes_out[0]),
        .SHIFTOUT1    (),
        .SHIFTOUT2    (),
        .BITSLIP      (1'b0),
        .CE1          (1'b1),
        .CE2          (1'b1),
        .CLK          (iclk),
        .CLKB         (!iclk),
        .CLKDIVP      (), // used with phasers, source-sync
        .CLKDIV       (oclk_div),
        .DDLY         (ddly),
        .D            (d_direct), // direct connection to IOB bypassing idelay
        .DYNCLKDIVSEL (inv_clk_div),
        .DYNCLKSEL    (1'b0),
        .OCLK         (oclk),
        .OCLKB        (!oclk),
        .OFB          (),
        .RST          (rst),
        .SHIFTIN1     (1'b0),
        .SHIFTIN2     (1'b0)
    );

One particular change here is that I am using the value "IBUF" for the parameter IOBDELAY. This is because I want to use the direct input, i.e. D, instead of the delayed version, DDLY. We are using our own fixed delays.
    ...
    output [7:0] dout,
    ...
    always @(negedge oclk_div) begin
        dout_le <= {dout_le[3:0], iserdes_out};
    end
    ...

oclk_div is a 167MHz clock signal. With this snippet of code we will get 8 bits of data every second 167MHz clock cycle.
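Since a complete byte is only available on every second 167MHz cycle, it can be handy to have a flag that marks those cycles. Here is a minimal sketch of how that could look. The module nibble_to_byte, the phase register and the dout_valid output are hypothetical additions and not part of my actual design, where the capture moment is instead chosen by experimentation.

```verilog
// Hypothetical sketch: collects two 4-bit ISERDESE2 nibbles into a byte
// and flags every second cycle, when two fresh nibbles are present.
module nibble_to_byte (
    input  wire       clk_div,    // 167MHz divided clock (oclk_div above)
    input  wire       rst,        // synchronous reset
    input  wire [3:0] nibble,     // the 4 bits from ISERDESE2 (iserdes_out above)
    output reg  [7:0] dout,
    output reg        dout_valid  // high on every second cycle after reset
);
    reg phase;  // 0: waiting for first nibble, 1: waiting for second

    always @(negedge clk_div) begin
        if (rst) begin
            dout       <= 8'h00;
            phase      <= 1'b0;
            dout_valid <= 1'b0;
        end else begin
            // The older nibble moves to the upper half of the byte
            dout       <= {dout[3:0], nibble};
            phase      <= ~phase;
            dout_valid <= phase;
        end
    end
endmodule
```

Which two nibbles end up paired together depends on when the reset is released relative to the incoming burst, so the alignment would still need to be checked in simulation.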
    ...
    ... Some delay
    ...
    POST_READ: begin
        do_capture <= 1;
        state <= POST_READ_1;
    end
    POST_READ_1: begin
        do_capture <= 0;
        state <= POST_READ_6;
    end
    ...

Directly after we have set do_capture to 1, we set it back to 0, so that the captured value is not overwritten. I have determined the right moment to set do_capture by experimentation in the Verilog simulator.
    always @(negedge mclk) begin
        if (do_capture) begin
            cap_value <= data_out;
        end
    end
Test Results
- xxxxxxxx 5577xxxx
- xxxxxxxx 6688xxxx
    // we
    cmda_single #(
        .IODELAY_GRP(IODELAY_GRP),
        .IOSTANDARD(IOSTANDARD),
        .SLEW(SLEW),
        .REFCLK_FREQUENCY(REFCLK_FREQUENCY),
        .HIGH_PERFORMANCE_MODE(HIGH_PERFORMANCE_MODE)
    ) cmda_we_i (
        .dq(ddr3_we),
        .clk(clk),
        .clk_div(clk_div),
        .rst(rst),
        .dly_data(dly_data_r[7:0]),
        .din({1'b1, in_we_r[0], in_we_r[1], 1'b1}),
        .tin(in_tri_r),
        .set_delay(set_r),
        .ld_delay(ld_dly_cmd[3]));

In my version of the file, din accepts 4 bits, whereas the original version only accepts two bits.
- .din({in_we_r[0], in_we_r[1], 1'b1, 1'b1})