Friday 23 December 2022

SD Card Access for an Arty A7: Part 4

Foreword

In the previous post we managed to issue an IDLE command to an SD Card via an SD Card reader attached to the Arty A7 board. We also confirmed that the SD Card sends a response back for the command.

Up to this point we have made use of a state machine for issuing command sequences to the Gisselquist SD core. There are quite a number of commands one needs to issue to an SD Card in order to read or write the data stored on it. Using a state machine for this exercise can become quite unpleasant in the long run.

Thus, in this post we will look at using a CPU core, on which we can run a stored program for issuing the SD Card command sequences. The CPU core I will be using for this purpose is Arlet Ottens' 6502 core.

I am sure there will be a few frowns out there about using an 8-bit CPU to work with a 32-bit Wishbone device like the Gisselquist SD Card core. However, the 6502 is fairly light on FPGA resources and I think it is worthwhile to see how far this core can help us out.

The Memory Map

Let us have a look at the memory map for our 6502 system:

  • FFFF - FF00: ROM. For starters we will have a 256 byte ROM, but it might grow beyond this size over time. As with many 6502 systems, the startup ROM needs to live in the top part of the address space, because the reset vector is at addresses FFFC-FFFD.
  • FEFF - FE00: Interface to the registers of the Gisselquist SD Card core. As we have seen in the previous post, we have access to two 32-bit registers via the Wishbone bus of the Gisselquist core.

Let us zoom in a bit on the interface to the Gisselquist core. The Wishbone interface works with 32 bits of data, whereas the 6502 works with 8 bits of data at a time. How does one deal with this difference in data widths?

To explain a possible solution to the problem, let us start by arranging the memory locations starting at FE00 like this:

So, the addresses FE00-FE03 map to register 0 of the Gisselquist core, and the addresses FE04-FE07 map to register 1 of the Gisselquist core.

Now, reads and writes to the lowest byte of each register (FE00 and FE04, marked in red) will trigger transactions on the Wishbone bus. The byte addresses in black map to temporary registers.

Suppose we want our 6502 to write a value to Register 1. We will start by writing the top three bytes of the 32-bit word to memory locations FE07, FE06 and FE05. Writing to these addresses will set the values of temporary registers and will not trigger any Wishbone transaction. Writing to FE04, however, will trigger a Wishbone write transaction.

With this Wishbone write transaction, we will concatenate the values stored in temporary registers FE07, FE06 and FE05, together with the value currently being written by the 6502 to address FE04.

A Wishbone read works in a very similar way, triggered by reading either FE04 or FE00. The top three bytes returned by the Wishbone read will be stored in another set of temporary registers, which can afterwards also be read by the 6502 at addresses FE07/FE06/FE05 or FE03/FE02/FE01.

I will give more detail on implementing this in coming sections.
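The byte-to-word scheme described above can be modelled in a few lines of Python. This is only a behavioural sketch and not the Verilog used later in this post: the class name ByteWideBridge, its methods and the two-element register list are made up for illustration, the temporary bytes are kept per register as in the description above, and only addresses FE00-FE07 are handled.

```python
class ByteWideBridge:
    """Behavioural model (hypothetical) of the 8-bit to 32-bit bridge."""

    BASE = 0xFE00

    def __init__(self):
        self.wb_regs = [0, 0]   # stand-ins for the two 32-bit core registers
        self.wr_temp = {}       # upper bytes staged by the CPU before a write
        self.rd_temp = {}       # upper bytes captured by the last word read

    def write(self, addr, value):
        offset = addr - self.BASE
        if offset % 4:
            # FE01-FE03 / FE05-FE07: only stage the byte, no bus transaction
            self.wr_temp[offset] = value & 0xFF
            return
        # FE00 / FE04: build the 32-bit word and perform the Wishbone write
        word = value & 0xFF
        for i in (1, 2, 3):
            word |= self.wr_temp.get(offset + i, 0) << (8 * i)
        self.wb_regs[offset // 4] = word

    def read(self, addr):
        offset = addr - self.BASE
        if offset % 4:
            # Upper bytes come from the copy made by the last word read
            return self.rd_temp.get(offset, 0)
        word = self.wb_regs[offset // 4]   # Wishbone read
        for i in (1, 2, 3):
            self.rd_temp[offset + i] = (word >> (8 * i)) & 0xFF
        return word & 0xFF


bridge = ByteWideBridge()
for addr, byte in ((0xFE07, 0x12), (0xFE06, 0x34), (0xFE05, 0x56), (0xFE04, 0x78)):
    bridge.write(addr, byte)
print(hex(bridge.wb_regs[1]))   # 0x12345678
```

Only the final write to FE04 changes the modelled register; the three writes before it just fill the staging bytes.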

Wiring up the 6502

Let us start wiring up the 6502 core.

For starters, we need a ROM for feeding the 6502 with a program to execute. For this we take a trip down memory lane to 2017, when we created a ROM module for our C64 core. You can find the full source for this module on Github: https://github.com/ovalcode/c64fpga/blob/master/ip/bblock/src/rom.v

An instance of the ROM module looks like this:

   rom#(
    .ADDR_WIDTH(8),
    .ROM_FILE("romsdspi.bin")
)   rom (
      .clk(gen_clk),
      .addr(cpu_address[7:0]),
      .rom_out(rom_out)
    );
As mentioned earlier, we will start off with only a 256 byte ROM. For this reason ADDR_WIDTH is set to 8. We also only use the lower 8 bits of the address from the CPU.

The parameter, .ROM_FILE, is the path to a file on the file system containing a ROM image. This is a text file with one byte per line, in hex. So, our 256 byte ROM will result in a file containing 256 lines. As mentioned previously, the reset vector is at addresses FFFC-FFFD, so the last four lines of our ROM file will look like this:

00
FF
00
00
Here we see that the 6502 will start executing at address FF00, the beginning of the last 256-byte page in the 64K address space. We will cover the assembly code a bit later.
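A ROM file in this format can be generated with a short script. The following is a minimal Python sketch, assuming the layout described above (256 lines, one hex byte per line, the ROM mapped at FF00). The function name rom_image_lines and its parameters are my own invention for illustration and are not part of the rom.v module.

```python
def rom_image_lines(program, base=0xFF00, reset_vector=0xFF00, size=256):
    """Return the ROM contents as one two-digit hex string per byte."""
    # The NMI, reset and IRQ vectors occupy the last six bytes (FFFA-FFFF)
    if len(program) > size - 6:
        raise ValueError("program overlaps the vector area")
    image = bytearray(size)             # unused bytes default to 00
    image[:len(program)] = program
    # Reset vector: low byte at FFFC, high byte at FFFD
    offset = 0xFFFC - base
    image[offset] = reset_vector & 0xFF
    image[offset + 1] = reset_vector >> 8
    return [f"{b:02X}" for b in image]


# LDA #$55 as a tiny stand-in program
lines = rom_image_lines(bytes([0xA9, 0x55]))
print(lines[-4:])   # ['00', 'FF', '00', '00']
```

Joining the returned list with newlines and writing it to disk would give a file of 256 lines in the same shape as the four lines shown above.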

Let us now have a quick look at an instance of the Arlet Ottens core:

cpu cpu( .clk(gen_clk), .reset(...), .AB(cpu_address), .DI(rom_out), .DO(cpu_data_out), .WE(we_6502), .IRQ(0), .NMI(0), .RDY(1) );

Writing from 6502 to SDSPI Core

Let us now focus on the functionality for writing from the 6502 core to an SDSPI Core register.

Firstly, because the 6502 can only deal with 8 bits at a time, we need to add three temporary registers so that we have the 32 bits available that the SDSPI core requires for a write:

always @(posedge gen_clk)
begin
  if (we_6502)
  begin
    if (cpu_address == 16'hfe01)
    begin
        reg_1 <= cpu_data_out;
    end else if (cpu_address == 16'hfe02)
    begin
        reg_2 <= cpu_data_out;
    end else if (cpu_address == 16'hfe03)
    begin
        reg_3 <= cpu_data_out;
    end
  end
end
Next, let us generate a strobe signal for the SDSPI Core:

assign on_word_boundary = cpu_address[1:0] == 0;

assign wb_stb = cpu_address[15:8] == 8'hfe && on_word_boundary;
So, we only strobe on a word boundary, e.g. addresses like FE00 and FE04. The applicable signals on the SDSPI core look like this:

sdspi  sdspi (
...
		// Wishbone interface
		// {{{
		.i_wb_cyc(1), .i_wb_stb(wb_stb), .i_wb_we(we_6502),
		.i_wb_addr({1'b0,cpu_address[2]}),
		.i_wb_data({reg_3, reg_2, reg_1, cpu_data_out}),
...
	);
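To double check the address decoding, the strobe and address expressions can be mirrored in Python. This is just a sanity-check sketch; the function names decode_sdspi_access and wb_write_data are hypothetical and only restate the Verilog expressions above.

```python
def decode_sdspi_access(cpu_address):
    """Mirror the wb_stb and i_wb_addr expressions for a 16-bit CPU address."""
    on_word_boundary = (cpu_address & 0x3) == 0
    wb_stb = (cpu_address >> 8) == 0xFE and on_word_boundary
    wb_addr = (cpu_address >> 2) & 0x1      # {1'b0, cpu_address[2]}
    return wb_stb, wb_addr


def wb_write_data(reg_3, reg_2, reg_1, cpu_data_out):
    """Mirror the concatenation {reg_3, reg_2, reg_1, cpu_data_out}."""
    return (reg_3 << 24) | (reg_2 << 16) | (reg_1 << 8) | cpu_data_out


for addr in (0xFE00, 0xFE01, 0xFE04, 0xFF00):
    print(hex(addr), decode_sdspi_access(addr))
```

Only FE00 and FE04 produce a strobe here, selecting register 0 and register 1 respectively.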

Reading with the 6502

Now, let us look into reading with the 6502. Reading is a bit more complex than writing, because we can read from potentially two sources: ROM and registers of the SDSPI core.

To cater for the two possible read sources, let us create the following outline:

...
cpu cpu( ... .DO(cpu_data_out), ... );
...
always @(posedge gen_clk)
begin
  addr_delayed <= cpu_address;
end
...
always @*
begin
    casex (addr_delayed)
        ...
        default: combined_data = rom_out;
    endcase 
end
...
The casex is the main part for selecting the correct source. We use a casex instead of a usual case because we use a subset of the bits to decide which source to select. We will add more selectors to our casex in a bit.

One thing you will also notice is that we are using a delayed version of the address for the selection. This is to cater for the way Block RAMs work: the data for a given address is only available at the next clock cycle. By that time the 6502 core may already be asserting a different address, which could cause data from the wrong source to be selected and presented to the CPU.

Now, let us extend our outline so that we make our 6502 read registers from the SDSPI core:

...
always @(posedge gen_clk)
begin
    wb_data_store <= (wb_stb && !we_6502) ? o_data_sdspi[31:8] : wb_data_store;  
end
...
always @*
begin
    casex (addr_delayed)
        16'b1111_1110_xxxx_xx00: combined_data = o_data_sdspi[7:0];
        16'b1111_1110_xxxx_xx01: combined_data = wb_data_store[7:0];
        16'b1111_1110_xxxx_xx10: combined_data = wb_data_store[15:8];
        16'b1111_1110_xxxx_xx11: combined_data = wb_data_store[23:16];

        default: combined_data = rom_out;
    endcase 
end
...
So, when we read from an SDSPI core register we store the top three bytes in a temporary register called wb_data_store, which the 6502 can read at a later stage if so desired.
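The selection done by the casex can be expressed as a small Python function, which makes it easy to check which byte ends up in front of the CPU for a given address. This is only a sketch; the function name read_mux is hypothetical and the sample values in the usage below are arbitrary.

```python
def read_mux(addr_delayed, o_data_sdspi, wb_data_store, rom_out):
    """Pick the byte that the 6502 sees on its data input."""
    if (addr_delayed >> 8) != 0xFE:
        return rom_out                          # default branch: ROM
    lane = addr_delayed & 0x3
    if lane == 0:
        return o_data_sdspi & 0xFF              # live low byte from the core
    # Lanes 1..3 come from the 24-bit copy of bits 31:8
    return (wb_data_store >> (8 * (lane - 1))) & 0xFF


word = 0x11223344            # arbitrary word returned by the core
store = word >> 8            # what wb_data_store would hold after the read
print([hex(read_mux(a, word, store, 0xEA)) for a in (0xFE00, 0xFE01, 0xFE02, 0xFE03)])
```

Reading FE00 through FE03 in this example returns the four bytes of the word from the lowest to the highest, while any address outside the FE page falls through to the ROM data.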

At this point we have a small caveat: the SDSPI Core will not have register data ready at the next clock cycle, but requires one additional clock cycle before the data is ready. This behaviour breaks the assumptions the 6502 core makes.

Luckily, the 6502 core does provide an RDY input signal, with which we can effectively pause the 6502 on a read for as many clock cycles as we want, until the data we want is ready on the data bus. With this in mind, we need to change the code above to the following:

...
always @(posedge gen_clk)
begin
    wait_read <= wait_read ? 0 : (wb_stb && !we_6502);
end

always @(posedge gen_clk)
begin
    capture_data <= wait_read;
end

always @(posedge gen_clk)
begin
    wb_data_store <= capture_data ? o_data_sdspi[31:8] : wb_data_store;  
end

cpu cpu(... .RDY(!wait_read) );

always @(posedge gen_clk)
begin
  addr_delayed <= wait_read ? addr_delayed : cpu_address;
end
...
As seen from this code, we also need to wait before we capture a value for wb_data_store, and addr_delayed needs to be held for longer as well.
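To see how these registers line up in time, here is a small cycle-by-cycle Python model of wait_read, capture_data and wb_data_store. It is only a sketch: the function name simulate_read_stall and the input lists are made up for illustration, the 6502 itself is not modelled, and the sample word 0x11223344 and the cycle in which it becomes valid are assumptions. Each loop iteration is one clock cycle, and all three registers update from the old values, just like the non-blocking assignments above.

```python
def simulate_read_stall(stb_read_per_cycle, sdspi_data_per_cycle):
    """Trace wait_read, capture_data, RDY and wb_data_store per clock cycle."""
    wait_read = 0
    capture_data = 0
    wb_data_store = 0
    trace = []
    for stb_read, o_data_sdspi in zip(stb_read_per_cycle, sdspi_data_per_cycle):
        # Values visible during this cycle, before the next rising edge
        trace.append({
            "wait_read": wait_read,
            "capture_data": capture_data,
            "rdy": int(not wait_read),
            "wb_data_store": wb_data_store,
        })
        # Rising edge: compute every next value from the current values
        next_wait_read = 0 if wait_read else int(stb_read)
        next_capture_data = wait_read
        next_store = (o_data_sdspi >> 8) & 0xFFFFFF if capture_data else wb_data_store
        wait_read, capture_data, wb_data_store = (
            next_wait_read, next_capture_data, next_store)
    return trace


# One read strobe in cycle 0; the data word is assumed valid from cycle 2 on
trace = simulate_read_stall([1, 0, 0, 0], [0, 0, 0x11223344, 0x11223344])
for cycle, signals in enumerate(trace):
    print(cycle, signals)
```

In this trace RDY drops for exactly one cycle after the strobe, and the upper three bytes land in wb_data_store one cycle after capture_data goes high.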

The 6502 Assembly Program

Let us have a look at a 6502 Assembly program for accessing the SDSPI core, which will ultimately put an SD Card into IDLE mode:

FF00   A9 55     LDA #$55
FF02   8D 01 FE  STA $FE01
FF05   8D 02 FE  STA $FE02
FF08   8D 03 FE  STA $FE03
FF0B   A9 0B     LDA #$0B
FF0D   8D 04 FE  STA $FE04 ; Store the value $5555550B into Data Register
FF10   A9 C0     LDA #$C0  
FF12   8D 00 FE  STA $FE00 ; Init Config registers with value stored in Data Register
FF15   A9 FF     LDA #$FF
FF17   8D 03 FE  STA $FE03
FF1A   8D 02 FE  STA $FE02
FF1D   8D 01 FE  STA $FE01
FF20   8D 04 FE  STA $FE04 ; Load Data register with $FFFFFFFF
FF23   A9 40     LDA #$40
FF25   8D 00 FE  STA $FE00 ; Give Idle command ($40) followed by $FFFFFFFF (i.e. the Data Register)
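Just to give some context again: the Data Register mentioned in the comments is register 1 of the SDSPI core. Note that the program stages the upper three bytes at $FE01-$FE03 even when the write is triggered at $FE04, because the write logic shown earlier only decodes those three addresses for the temporary registers.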

The actual command byte is issued via address $FE00. The command byte value $C0 instructs the SDSPI core to load the config registers with values stored in the Data Register. Command byte value $40 instructs the SD Card to go into IDLE mode.

One part I haven't shown in this program is the endless loop required at the end, which stops the 6502 from running on into whatever bytes follow in the ROM.

The waveforms produced by this Assembly program are the same as in the previous post, where we issued an IDLE command by means of a state machine, so I will not present the waveforms in this post.

In Summary

In this post we added Arlet Ottens' 6502 core to our design, so that we can programmatically initialise an SD Card. Doing SD Card initialisation with a state machine will just become too cumbersome in the long run.

In the next post we will try to fully initialise the SD Card and see if we can read a sector of data from the card.

Until next time!

  

Sunday 4 December 2022

SD Card Access for an Arty A7: Part 3

Foreword

This is the third part in the series where we try to get an SD Card reader to work on an Arty A7 board.

In the previous post we had a look at the FPGA core by Dan Gisselquist that can interface with an SD Card Reader.

A nice feature of Dan Gisselquist's core is the test bench that can simulate responses from an SD Card. Out of the box this test bench works within the Verilator ecosystem. However, in the previous post we managed to use Gisselquist's SD Card response module with simulation in Vivado.

We concluded the previous post by being able to issue an IDLE command to the SD Card response module and getting a response back.

In this post we will do the same exercise on the physical Arty A7 board, issuing an IDLE command to the SD Card and checking if we can also get a response back from an SD Card.

Attaching the SD Card module to the Arty A7

In the first part of this series I briefly showed a picture of a PMOD SD Card reader still in its packaging.

To be honest, this module was in its packaging up to now 😁 Well, I thought I would share a picture of the SD Card module attached to the Arty A7 board:


There was one caveat I discovered straight away when inserting the SD Card module, which I hadn't thought of beforehand: this module occupies some space in front of the PMOD header next to it.

This might pose an issue when we want to use a VGA PMOD module later in the project, which uses both PMOD headers JB and JC.

To get around this issue, we might need to make use of a PMOD extension cord for inserting the SD Card module, freeing some space in front of header JB. But, we will tackle this issue when we get there.

Creating the constraints

We need to create some constraints to ensure the ports from the top module are mapped to the correct pins on the PMOD header. We start by looking at the general xdc file for the Arty A7 on Github. In particular, we are interested in PMOD section JA, which we use for our SD Card Module:

We adjust the port names as follows:

## Pmod Header JA
set_property -dict {PACKAGE_PIN G13 IOSTANDARD LVCMOS33} [get_ports cs]
set_property -dict {PACKAGE_PIN B11 IOSTANDARD LVCMOS33} [get_ports mosi]
set_property -dict {PACKAGE_PIN A11 IOSTANDARD LVCMOS33} [get_ports miso]
set_property -dict {PACKAGE_PIN D12 IOSTANDARD LVCMOS33} [get_ports sclk]
set_property -dict { PACKAGE_PIN D13   IOSTANDARD LVCMOS33 } [get_ports dat1]; #IO_L6N_T0_VREF_15 Sch=ja[7]
set_property -dict { PACKAGE_PIN B18   IOSTANDARD LVCMOS33 } [get_ports dat2]; #IO_L10P_T1_AD11P_15 Sch=ja[8]
set_property -dict {PACKAGE_PIN A18 IOSTANDARD LVCMOS33} [get_ports cd]
set_property -dict {PACKAGE_PIN K16 IOSTANDARD LVCMOS33} [get_ports wp]
Next, we should ensure that our top level module uses the same port names:

module top(
    input wire CLK100MHZ,
    output wire cs,
    output wire mosi,
    input wire miso,
    output wire sclk,
    input wire cd,
    input wire wp
    );
...
endmodule
An instance of the Gisselquist SD Card core will live inside this top level module. We will also implement a state machine in this module for instructing the Gisselquist core to send an IDLE command to the SD Card.

We will cover the state machine in the next section.

The State Machine

Let us create a state machine for issuing an IDLE command to the Gisselquist core.

For starters, one needs to look at clock speed. The input clock to the Arty A7 is always 100MHz. This is perhaps a bit too fast for our purposes, so one can create a slower clock with the help of an MMCME2_ADV block. I have covered the use of such a block in one of my previous posts some time ago, so I am not going to cover the process of instantiating one here.

Preferably our state machine should only start once the generated clock is stable. For this purpose we will use the .LOCKED signal of the MMCME2_ADV instance. With this in mind, let us start with an outline of our state machine:

always @(posedge gen_clk)
begin
    if (clk_locked)
    begin
      case (state)
      ...
      endcase
    end
end
So, the state machine will only start changing states once the clock is locked. We start with a number of dummy states to give the Gisselquist core a chance to initialise, after which we de-assert the reset signal to the core:

...
          0: state <= 1;
          1: state <= 2;
          2: state <= 3;
          3: state <= 4;
          4: begin
               state <= 5;
               reset_sd <= 0;
             end
...
So, what next? We could go straight ahead and issue the IDLE command, but preferably we first set up the clock signal so that the SD Card is clocked at the desired initial frequency, which is 400 kHz. I am clocking the SD Card core at a frequency of 10 MHz, so we need to bring it down by means of the internal clock divider provided by this core. To get to 400 kHz we need to use a divider value of 11, which is 0b in hexadecimal. We set the value with the state machine as follows:

          5: begin
                 state <= 6;
                 stb <= 1;
                 wb_sel <= 4'hf;
                 addr <= 1;
                 we <= 1;
                 data <= 32'h5556550b;
             end
          6: begin
                 state <= 7;
                 stb <= 0;
             end
          7: begin
                 state <= 8;
                 stb <= 1;
                 wb_sel <= 4'hf;
                 addr <= 0;
                 we <= 1;
                 data <= 32'hc0;
             end
To understand this snippet, we need to quickly look at the internal registers of the Gisselquist core:
  • Register 0: Used for sending command bytes. SD Cards always expect a command byte that starts with the bits 01. All the other combinations of the two most significant bits of the command byte are free to be used for commands operating on the Gisselquist core itself.
  • Register 1: Data register containing four extra bytes of data associated with the command byte. If you want to issue a command that carries additional data bytes, you need to set this register first before issuing the command.
So, in the above snippet we issue the command byte c0, which sets some internal configuration registers of the Gisselquist core. The value to write to the configuration registers should be stored in the data register beforehand, which in this case is 32'h5556550b. The lower eight bits of this value form the divider value, which is 0b.
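The two rules above (the top two bits of the command byte decide who the command is for, and the divider sits in the low byte of the configuration word) can be written down as a short Python sketch. The function names classify_command and divider_from_config are hypothetical, and this only restates the description in this post rather than the full register layout of the core.

```python
def classify_command(command_byte):
    """Return who a command byte is meant for, based on its top two bits."""
    top_bits = (command_byte >> 6) & 0x3
    return "sd_card" if top_bits == 0b01 else "core_internal"


def divider_from_config(config_word):
    """Extract the clock divider from the low eight bits of the config word."""
    return config_word & 0xFF


print(classify_command(0x40))              # sd_card
print(classify_command(0xC0))              # core_internal
print(divider_from_config(0x5556550B))     # 11
```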

Once we have set the internal configuration registers, we are free to issue the idle command to the SD Card:
 
          8: begin
                 state <= 9;
                 stb <= 1;
                 wb_sel <= 4'hf;
                 addr <= 1;
                 we <= 1;
                 data <= 32'hffffffff;
                  
              end
          9: begin
                 state <= 10;
                 stb <= 1;
                 wb_sel <= 4'hf;
                 addr <= 0;
                 we <= 1;
                 data <= 32'h40;
                  
              end
          10: begin 
                 stb <= 0;
             end
So, we issue the command 40 to the SD Card, followed by 4 ff bytes.

Clocking the Sd Card

The Gisselquist core generates a clock with which we can clock the SD Card. The temptation is great just to connect this signal directly to the outside world, as we do with the other signals to the SD Card.

However, everything always gets more complex with clocks, whether passing them around within the FPGA or passing them externally. In my previous project, where I implemented a C64 core on a Zybo board, I had some fun at one stage dealing with an external clock.

I wanted to make use of the onboard Audio Codec on the Zybo board, and with my first attempt I wanted to clock this device directly. The Audio Codec simply refused to work. After some digging on the Internet, I discovered that you should always pass a clock to the outside world with an ODDR block.

The clock signal to the SD Card is no exception, so let us define an ODDR instance:

   ODDR #(
      .DDR_CLK_EDGE("OPPOSITE_EDGE"), // "OPPOSITE_EDGE" or "SAME_EDGE" 
      .INIT(1'b0),    // Initial value of Q: 1'b0 or 1'b1
      .SRTYPE("SYNC") // Set/Reset type: "SYNC" or "ASYNC" 
   ) ODDR_inst (
      .Q(sclk),   // 1-bit DDR output
      .C(o_sclk),   // 1-bit clock input
      .CE(1), // 1-bit clock enable input
      .D1(1), // 1-bit data input (positive edge)
      .D2(0), // 1-bit data input (negative edge)
      .R(0),   // 1-bit reset
      .S(0)    // 1-bit set
   );
With all this in place, we are now ready to do a test run on the Arty A7 board.

Test Results

Let us have a look at the Test Results, when running the core on the Arty A7:


I tried to cram quite a lot of info into this picture and the names of the signals are perhaps not so readable, so I repeat the signal names:
  • cs (Chip Select)
  • miso
  • mosi
  • o_sclk
Looking at the signals, we can see that we issue the IDLE command (0x40) on the mosi signal, and eventually we get a response back from the SD Card (i.e. 0x01) via the miso signal, which is what we expect.

In Summary

In this post we gave the Gisselquist core a test run on an Arty A7 with an SD Card module attached and issued an IDLE command. In the test the SD Card responded correctly, confirming that we are on track at the moment with our setup.

There are quite a number of commands one needs to issue to an SD Card to read the data stored on it and this can be very cumbersome to implement via a state machine.

It will be far easier to write a program executed by a CPU for issuing the SD Card commands. We will look into this with the next post.

The CPU I have in mind for this exercise is the 6502. Granted, this is an 8-bit CPU and one will probably not get the best performance given the 32-bit data that needs to be passed quite often to the Gisselquist core, but I think it is a good start.

The 6502 doesn't need many of the FPGA's resources, which will leave us more room for the Amiga core to be used later in the Blog series.

If required, we can always help the 6502 out with some hardware acceleration.

Until next time!