Foreword

In the previous post we managed to issue an IDLE command to an SD Card via an SD Card reader attached to the Arty A7 board. We also confirmed that the SD Card sends a response back for the command.

Up to this point we have made use of a state machine for issuing command sequences to the Gisselquist SD core. There are quite a number of commands one needs to issue to an SD Card in order to read or write data stored on it. Using a state machine for this exercise can become quite unpleasant in the long run.

Thus, in this post we will look at using a CPU core, on which we can run a stored program for issuing the SD Card command sequences. The CPU core I will be using for this purpose is Arlet Ottens' 6502 core.

I am sure there will be a few frowns out there about using an 8-bit CPU to work with a 32-bit Wishbone device like the Gisselquist SD Card core. However, the 6502 is fairly light on FPGA resources and I think it is worthwhile to see how far this core can help us out.

The Memory Map

Let us have a look at the memory map for our 6502 system:

FFFF - FF00: ROM. For starters we will have a 256-byte ROM, which might grow beyond this size over time. As with many 6502 systems, the startup ROM needs to live at the top of the address space, because the reset vector is at addresses FFFC-FFFD.
FEFF - FE00: Interface to the registers of the Gisselquist SD Card core. As we have seen in the previous post, we have access to two 32-bit registers via the Wishbone bus of the Gisselquist core.
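
As a rough illustration of this map, the two regions could be told apart by looking only at the high byte of the address. This is a minimal sketch, not code from the actual design: the module name addr_decode_sketch and the signals rom_sel and sdspi_sel are hypothetical.

```verilog
// Hypothetical address decoder; names are illustrative, not from the original design.
module addr_decode_sketch (
    input  wire [15:0] cpu_address,
    output wire        rom_sel,    // FF00-FFFF: ROM
    output wire        sdspi_sel   // FE00-FEFF: SDSPI register window
);
    // Both regions are exactly one 256-byte page, so the high byte decides.
    assign rom_sel   = (cpu_address[15:8] == 8'hff);
    assign sdspi_sel = (cpu_address[15:8] == 8'hfe);
endmodule
```

In the design that follows, the ROM is simply treated as the default read source for everything outside page FE, so an explicit rom_sel is not strictly needed.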

Let us zoom in a bit on the interface to the Gisselquist core. The Wishbone interface works with 32 bits of data, whereas the 6502 works with 8 bits of data at a time. How does one deal with this difference in data widths?

To explain a possible solution to the problem, let us start by arranging the memory locations starting at FE00 like this:

So, the addresses FE00-FE03 map to register 0 of the Gisselquist core, and the addresses FE04-FE07 map to register 1 of the Gisselquist core.

Now, reads and writes to the lowest byte of each register (marked in red) will trigger transactions on the Wishbone bus. The byte addresses in black map to temporary registers.

Suppose we want our 6502 to write a value to register 1. We will start by writing the top three bytes of the 32-bit word to memory locations FE07, FE06 and FE05. Writing to these addresses will set the values of temporary registers and will not trigger any Wishbone transaction. Writing to FE04, however, will trigger a Wishbone write transaction.

With this Wishbone write transaction, we will concatenate the values stored in temporary registers FE07, FE06 and FE05 together with the value currently being written by the 6502 to address FE04.

A Wishbone read works in a very similar way, triggered by reading either FE04 or FE00. The top three bytes returned by the Wishbone read will be stored in another set of temporary registers, which can afterwards also be read by the 6502 at addresses FE07/FE06/FE05 or FE03/FE02/FE01.

I will give more detail on implementing this in coming sections.

Wiring up the 6502

Let us start wiring up the 6502 core.

For starters, we need a ROM for feeding the 6502 with a program to execute. For this we take a trip down memory lane to 2017, when we created a ROM module for our C64 core. You can find the full source for this module on GitHub: https://github.com/ovalcode/c64fpga/blob/master/ip/bblock/src/rom.v

An instance of the ROM module looks like this:

rom #(
    .ADDR_WIDTH(8),
    .ROM_FILE("romsdspi.bin")
) rom (
    .clk(gen_clk),
    .addr(cpu_address[7:0]),
    .rom_out(rom_out)
);

As mentioned earlier, we will start off with only a 256-byte ROM. For this reason ADDR_WIDTH is set to 8. We also only use the lower 8 bits of the address from the CPU.

The parameter .ROM_FILE is the path to a file on the file system containing a ROM image. This is a text file with one byte per line, in hex. So our 256-byte ROM will result in a file containing 256 lines. As mentioned previously, the reset vector is at addresses FFFC-FFFD, so the last four lines of our ROM file will look like this:

Here we see the 6502 will start executing at address FF00, the beginning of the last 256-byte page in the 64K address space. We will cover the assembly code a bit later.
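
To make the ROM file format concrete, here is a minimal sketch of how a ROM with this interface could be written. This is an assumption for illustration, not the actual rom.v from the repository linked above: the module name rom_sketch is hypothetical, and it assumes the image is loaded with $readmemh, which accepts the one-hex-byte-per-line format described.

```verilog
// Minimal sketch of a synchronous ROM; NOT the actual rom.v from the c64fpga repo.
// Parameter and port names follow the instance shown in the post.
module rom_sketch #(
    parameter ADDR_WIDTH = 8,
    parameter ROM_FILE   = "romsdspi.bin"
) (
    input  wire                  clk,
    input  wire [ADDR_WIDTH-1:0] addr,
    output reg  [7:0]            rom_out
);
    // One byte per address: 2**ADDR_WIDTH entries (256 for ADDR_WIDTH = 8).
    reg [7:0] mem [0:(1 << ADDR_WIDTH) - 1];

    // $readmemh reads whitespace-separated hex values, so a file with
    // one hex byte per line fills mem[0], mem[1], ... in order.
    initial $readmemh(ROM_FILE, mem);

    // Registered read: the data for addr appears on rom_out one clock later.
    always @(posedge clk)
        rom_out <= mem[addr];
endmodule
```

The registered read is what gives the one-clock delay between presenting an address and seeing the data, which matters once more than one read source is involved.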

Let us now have a quick look at an instance of the Arlet Ottens core:

cpu cpu( .clk(gen_clk), .reset(...), .AB(cpu_address), .DI(rom_out), .DO(cpu_data_out), .WE(we_6502), .IRQ(0), .NMI(0), .RDY(1) );

Writing from 6502 to SDSPI Core

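Let us now focus on the functionality for writing from the 6502 core to an SDSPI core register.

Firstly, because the 6502 can only deal with 8 bits at a time, we need to add three temporary registers so that we have the full 32 bits available that the SDSPI core requires for a write. The code decodes only FE01-FE03, so this single set of temporary registers supplies the top three bytes for writes to both FE00 and FE04: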

always @(posedge gen_clk)
begin
  if (we_6502)
  begin
    if (cpu_address == 16'hfe01)
    begin
        reg_1 <= cpu_data_out;
    end else if (cpu_address == 16'hfe02)
    begin
        reg_2 <= cpu_data_out;
    end else if (cpu_address == 16'hfe03)
    begin
        reg_3 <= cpu_data_out;
    end
  end
end

Next, let us generate a strobe signal for the SDSPI Core:

// True for addresses whose two low bits are zero (FE00, FE04, ...)
assign on_word_boundary = cpu_address[1:0] == 0;

// Strobe only for word-aligned accesses within page FE
assign wb_stb = cpu_address[15:8] == 8'hfe && on_word_boundary;

So, we only strobe on a word boundary, i.e. addresses like FE00 and FE04. The applicable signals on the SDSPI core look like this:

sdspi  sdspi (
...
		// Wishbone interface
		// {{{
		.i_wb_cyc(1), .i_wb_stb(wb_stb), .i_wb_we(we_6502),
		.i_wb_addr({1'b0,cpu_address[2]}),
		.i_wb_data({reg_3, reg_2, reg_1, cpu_data_out}),
...
	);
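
A quick way to convince yourself of the write path is to simulate it in isolation. The following is a minimal testbench sketch, assuming a simulator such as Icarus Verilog; the module name tb_wb_write and the cpu_write task are hypothetical, and the 6502 is replaced by hand-driven address and data signals. It copies the temporary registers and strobe logic from above and writes the bytes $55, $55, $55 and $0B.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench; tb_wb_write is not part of the original project.
module tb_wb_write;
    reg        gen_clk      = 1'b0;
    reg [15:0] cpu_address  = 16'hff00;
    reg [7:0]  cpu_data_out = 8'h00;
    reg        we_6502      = 1'b0;
    reg [7:0]  reg_1 = 8'h00, reg_2 = 8'h00, reg_3 = 8'h00;

    always #5 gen_clk = ~gen_clk;

    // Temporary registers, same decode as in the post.
    always @(posedge gen_clk)
        if (we_6502)
            case (cpu_address)
                16'hfe01: reg_1 <= cpu_data_out;
                16'hfe02: reg_2 <= cpu_data_out;
                16'hfe03: reg_3 <= cpu_data_out;
                default: ;
            endcase

    // Strobe, address and data as presented to the Wishbone slave.
    wire        on_word_boundary = (cpu_address[1:0] == 2'b00);
    wire        wb_stb  = (cpu_address[15:8] == 8'hfe) && on_word_boundary;
    wire [1:0]  wb_addr = {1'b0, cpu_address[2]};
    wire [31:0] wb_data = {reg_3, reg_2, reg_1, cpu_data_out};

    // One CPU-style write cycle: set up on the falling edge,
    // sampled by the logic above on the following rising edge.
    task cpu_write(input [15:0] a, input [7:0] d);
        begin
            @(negedge gen_clk);
            cpu_address  = a;
            cpu_data_out = d;
            we_6502      = 1'b1;
            @(posedge gen_clk);
        end
    endtask

    initial begin
        cpu_write(16'hfe01, 8'h55);
        cpu_write(16'hfe02, 8'h55);
        cpu_write(16'hfe03, 8'h55);
        cpu_write(16'hfe04, 8'h0b);   // word boundary: this one strobes
        #1 $display("stb=%b addr=%b data=%h", wb_stb, wb_addr, wb_data);
        $finish;
    end
endmodule
```

After the final write cycle the display line should report stb=1, addr=01 and data=5555550b, i.e. a strobed write of $5555550B to register 1.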

Reading with the 6502

Now, let us look into reading with the 6502. Reading is a bit more complex than writing, because we can read from potentially two sources: ROM and registers of the SDSPI core.

To cater for the two possible read sources, let us create the following outline:

...
cpu cpu( ... .DO(cpu_data_out), ... );
...
always @(posedge gen_clk)
begin
  addr_delayed <= cpu_address;
end
...
always @*
begin
    casex (addr_delayed)
        ...
        default: combined_data = rom_out;
    endcase 
end
...

The casex is the main part for selecting the correct source. We use a casex instead of a usual case because we use a subset of the bits to decide which source to select. We will add more selectors to our casex in a bit.

One thing you will also notice is that we use a delayed version of the address for the selection. This caters for the way Block RAMs work: the data for a given address is only available at the next clock cycle. By that next clock cycle the 6502 core may already be asserting a different address, so selecting on the live address could cause data from the wrong source to be presented to the CPU.

Now, let us extend our outline so that we make our 6502 read registers from the SDSPI core:

...
always @(posedge gen_clk)
begin
    wb_data_store <= (wb_stb && !we_6502) ? o_data_sdspi[31:8] : wb_data_store;  
end
...
always @*
begin
    casex (addr_delayed)
        16'b1111_1110_xxxx_xx00: combined_data = o_data_sdspi[7:0];
        16'b1111_1110_xxxx_xx01: combined_data = wb_data_store[7:0];
        16'b1111_1110_xxxx_xx10: combined_data = wb_data_store[15:8];
        16'b1111_1110_xxxx_xx11: combined_data = wb_data_store[23:16];

        default: combined_data = rom_out;
    endcase 
end
...

So, when we read from an SDSPI core register, we store the top three bytes in a temporary register called wb_data_store, which the 6502 can read at a later stage if so desired.

At this point we hit a small caveat: the SDSPI core will not have register data ready at the next clock cycle, but requires one additional clock cycle before the data is ready. This behaviour breaks the timing assumptions the 6502 core makes.

Luckily, the 6502 core provides an RDY input signal, with which we can effectively pause the 6502 on a read for as many clock cycles as we want, until the data is ready on the data bus. With this in mind, we need to change the code above to the following:

...
always @(posedge gen_clk)
begin
    wait_read <= wait_read ? 0 : (wb_stb && !we_6502);
end

always @(posedge gen_clk)
begin
    capture_data <= wait_read;
end

always @(posedge gen_clk)
begin
    wb_data_store <= capture_data ? o_data_sdspi[31:8] : wb_data_store;  
end

cpu cpu(... .RDY(!wait_read) );

always @(posedge gen_clk)
begin
  addr_delayed <= wait_read ? addr_delayed : cpu_address;
end
...

As seen from this code, we also need to wait before capturing a value for wb_data_store, and addr_delayed must be held for the extra cycle as well.
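
The wait-state timing is easier to follow when traced in simulation. Below is a minimal testbench sketch under stated assumptions: the module name tb_wb_read is hypothetical, the SDSPI core is replaced by a stub that returns the made-up value $DEADBEEF two clocks after the strobe, the ROM is a constant, and the 6502 is replaced by a hand-driven address. Only the wait_read, capture_data, wb_data_store and addr_delayed logic is taken from the post.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench; the stub slave and constant ROM are assumptions.
module tb_wb_read;
    reg         gen_clk       = 1'b0;
    reg  [15:0] cpu_address   = 16'hff00;
    reg         we_6502       = 1'b0;
    reg         wait_read     = 1'b0;
    reg         capture_data  = 1'b0;
    reg  [15:0] addr_delayed  = 16'hff00;
    reg  [23:0] wb_data_store = 24'h000000;
    reg         stb_d         = 1'b0;
    reg  [31:0] o_data_sdspi  = 32'h00000000;
    reg  [7:0]  combined_data;
    wire [7:0]  rom_out = 8'hea;          // stand-in for the real ROM

    always #5 gen_clk = ~gen_clk;

    wire wb_stb = (cpu_address[15:8] == 8'hfe) && (cpu_address[1:0] == 2'b00);

    // Stub slave: read data becomes valid two clocks after the strobe.
    always @(posedge gen_clk) begin
        stb_d <= wb_stb && !we_6502;
        if (stb_d) o_data_sdspi <= 32'hdeadbeef;
    end

    // Wait-state logic as given in the post.
    always @(posedge gen_clk) begin
        wait_read     <= wait_read ? 1'b0 : (wb_stb && !we_6502);
        capture_data  <= wait_read;
        wb_data_store <= capture_data ? o_data_sdspi[31:8] : wb_data_store;
        addr_delayed  <= wait_read ? addr_delayed : cpu_address;
    end

    // Read mux as given in the post.
    always @* begin
        casex (addr_delayed)
            16'b1111_1110_xxxx_xx00: combined_data = o_data_sdspi[7:0];
            16'b1111_1110_xxxx_xx01: combined_data = wb_data_store[7:0];
            16'b1111_1110_xxxx_xx10: combined_data = wb_data_store[15:8];
            16'b1111_1110_xxxx_xx11: combined_data = wb_data_store[23:16];
            default:                 combined_data = rom_out;
        endcase
    end

    initial begin
        @(negedge gen_clk) cpu_address = 16'hfe00;  // read of register 0 starts
        @(negedge gen_clk) cpu_address = 16'hff03;  // CPU has moved on
        $display("RDY=%b", !wait_read);             // paused for one cycle
        @(negedge gen_clk);
        $display("RDY=%b combined_data=%h", !wait_read, combined_data);
        @(negedge gen_clk);
        $display("wb_data_store=%h", wb_data_store);
        $finish;
    end
endmodule
```

With this stub, RDY should drop for one cycle, then return high with the low byte ($EF) on combined_data, and one clock later wb_data_store should hold the top three bytes ($DEADBE). Because addr_delayed is held while wait_read is high, the mux is still pointing at FE00 when the data finally arrives.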

The 6502 Assembly Program

Let us have a look at a 6502 assembly program for accessing the SDSPI core, which will ultimately put an SD Card into IDLE mode:

FF00   A9 55     LDA #$55
FF02   8D 01 FE  STA $FE01
FF05   8D 02 FE  STA $FE02
FF08   8D 03 FE  STA $FE03
FF0B   A9 0B     LDA #$0B
FF0D   8D 04 FE  STA $FE04 ; Store the value $5555550B into Data Register
FF10   A9 C0     LDA #$C0  
FF12   8D 00 FE  STA $FE00 ; Init Config registers with value stored in Data Register
FF15   A9 FF     LDA #$FF
FF17   8D 03 FE  STA $FE03
FF1A   8D 02 FE  STA $FE02
FF1D   8D 01 FE  STA $FE01
FF20   8D 04 FE  STA $FE04 ; Load Data register with $FFFFFFFF
FF23   A9 40     LDA #$40
FF25   8D 00 FE  STA $FE00 ; Give Idle command ($40) followed by $FFFFFFFF (e.g. Data Register)

Just to give some context again: the Data Register mentioned in the comments is register 1 of the SDSPI core.

The actual command byte is issued via address $FE00. The command byte value $C0 instructs the SDSPI core to load the config registers with values stored in the Data Register. Command byte value $40 instructs the SD Card to go into IDLE mode.

One part I haven't shown in this program is the endless loop required at the end, which keeps the 6502 from running on into the unused remainder of the ROM.

The waveforms produced by this assembly program are the same as in the previous post, where we issued an IDLE command by means of a state machine, so I will not present the waveforms in this post.

In Summary

In this post we added Arlet Ottens' 6502 core to our design, so that we can programmatically initialise an SD Card. Doing SD Card initialisation with a state machine would just become too cumbersome in the long run.

In the next post we will try to fully initialise the SD Card and see if we can read a sector of data from the card.

Until next time!

C64 on an FPGA

Friday, 23 December 2022

SD Card Access for a Arty A7: Part 4