Sunday 29 January 2023

SD Card Access for an Arty A7: Part 6

Foreword

In the previous post we managed to initialise an SD Card from power up.

In this post we will try to read a sector from the SD Card.

We will continue to use the Gisselquist SD Card core for interfacing with the SD Card.

Buffering a sector of data

When you read a sector of data from an SD Card, the card responds after a certain delay and then sends the 512 bytes of data consecutively, one after the other. If the CPU is busy at that moment, it might miss a byte or two of the data.

Luckily the Gisselquist core provides us with a way out of this scenario by buffering the 512 bytes of data for us as they become available. The CPU can then fetch the data from the buffer at a later stage, when it is ready.

This buffer is in actual fact a FIFO (First In First Out) structure. This means that you only need a single address to map the contents of the FIFO into CPU memory space and not 512 addresses.

There are a couple of technicalities to remember when using the FIFO buffers in the Gisselquist core. The first is that, when you issue a read command to the SD Card, you should also inform the Gisselquist core that this command will be utilising the FIFO buffer. To illustrate this in 6502 assembly language syntax:

...
     .BYTE $00, $00, $00, $00
     .BYTE $00, $00, $08, $51 ; CMD 51
...

The value $51 is the SD Card command byte for reading a single sector (CMD17, with the leading 01 bits of the command byte included).
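To see where command bytes like $51 come from, here is a small Python sketch (Python is used purely as a calculator here, it is not part of the FPGA design). It assumes the standard SD command framing, where the command byte is the bits 01 followed by a six-bit command index. The function name is made up for illustration.

```python
def command_byte(index):
    """Return the first byte of an SD command frame for command number `index`."""
    if not 0 <= index <= 63:
        raise ValueError("SD command indices are 6 bits wide")
    # Bits 7..6 are always 01, the low six bits carry the command index.
    return 0x40 | index


read_single_block = command_byte(17)   # CMD17, read a single sector
```

The same calculation reproduces the other command bytes used in these posts: CMD0 gives $40, CMD8 gives $48, CMD58 gives $7A, CMD55 gives $77 and ACMD41 gives $69.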

Next to the value $51 we have the value $08. If you study the final assembly listing from the previous post, you will see that a command row usually consists of three zero bytes followed by a command byte. If any bits are set in the first three bytes, they provide the Gisselquist core with some additional info about the particular command.

In this case the $08 byte signals to the Gisselquist core that we expect 512 bytes from the SD Card and that these should be stored in the FIFO buffer for later access.

As with the other commands, we need to sit in a busy-wait loop until the busy bit changes to zero.

The next question is, how do we read the FIFO buffer? The answer is to read register 2 of the Gisselquist core. Let us recap from previous posts the registers that the Gisselquist core contains. I have added register 2 to the list, just for completeness:

These registers map into our 6502 address space starting at address FE00. It should be remembered that the Gisselquist core has 32 bit registers, whereas the 6502 works only with 8 bits at a time. So, Register 0 will map to address FE00, Register 1 will map to address FE04 and Register 2 will map to address FE08.

Now, back to the details of Register 2, the FIFO buffer register. In order to read the data you can continuously read register 2, which will bring you back 4 bytes at a time, with each read advancing to the next 4 bytes.
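As a rough mental model of this behaviour, the following Python sketch treats the FIFO as a queue of 4-byte chunks behind a single register. The class and method names are invented for illustration and the sector contents are made up; the real core is hardware, not software.

```python
from collections import deque


class SectorFifo:
    """Toy model of a FIFO that is read through one register address."""

    def __init__(self, sector):
        if len(sector) % 4 != 0:
            raise ValueError("sector length must be a multiple of 4")
        # Queue the sector as 4-byte chunks, oldest first.
        self._words = deque(sector[i:i + 4] for i in range(0, len(sector), 4))

    def read_register(self):
        """Each read returns the next 4 bytes and advances the FIFO."""
        return self._words.popleft()

    def words_left(self):
        return len(self._words)


sector = bytes(range(256)) * 2      # 512 bytes of made-up data
fifo = SectorFifo(sector)
first = fifo.read_register()
second = fifo.read_register()
```

A full 512-byte sector therefore takes 128 reads of the same register to drain.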

You will remember from my previous posts that in our 6502 design we trigger a register read by issuing an address that is a multiple of 4, like FE00, FE04 or FE08. The addresses FE01, FE02 and FE03 store the remaining three bytes of the register, which we couldn't accept in the same read because the 6502 only works with a byte at a time.
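The address arithmetic can be sketched in Python as follows. The byte order is an assumption on my part: the busy bit (bit 14) is tested through address FE01 in the previous post, which suggests that the least significant byte sits at the lowest address. The function names are hypothetical.

```python
BASE = 0xFE00  # start of the SD core window in the 6502 address space


def register_address(n):
    """Address whose access triggers a 32-bit transfer of register n."""
    return BASE + 4 * n


def byte_lanes(word):
    """Split a 32-bit value into four bytes, least significant first (assumed order)."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


fifo_register = register_address(2)
lanes = byte_lanes(0x12345678)   # made-up register value
```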

There is one final technicality we need to look at. From the previous posts you will remember that we always use command $C0 for initialising the state of the Gisselquist core:

     .BYTE $55, $55, $55, $0B
     .BYTE $00, $00, $00, $C0 ; CMD C0

The value $0B sets the value of the clock divider.

Now, the technicality I am referring to is the four-bit field in bits 16-19 (the low nibble of the second byte above), which specifies the limit of the FIFO buffer as a power of two. If we use the command as above, the FIFO size limit value will be 5, which equals 32 bytes. The correct value to use is 9, for 512 bytes, which yields the following $C0 command:

     .BYTE $55, $59, $55, $0B
     .BYTE $00, $00, $00, $C0 ; CMD C0
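The effect of changing that one nibble can be checked with a few lines of Python. This is a sketch under two assumptions: that the first byte of the row is the most significant byte of the 32-bit data word (consistent with $0B, the clock divider, ending up in the lowest byte), and that the size field is the nibble that differs between the two rows, which then starts at bit 16. The helper names are made up.

```python
def data_word(table_row):
    """Combine the four data bytes of a table row, first byte most significant."""
    word = 0
    for b in table_row:
        word = (word << 8) | b
    return word


def fifo_limit_bytes(word, shift=16):
    """Decode a 4-bit log2 size field located at bit position `shift` (assumed)."""
    return 1 << ((word >> shift) & 0xF)


old_limit = fifo_limit_bytes(data_word([0x55, 0x55, 0x55, 0x0B]))
new_limit = fifo_limit_bytes(data_word([0x55, 0x59, 0x55, 0x0B]))
```

With the original row the limit works out to 32 bytes, and with the modified row to 512 bytes, which is one full sector.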

Verifying our design

When we run our design on a real FPGA, we will need a way to tell that the values are correctly read from the SD Card.

The easiest way for this verification is just to make an image dump of the SD Card we are using, and check some values with a Hex editor. The values we then get back from the FPGA need to match the values we saw in the Hex editor.

To take the SD Card I used as an example:

When I opened my dump, the beginning was filled with zeros. Not very useful for a test. However, scrolling further down eventually yielded some data:

So, if we run a test on the Arty A7, we should look at around byte 0x1bf of the sector data returned to verify if our implementation works correctly.

If you are doing a test yourself, you can also expect data in more or less the same spot. Any properly formatted SD Card will have a master boot record (MBR) located in sector zero. According to Wikipedia, bytes 0x1be to 0x1cd contain the first partition entry in an MBR.
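If you have an image dump on your PC, you can pull out the first partition entry with a few lines of Python instead of a hex editor. This is only a sketch: it assumes the classic MBR layout of 16-byte partition entries starting at offset 0x1BE, and the sector built below is made-up test data rather than my actual card.

```python
import struct

PARTITION_TABLE_OFFSET = 0x1BE
ENTRY_SIZE = 16


def first_partition_entry(sector):
    """Decode the first 16-byte partition entry of a 512-byte MBR sector."""
    if len(sector) != 512:
        raise ValueError("expected a 512-byte sector")
    raw = sector[PARTITION_TABLE_OFFSET:PARTITION_TABLE_OFFSET + ENTRY_SIZE]
    # Layout: status, CHS of first sector, type, CHS of last sector,
    # LBA of first sector, number of sectors (the last two little-endian).
    status, _chs_first, ptype, _chs_last, lba_start, count = struct.unpack(
        "<B3sB3sII", raw
    )
    return {"status": status, "type": ptype, "lba_start": lba_start, "sectors": count}


# Made-up sector: one partition of type 0x0C starting at LBA 2048.
sector = bytearray(512)
sector[0x1BE:0x1CE] = struct.pack(
    "<B3sB3sII", 0x00, b"\x00\x00\x00", 0x0C, b"\x00\x00\x00", 2048, 1000
)
sector[510:512] = b"\x55\xAA"
entry = first_partition_entry(bytes(sector))
```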

Let us write some Assembly for reading sector zero from the SD Card:

       LDA #6
       JSR CMD
       LDA #0
       STA $FE0B
       LDX #$74
LOOPLD
       LDA $FE08
       DEX
       BNE LOOPLD
       LDA #2
       STA $FE0B

We start by issuing a sector read command. In our CMD table this is contained in slot 6, which we load into the accumulator before jumping to the CMD routine, which issues the command to the SD Card.

This routine only returns once we received the full response. Once the full response is loaded into the FIFO, our CPU needs to read it out, by reading address $FE08 multiple times.

We will again use an ILA (Integrated Logic Analyzer) block for examining the read data returned by the Gisselquist core to the CPU.

Unfortunately the data we are looking for is quite deep in the FIFO, and the Arty A7 doesn't provide enough block RAM to capture all the read data returned. So, we need some trigger so that we capture only the segment we are interested in.

For this reason we are executing the loop in the above assembly $74 times. It should be noted that each read returns 4 bytes, so we need to multiply this number by 4 to get the real byte number where the loop will stop. In this case the byte number is $1d0, which gives us a big enough window for our ILA to capture all bytes in question.
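The loop count arithmetic is easy to double check in Python. The helper names below are made up; the only inputs are the 4 bytes per read and the offsets mentioned in the text.

```python
BYTES_PER_READ = 4


def bytes_consumed(reads):
    """Number of sector bytes pulled from the FIFO after `reads` register reads."""
    return reads * BYTES_PER_READ


def reads_to_reach(offset):
    """Smallest number of reads after which byte `offset` has been returned."""
    return offset // BYTES_PER_READ + 1


stop = bytes_consumed(0x74)        # where the loop in the listing stops
needed = reads_to_reach(0x1BF)     # reads required to get past byte 0x1bf
```

So the byte at 0x1bf has already gone past after $70 reads, and stopping after $74 reads leaves it only a few samples before the trigger.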

Once the loop has completed, we store the value 2 into address $FE0B. This will set a bit triggering a capture on the ILA block.

Let us have a look at what the ILA capture looks like:

As with a lot of ILA captures, this one is simply too wide to present within a blog, so I cut out a portion of the screenshot so that we can view the important parts together. In the first signal we can see the CPU asserting the capture signal at sector sample 0x1d0.

In the second signal we can see the value 76000000. This is the value captured before we get to sample 0x1d0 and corresponds to the hex dump I presented earlier.

This proves that our design is more or less correct.

In Summary

In this post we attempted to read a sector of data from an SD Card and proved via a dump made from the SD Card via another system that our design read the data correctly.

In the next post we will attempt to read a file from an SD Card using its FAT table. This will be a big milestone in getting an Amiga core to boot up on an Arty A7, which will require loading an Amiga ROM and disk images from an SD Card.

Until next time!

Thursday 5 January 2023

SD Card Access for an Arty A7: Part 5

Foreword

In the previous post we replaced our state machine with a 6502 CPU + machine code program for issuing commands to the Gisselquist SD Card core. We managed to issue an IDLE command to an SD Card with the 6502 core.

In this post we will continue our endeavour of trying to access an SD Card by means of the Gisselquist SD Card core and a 6502 CPU.

I mentioned in the previous post that in this post I want to finish off with the process of powering up and initialising the SD Card, followed by reading some data from it. However, I found that the process of reading data from the SD Card is quite involved, so to keep things simple I will just be covering the process of initialising the SD Card in this post.

Revisiting 6502 Assemblers

In the previous post I wrote 6502 machine code manually. The required machine code was fairly straightforward, so doing the process manually wasn't that much of a deal.

However, from this point onwards, the complexity of the machine code will only increase, so it makes sense to rather use an assembler.

Using an assembler the code will remain readable and self documenting.

The Assembler I have chosen for this purpose is the following online one:

https://www.masswerk.at/6502/assembler.html

During the course of this post, I will give a gradual introduction to this assembler. Let us start with a quick outline:

.ORG $FF00
     ; Assembly language instructions
ENDROM = $FFFF-*-3
.FILL ENDROM 00
.BYTE 0, $FF, 00, 00

We start with the directive .ORG, specifying the start address of our program. The assembler needs this info to calculate various things, like if you jump to a label, to calculate the absolute address of that label.

Next we declare a symbol ENDROM, which is calculated from the address at the end of our assembly language program, denoted by *. At any point within the assembly listing you can get the current address via this asterisk. The expression $FFFF-* is one less than the number of bytes remaining to get to a total ROM size of 256 bytes. Subtracting a further three leaves a gap of four bytes at the end for our reset and IRQ vectors.

With the .FILL directive, we add a number of padding bytes. As mentioned previously, ENDROM is the calculated number of bytes that needs to be added to get to 256 bytes, and the .FILL makes it happen.

The .BYTE directive allows us to emit one or more bytes of data. In this case it is the Reset vector, as well as the IRQ vector.
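To convince yourself that this outline always produces exactly 256 bytes, regardless of how much code sits in front of the padding, here is the same calculation as a Python sketch. The function names are invented; the constants come straight from the outline above.

```python
ROM_START = 0xFF00
ROM_END = 0xFFFF
VECTOR_BYTES = 4   # the four bytes emitted by the final .BYTE directive


def fill_count(current_address):
    """Value of ENDROM = $FFFF-*-3 when * equals `current_address`."""
    return ROM_END - current_address - 3


def rom_size(code_bytes):
    """Total image size: code, then padding, then the vector bytes."""
    current_address = ROM_START + code_bytes
    return code_bytes + fill_count(current_address) + VECTOR_BYTES


empty_program_fill = fill_count(ROM_START)
```

For the empty outline this gives 252 padding bytes, which together with the four vector bytes fills $FF00 to $FFFF.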

To get an idea into what this outline program will assemble as, let us enter the program into the above mentioned assembler:

As can be seen from the picture, we have a set of zeros starting at address $FF00. If you scroll down, you will see the zeros stop at address $FFFF:

Thus, the resulting binary is exactly 256 bytes, which is what we want.

As also can be seen from these screenshots, there is a Show Address checkbox. Unchecking this checkbox, will remove the address from each line, which will make it easy to create a Hex file which is required by Vivado to populate a ROM.

Reducing repetition

In software development we have a very common term called DRY: Don't Repeat Yourself.

Well, in the previous post I wrote some 6502 machine code where I repeated the same set of instructions for different pieces of data. We can do better and see if we can encapsulate the code into loops and Subroutines. Also, perhaps store the data into lookup tables.

Let us start with the command for setting the clock speed of sclk, and express it in a lookup table:

DATA: 
     .BYTE $55, $55, $55, $0B
     .BYTE $00, $00, $00, $C0 ; CMD C0

So, here we first present the data for setting the data register in the SDSPI core, and then the actual command.

Let us add one more command and see if we can start to spot some patterns:

DATA: 
     .BYTE $55, $55, $55, $0B
     .BYTE $00, $00, $00, $C0 ; CMD C0
     .BYTE $FF, $FF, $FF, $FF
     .BYTE $00, $00, $00, $40 ; CMD 40

We see that each command has a size of 8 bytes. We can use a zero based index for accessing the bytes for a particular command from the lookup table. For example, for Command $C0 we will use index 0 and for command $40 we will use index 1.

To deal with lookups from a table, the 6502 provides us with the Indirect Indexed addressing mode. Let us start with a basic loop for sending a command:

LOOP:
     LDA ($A0),Y
     STA $FE00,X
     INY
     DEX
     BPL LOOP

From this we can see that the address A0 should contain the base address of the lookup table, which we should initialise in the beginning like this:

.ORG $FF00
     LDX #$FF
     TXS
     LDA #<DATA
     STA $A0
     LDA #>DATA
     STA $A1

A couple of initialisation steps are happening. First we should init the stackpointer with the value $FF. The 6502 doesn't do this at startup and forgetting this initialisation will give you an XX during simulation of the Arlet core in the Stackpointer.

Both "<" and ">" are assembler operators yielding the low and high byte respectively of a label's address.

Let us focus on the loop code again. The Y register points to a specific byte in the lookup table, and is incremented to the next byte with each iteration of the loop.

The X register starts with a value of 7 and goes to zero. This will transfer the data of a lookup entry to addresses FE07 down to FE00. As covered in the previous post, these addresses map to the Gisselquist SD Card core.
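The following Python sketch mimics the loop to show which table byte lands at which address. It is only a model of the five instructions above. The assumption that the lowest address of each register holds the least significant byte is mine, based on how the busy bit is tested later in this post.

```python
def send_row(row, y=0, base=0xFE00):
    """Mimic LOOP: X counts 7 down to 0 while Y walks forward through the table."""
    writes = {}
    x = 7
    while x >= 0:                    # BPL LOOP exits once X drops below zero
        writes[base + x] = row[y]    # LDA ($A0),Y followed by STA $FE00,X
        y += 1                       # INY
        x -= 1                       # DEX
    return writes


def register_value(writes, address):
    """Assemble the 32-bit value whose least significant byte is at `address`."""
    return sum(writes[address + i] << (8 * i) for i in range(4))


row = [0x55, 0x55, 0x55, 0x0B, 0x00, 0x00, 0x00, 0xC0]
writes = send_row(row)
data_register = register_value(writes, 0xFE04)
command_register = register_value(writes, 0xFE00)
```

Note that the command byte at FE00 is written last, after the data register at FE04-FE07 has been filled in completely.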

One question that remains is how Y is initialised. The journey starts with the command index stored in the Accumulator, after which we do the following:

     ASL
     ASL
     ASL
     TAY

This is equivalent to multiplying the command index by 8.
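A quick Python model of the three shifts, including the 8-bit wrap-around of the accumulator, shows the offsets you get. The function name is made up for illustration.

```python
def table_offset(index):
    """Model three ASL instructions on an 8-bit accumulator."""
    a = index & 0xFF
    for _ in range(3):
        a = (a << 1) & 0xFF   # ASL shifts left and drops the bit leaving bit 7
    return a


offsets = [table_offset(i) for i in range(4)]
```

Because Y is only 8 bits wide, this scheme can address at most 32 table entries; index 32 would wrap around to offset 0.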

This covers more or less what is required to issue a command to the SD Card. There is, however, one caveat we haven't dealt with in the code, and that is that we should wait for the SD Card to complete a command before issuing the next one.

The way to check this is to continuously poll address 0 of the Gisselquist core and see if the busy bit, which is bit 14, is cleared.

Having considered all this, we end off with the following subroutine for issuing a command to the SD Card:

CMD:
     ASL
     ASL
     ASL
     TAY
     LDX #$07
LOOP:
     LDA ($A0),Y
     STA $FE00,X
     INY
     DEX
     BPL LOOP
     AND #$80
     BMI END
BUSY
     LDA $FE00
     BIT $FE01
     BVS BUSY
END
     RTS

One thing that might look a bit strange is that we AND the command byte, which is always the last byte in a command entry of the lookup table, with $80. Here we basically want to test whether the command is a true SD Card command (whose command byte always starts with the bits 01) or a command dedicated to the Gisselquist core (which has bit 7 set, such as $C0). There is always a wait associated with an SD Card command, but not with a Gisselquist core command.
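The two tests in the subroutine can be written out in Python to make the bit positions explicit. This is a sketch with invented function names; it relies on the fact that the 6502 BIT instruction copies bit 6 of the tested byte into the overflow flag, which is what BVS then branches on.

```python
def needs_busy_wait(command_byte):
    """True for SD Card commands (bit 7 clear), False for core-only commands."""
    return (command_byte & 0x80) == 0


def is_busy(register0):
    """Busy flag as seen by BIT $FE01 / BVS: bit 6 of the second byte, i.e. bit 14."""
    second_byte = (register0 >> 8) & 0xFF
    return bool(second_byte & 0x40)


wait_for_idle_command = needs_busy_wait(0x40)   # CMD0
wait_for_core_command = needs_busy_wait(0xC0)   # core configuration command
```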

Verilog issues

While I was testing the 6502 machine code I developed in this post, I discovered a couple of flaws with my existing FPGA design.

The first issue occurs when executing the instruction STA $FE00,X with X equal to 0 or 4, which triggers a wishbone bus operation.

With this instruction the complete address is basically asserted for two consecutive clock cycles on the address bus and the write line is asserted only at the second consecutive cycle.

Now when the full address is asserted during the first clock cycle, the system assumes a memory read because the write signal is not asserted. With normal block RAM this is not an issue and will just result in a redundant read.

However, with addresses FE00 and FE04 things get a bit more complicated, since these trigger a wishbone read transaction. As we know at this point, wishbone reads assert the RDY signal on the 6502 during some clock cycles.

All in all, things just get more complicated when you trigger a read on the wishbone bus on one clock cycle and a write on the bus the next clock cycle. These actions put the 6502 and the Gisselquist core out of sync with each other, and the wrong values get written.

There are probably a number of ways to solve this, but the easiest way I could come up with was to add a register to our FPGA design instructing the system to ignore all reads to the wishbone bus. When we are at a point in our program where we will do a couple of writes via Absolute,X instructions, we just need to set this register so that reads to the wishbone bus are ignored.

Let us implement this register:

...
reg [7:0] ignore_reads = 0;
...
assign wb_stb = cpu_address[15:8] == 8'hfe && on_word_boundary && !(ignore_reads[0] && !we_6502);
...
always @(posedge gen_clk)
begin
  if (we_6502 && cpu_address[15:8] == 8'hfe)
  begin
    if (cpu_address[1:0] == 2'h1 && !cpu_address[3])
    begin
        reg_1 <= cpu_data_out;
    end else if (cpu_address[1:0] == 2'h2 && !cpu_address[3])
    begin
        reg_2 <= cpu_data_out;
    end else if (cpu_address[1:0] == 2'h3 && !cpu_address[3])
    begin
        reg_3 <= cpu_data_out;
    end else if (cpu_address[3:0] == 11)
    begin
       ignore_reads <= cpu_data_out;
    end
  end
end
...

I have highlighted the changes in bold. I have made ignore_reads 8 bits wide, in case we need additional signals later on.
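To check my understanding of the wb_stb expression, I find it useful to write it out as a truth function. The Python sketch below mirrors the assign statement; the definition of on_word_boundary (the two lowest address bits being zero) is an assumption, since that signal is not shown in the snippet.

```python
def wb_stb(cpu_address, we_6502, ignore_reads):
    """Python rendering of the wb_stb assign statement."""
    in_window = (cpu_address >> 8) == 0xFE
    on_word_boundary = (cpu_address & 0x3) == 0      # assumed definition
    suppressed_read = bool(ignore_reads & 1) and not we_6502
    return in_window and on_word_boundary and not suppressed_read


dummy_read_blocked = wb_stb(0xFE00, we_6502=False, ignore_reads=1)
write_allowed = wb_stb(0xFE00, we_6502=True, ignore_reads=1)
```

With bit 0 of ignore_reads set, the redundant read cycle of STA $FE00,X no longer starts a wishbone transaction, while the write cycle that follows still does.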

With these changes our memory map in the FE00 range is like this:

FE00-FE07: SD Card core registers
FE0B: Ignore reads

I will present a full Assembly listing at the end of this post to show how FE0B should be used.

Another thing we need to implement in Verilog is to map block RAM for Zero Page and the Stack. This is very similar to what we did in the previous post where we mapped ROM in the space FF00-FFFF, so I will not be covering it here.

Looking Deeper into SD Card commands

Up to this point we have only used the SD Card IDLE command. Let us have a look at some other commands with the focus of initialising an SD Card.

I will try and be brief about these commands. If you want more detail on these commands, you can consult the following sources:

http://www.rjhcoding.com/avrc-sd-interface-1.php

http://www.rjhcoding.com/avrc-sd-interface-2.php

http://www.rjhcoding.com/avrc-sd-interface-3.php

https://openlabpro.com/guide/interfacing-microcontrollers-with-sd-card/

Let us start by looking at the command CMD8, which tells us if the card is indeed an SD Card or an MMC card. In my case I will only call this command for my own curiosity, to confirm that this card is indeed an SD Card. At this point I will not expect the 6502 program to make any decision based on whether the card is an SD Card or MMC.

The Byte definition in 6502 assembly for this command is as follows:

     .BYTE $00, $00, $01, $AA
     .BYTE $00, $00, $02, $48 ; CMD 48

From this we can see that CMD8 starts with the byte $48, followed by four bytes which end with the bytes $01 and $AA.

You will notice that next to the command byte there is a byte with the value 2. This value informs the Gisselquist core what type of response we are expecting, which in this case is a response byte followed by 4 bytes. This info is important so that we read the correct number of bits from the serial line.

The next command of interest is CMD58. This basically tells us the voltages that the Card supports. The definition in Assembly code is the following:

     .BYTE $FF, $FF, $FF, $FF
     .BYTE $00, $00, $02, $7A ; CMD 7A

This is also a command where we get one response byte back followed by four bytes. The format of the trailing four bytes is as follows:

Here I am using a diagram from http://www.rjhcoding.com. The most interesting bits are bits 15-23, indicating the voltages the SD Card can handle. According to the SD Card spec the generally accepted voltage is 3.3V. However, the SD Card I am testing with has all of bits 15-23 set to one, meaning that it can work across the voltage range 2.7V-3.6V. I am not sure how much other SD Cards will differ.

Another interesting bit is bit 31. While the card is powering up, this bit will be 0, and it will change to 1 once power up is completed.

Let us move on to the command that performs the actual initialisation. This actually involves two separate commands, CMD55 and ACMD41. The first command signals that the next command will be an application-specific command, which is ACMD41.

The assembly byte definitions for these commands are as follows:

     .BYTE $00, $00, $00, $00
     .BYTE $00, $00, $00, $77 ; CMD 55
     .BYTE $40, $00, $00, $00
     .BYTE $00, $00, $00, $69 ; CMD 41

You will notice that the data for ACMD41 starts with the byte $40. This sets bit 30 of the command argument, the HCS (Host Capacity Support) bit, which tells the card that the host can handle high-capacity cards.

You need to call CMD55 and ACMD41 continuously in a loop, and during each iteration you need to check the response byte of the ACMD41 command. When the response byte has transitioned from 1 (i.e. still in the idle state) to 0 (initialised), the SD Card initialisation has completed and it is ready to accept read/write commands.

The full program

Here is the full program listing:

.ORG $FF00
     LDX #$FF
     TXS
     LDA #<DATA
     STA $A0
     LDA #>DATA
     STA $A1
     LDX #1
     STX $FE0B
START:
       LDA #0
       JSR CMD
       LDA #1
       JSR CMD
       LDA #2
       JSR CMD
       LDA #3
       JSR CMD
INIT
       LDA #4
       JSR CMD
       LDA #5
       JSR CMD
       ROR A
       BCS INIT 
       LDA #3
       JSR CMD
       LDA #2
       STA $FE0B
       LDA $FE04
DONE
       JMP DONE
CMD:
     ASL
     ASL
     ASL
     TAY
     LDX #$07
LOOP:
     LDA ($A0),Y
     STA $FE00,X
     INY
     DEX
     BPL LOOP
     AND #$80
     BMI END
     LDX #0
     STX $FE0B
BUSY
     LDA $FE00
     BIT $FE01
     BVS BUSY
END
     LDX #1
     STX $FE0B
     RTS
.ALIGN $8
DATA: 
     .BYTE $55, $55, $55, $0B
     .BYTE $00, $00, $00, $C0 ; CMD C0
     .BYTE $FF, $FF, $FF, $FF
     .BYTE $00, $00, $00, $40 ; CMD 40
     .BYTE $00, $00, $01, $AA
     .BYTE $00, $00, $02, $48 ; CMD 48
     .BYTE $FF, $FF, $FF, $FF
     .BYTE $00, $00, $02, $7A ; CMD 7A
     .BYTE $00, $00, $00, $00
     .BYTE $00, $00, $00, $77 ; CMD 55
     .BYTE $40, $00, $00, $00
     .BYTE $00, $00, $00, $69 ; CMD 41
ENDROM = $FFFF-*-3
.FILL ENDROM 00
.BYTE 0, $FF, 00, 00

As you can see there is a loop at the INIT label where we continuously issue CMD55 and ACMD41 until the card is initialised.
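The control flow of the INIT loop can be sketched in Python as below. The send callable is hypothetical and stands in for JSR CMD, returning the response byte left in the accumulator. The max_tries limit is my own addition for the sketch; the assembly listing loops forever.

```python
def init_card(send, max_tries=1000):
    """Repeat CMD55 + ACMD41 until bit 0 of the ACMD41 response is clear."""
    for attempt in range(1, max_tries + 1):
        send(4)                  # table index 4: CMD55
        response = send(5)       # table index 5: ACMD41
        if response & 1 == 0:    # ROR A / BCS INIT: loop while bit 0 is set
            return attempt
    raise TimeoutError("card did not leave the idle state")


# Usage with a made-up card that reports idle twice before becoming ready.
acmd41_responses = iter([0x01, 0x01, 0x00])


def fake_send(index):
    # Index 4 (CMD55) always answers 0x01 here; index 5 (ACMD41) walks the list.
    return 0x01 if index == 4 else next(acmd41_responses)


attempts = init_card(fake_send)
```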

You will also notice that we use the address $FE0B as mentioned previously to disable the creation of wishbone read commands if required.

In this code I have also used bit 1 of $FE0B for something else. I am using this bit as a trigger for a Xilinx ILA debug core for capturing data. In the code I am setting this bit after invoking command index 3 (i.e. CMD58 or command byte $7A) for a second time.

By triggering the ILA core at this point we can inspect the OCR after initialisation to see if bit 31 has changed to a one, indicating that the initialisation was indeed successful. The signal I am inspecting with the ILA for this is the miso signal, from which we get the serial data from the SD Card.

To get a better overview of what is going on, I have included a screenshot of an ILA capture on my machine for the above scenario:

The key signal here is miso. The location where the logic level initially drops from a 1 to 0 is the start of the response from the SD Card for the CMD58 command. Use the rising edge of each o_sclk as reference for each bit of data.

The first byte of data has every bit zero. This is our response byte and indicates that the SD Card is not in IDLE mode anymore. Had this byte had a value of 1, it would have indicated that the SD Card was still in IDLE mode.

The following two bits are one, meaning that both bit 31 and bit 30 are one. This indicates that the power up routine is completed and the SD Card is ready to accept read/write commands.

From the rest of the bits we can deduce that bits 15-23 are all ones, meaning that my SD Card supports all mentioned voltage levels.
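If you capture the four OCR bytes as one 32-bit number, the fields discussed above can be picked apart with a small Python helper. This is a sketch with made-up names. The example value is constructed to match the capture described here (bits 31 and 30 set, bits 15-23 all ones) and is not a value read back from my card. Bit 30 is named CCS (Card Capacity Status) in the SD specification.

```python
def decode_ocr(ocr):
    """Split a 32-bit OCR value into the fields discussed in this post."""
    return {
        "power_up_complete": bool((ocr >> 31) & 1),   # bit 31
        "ccs": bool((ocr >> 30) & 1),                 # bit 30
        "voltage_bits": (ocr >> 15) & 0x1FF,          # bits 15-23, one bit per range
    }


example = decode_ocr(0xC0FF8000)   # illustrative value only
```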

In Summary

In this post we wrote a 6502 assembly program for initialising an SD Card. We also issued some other SD Card commands to confirm that the Card has properly powered up.

In the next post we will attempt to read from the SD Card.

Until next time!