C64 on an FPGA: November 2018

Friday, 30 November 2018

Resetting a USB device and reading Config Data

Foreword

In the previous post we discussed a bit of theory surrounding USB communications and started to implement some interrupts from the USB functional block in the Zynq.

In this post we will get a bit more practical and see how to reset a USB device and to read configuration information from it.

In this post we will not be implementing functionality to detect when a device is plugged or unplugged, in order to keep things simple. We will thus assume a USB keyboard is attached on the USB port when we start up.

The Life Cycle of a USB Device

To get a bit of context of for this post, let us look at the life cycle of a USB device. There is a couple of states involved.

The attached state is the state when you have just attached a USB device into the USB port.

Provided the USB host enabled a voltage between the VCC pin and GND pin on the USB port, the USB port will power up and enter the Powered state shortly after attachment.

It should be noted, though, that when a USB Host was just powered up, no voltage will be present on the VCC+GND pins. It is up to you to configure the USB Hub so that power is enabled over these two pins.

Once a USB is in the powered state, it still will not respond to any Host commands over the ports. You first need to apply a reset over the USB port so that the device enter the Default state.

When a USB device is in the default state, it will respond to traffic on device address 0 and on endpoint 0.

It should be noted that the USB device will not stay in the default state for long, probably for a couple of tens of milliseconds, at most. It is up to you to get the device to the address state as soon as possible.

At the addressed state the USB will be assigned a non zero-address and all subsequent communication will be directly to this new address.

For the USB device to become fully functional it needs to transition to the configured state.

For the purpose of this post, we will just be moving to the default state and requesting a device descriptor and requesting a configuration descriptor. More on these descriptors later on.

Switching Port power on and resetting device

Let us get to writing some code.

First thing we should do, is to switch the USB module to Host mode. For this we need to use the lower three bits of register 0xe00021a8. The function of these three bits is defined as follows:

00 (default): Idle
01: resrerved
10: Controller device mode
11: Controller in host mode

This corresponds to the following code:

void initUsb() {
 Xil_Out32(0xE00021A8, 3);//set to host mode
}

int main()
{
      Xil_DCacheDisable();
      init_platform();
      initint();
      initUsb();
      usleep(100000000);
      cleanup_platform();
      return 0;
}

Next, we should switch on the port power. Bit 12 of register 0xE0002184 performs this task for us. So let us extend our method initUsb:

void initUsb() {
 Xil_Out32(0xE00021A8, 3);//set to host mode
 u32 in2 = Xil_In32(0xE0002184) | 4096;
 Xil_Out32(0xE0002184, in2); //switch port power on
}

The code above will bring our USB device into the power state. Next we need to reset the device to bring it into the Default state. Bit 8 of register 0xe0002184 is used to initiate port reset. So, let us create the following method to assert reset and to de-assert the reset:

void set_port_reset_state(int do_reset) {
  u32 in2;
  if (do_reset) {
 in2 = Xil_In32(0xE0002184) | 256;
 Xil_Out32(0xE0002184, in2);
  } else {
 in2 = Xil_In32(0xE0002184) & (~256);
 Xil_Out32(0xE0002184, in2);
  }

}

Now, if one read through the USB 2.0 specification, it looks like we need to allow at least 12ms for USB device to come out of reset. Here we will make use of the state_machine method we defined in the previous post to assist in scheduling the 12ms delay:

...
void state_machine() {
   u32 in2 = Xil_In32(0xE0002144) | (1<<24) | (1<<18);
   Xil_Out32(0xE0002144, in2); //clear

   if (status == 0) {
    set_port_reset_state(1);
    scheduleTimer(12000);
    status = 1;
    return;
   } else if (status == 1) {
    set_port_reset_state(0);
    status = 2;
   } else if (status == 2) {
    printf("\n");
   }


}
...
int main()
{
     Xil_DCacheDisable();
     init_platform();
     initint();
     initUsb();
     status = 0;
     state_machine();  
     usleep(100000000);
     cleanup_platform();
     return 0;
}

A you might remember from the previous post, we are using the state_machine method as a callback when an interrupt happens within the USB block of the Zynq. Here, however we are also calling it from the main method. We do this just as the initial state for our state machine.

In this initial state we assert the port reset and schedule the timer for 12ms. After the 12ms an interrupts will trigger and the state_machine method will be called again. This time around we will de-assert the port reset. At this stage our USB keyboard should be in the default state listening for USB traffic on address 0.

It is in this default state we can read the device descriptor and the configuration descriptor from the USB device.

In order to read these descriptors from the device, we need to schedule an asynchronous schedule, which we briefly touched on in the previous post. To schedule this, we need to know more about the following datastructures: Q Head (QH) and q Transfer descriptors (qTD). We will discuss this in the next section

Q Heads and Transfer descriptors

Let us have a look at the QH and qTD data structures.

First the QH data structure. This structure is discussed within the Zynq TRM on page 463.

The first word in the structure is a pointer to the next QH. For our Asynchronous schedule, the first QH will just be pointing to itself.

The second word have a couple of fields of importance:

RL (NAK counter reload): For our case we will just use a value of 15
C (Control endpoint flag). Set this field to a one if it is a non High Speed, control endpoint. We will indeed set this field to one in our case.
Maximum Packet length: We will be setting this field to 8.
H (Head of reclamation list). Set this value to one, since we be having one, and one only QH
DTC (Data toggle control). Set to one
EPS (End Point Speed):

00: Full Speed
01: Low Speed (What we will be using)
10: High Speed
11: Reserved

EndPt (End point address): Since we will be using using the Control Endpoint, this value will be set to zero
I: Set to zero
Device Address: Since we will be operating when the device is in the default state, we will use device address zero

For the third word, I will not going into detail. We will just be using the value 0x40000000.

You will see that the remaining words is coloured in grey and according to the legend this means Host Controller Read/Write. We will leave all these zero, accept for Next qTD Pointer, in which we will specify the first qTD.

Let us now move unto the qTD structure. This structure is discussed on page 459 in the Zynq TRM.

The first word is a pointer to the next qTD structure.

The third word contains the following fields:

DT: Data toggle
Total Bytes: To Bytes to receive from or send to USB device
IOC: Cause an interrupt when this transfer is finished
C_Page (Current Page): Index to the current buffer (e.g. 0 to 4)
Cerr (Error counter)
PID (PID Code): More on this in the following section

00 Out
01 In
02 Setup

Status
Buffer Pointers 0 to 4: Four pointers, each of whicg points to a 4KB buffer. This contains the data received from or to send to the USB device that is assoaiated with this transfer descriptor.

Data Transfers from USB devices

One of the data-structures we covered in the previous section was transfer descriptors. A Transfer descriptor is what its name implies, that is to transfer data to or from the USB device.

According to the USB 2.0 specification, you get a couple of different types of Transfer, but in this post we will only focusing on one type: Control Transfers. The following web page does quite a good job of explaining control transfers, together with some diagrams:

https://www.beyondlogic.org/usbnutshell/usb4.shtml#Control

This web page basically states that a Control Transfer can be broken down into a couple stages. To narrow down the number of scenarios, I am just going to focus on one particular use case: Getting a device descriptor from the USB device via the Control endpoint.

With this use case in mind, let us have a look at the different stages for a Control transfer.

Setup stage

The setup stage starts by issuing a setup token to the USB device. This indicates to the USB device what kind of data is about to follow.

Next follows a data packet. In our use case where want to request a device descriptor, this packet would contain this request to the USB device as such.

The USB would acknowledge the whole request with a ACK packet, indiacted by the white block in the diagram.

This whole stage would be taken care of by a transfer descriptor as discussed in the previous section. Interestingly, for this stage a PID would specified.

The data packet for this request would be contained in a buffer pointed to buffer pointer 0 contained in word 3 of the relevant qTD.

The Data Stage

In the data stage there are two scenarios. The first scenario is when we expect data back from the USB device and the second scenario is is we are required to send data to the USB device after the setup phase.

For our Use case we are only interested in the first scenario. For this we need need to setup a qTD with a PID of one. The Buffer pointer in this qTD will be a pointer to the buffer that will receive the data from the USB device during the Data Packet Phase.

Don't worry about the Handshake packet for now. This will be covered in the next phase.

The Status Stage

With our Use case we are receiving data from the USB Device, so it is up to us to acknowledge this data during the status stage.

For this stage we need to create a qTD with a PID of zero. Since we will be sending a data packet of zero length, we don't need to specify a valid buffer pointer in this qTD.

Initialising the Async queue

Now with a bit of theory behind, let us write some again. This time we will initialise the async queue.

Since our application is a bare-metal application, we will not be making use of malloc calls to allocate memory for our data structures. Instead, we will use some specific memory locations for our data-structures.

We start off by clearing the memory region we will be using for our data structures:

void initUsb() {
 Xil_Out32(0xE00021A8, 3);//set to host mode
 u32 in2 = Xil_In32(0xE0002184) | 4096;
 Xil_Out32(0xE0002184, in2); //switch port power on

 for (int i = 0; i < 1000000; i = i + 4) {
  u32 current = 0x300000 + i;
  u32 *currentword;
  hello = currentword;
  *currentword = 0;
 }
}

This will clear 1 Million bytes worth of words to zero starting at address 0x300000.

Next, we should setup a QH and a couple of qTD's. To assist us with this, we first need to create a helper data structure, making it easy to navigate through the 4 byte word nature of these data-structures:

struct QStruct {
  u32 word0;
  u32 word1;
  u32 word2;
  u32 word3;
  u32 word4;
  u32 word5;
  u32 word6;
  u32 word7;
};

We can now continue and create a QH:

void initUsb() {
...
 struct QStruct *qh;
 qh = 0x300000;
 qh->word0 = 0x300002;
 qh->word1 = 0xf808d000; //enable H bit -> head of reclamation
 qh->word2 = 0x40000000;
 qh->word3 = 0;
 qh->word4 = 0x300040;// pointer to halt qtd
 qh->word5 = 1;// no alternate

}

This QH starts at memory location 0x300000. The next pointer points back to itself (e.g. the first word).

You will also release that this pointer ends with 2 instead of zero. This is because bit 1 and 2 actually represents the head type of the pointer, which in this case is a QH.

Word 4 is a pointer to the first qTD of this QH, which starts at address 0x300040.

Let us now have a look at the qTD:

void initUsb() {
...
 struct QStruct *qTD;
 qTD = 0x300040;
 qTD->word0 = 1; //next qtd + terminate
 qTD->word1 = 0; // alternate pointer
 qTD->word2 = 0x40; //halt value// setup packet 80 to activate
}

Word one has the value 1, menaing there is not a valid next qTD.

For word2 we specify a value of 0x40. This create us a async schedule in the halt state. We can now enable the async schedule:

void initUsb() {
...
 Xil_Out32(0xE0002158,0x300000); // set async base
 in2 = Xil_In32(0xE0002140) | 0x1;
 Xil_Out32(0xE0002140,in2); //enable rs bit
 in2 = Xil_In32(0xE0002140) | 0x20;
 Xil_Out32(0xE0002140,in2); // enable async processing

}

The async schedule is started by setting bit 5 of register 0xe0002140. Once enabled, the scheduler looks at register 0xe0002158 as the location for the first QH.

As mentioned, this async is now in the halt state. We need to add additional qTD's to make this schedule do something useful.

We will cover this in the next section.

Setting up a Transfer

Int he previous section we managed to enable an async, although not a very useful one: everything is in the halted state!

Let us start by creating a method for enabling a useful transfer:

void schedTransfer(int setup, int direction, int size, u32 qh_add) {

}

Setup specify whether we should add a setup qTD.

Direction specify whether we want to receive or send data. For receiving direction should be a 1.

Size is the number of bytes we want to send or receive.

qh_add is the address of the QH at which we want to add the qTD's.

If we require a setup token, we convert the halt qTD to a setup qTD:

void schedTransfer(int setup, int direction, int size, u32 qh_add) {
   struct QStruct *qh;
   qh = qh_add;
   u32 first_qtd = qh->word4;
   struct QStruct *firstTD;
   struct QStruct *nextTD;
   firstTD = first_qtd;
   nextTD = first_qtd;
   if (setup) {
     firstTD->word0 = calNextPointer(first_qtd); //next qtd + terminate
     firstTD->word1 = 1; // alternate pointer
     firstTD->word2 = 0x00080240; //with setup keep haleted/non active till everything setup
     firstTD->word3 = 0x301000; //buffer for setup command

    }
}

You will see that the lower eight bits of word2 is still 0x40. This means that our queue will remain in the halt state till we change it another value.

Next, we should add the remaining qTD's:

void schedTransfer(int setup, int direction, int size, u32 qh_add) {
...
    if (size > 0) {
       if (setup)
         nextTD = calNextPointer(first_qtd);

       nextTD->word0 = calNextPointer(nextTD); //next qtd + terminate
       nextTD->word1 = 1; // alternate pointer

       nextTD->word2 = (size << 16) | (direction << 8) | (nextTD == firstTD ? 0x40 : 0x80) | 0x80000000;


       if (direction == 0)
   nextTD->word2 = nextTD->word2 | 0x8000;
       nextTD->word3 = setup ? 0x302000 : 0x301000; //buffer for setup command

       nextTD = calNextPointer(nextTD);

       if (direction == 1) {
   nextTD->word0 = calNextPointer(nextTD); //next qtd + terminate
   nextTD->word1 = 1; // alternate pointer
   nextTD->word2 = 0x80008080; //with setup keep haleted/non active till everything setup
   nextTD->word3 = 0x301000; //buffer for setup command
       }
    } else {
     //size = 0
     nextTD = calNextPointer(first_qtd);
        nextTD->word0 = calNextPointer(nextTD); //next qtd + terminate
        nextTD->word1 = 1; // alternate pointer
        nextTD->word2 = (0 << 16) | (1 << 8) | (nextTD == firstTD ? 0x40 : 0x80) | 0x80000000 | 0x8000;

    }
    if (nextTD == firstTD)
     nextTD->word2 = nextTD->word2 | 0x8000;
 nextTD = calNextPointer(nextTD);
 nextTD->word0 = 1; //next qtd + terminate
 nextTD->word1 = 1; // alternate pointer
 nextTD->word2 = 0x40; //with setup keep haleted/non active till everything setup
 nextTD->word3 = 0x301000; //buffer for setup command
}

The last qTD is again a halt qTD.

This code is also written as such so that the last qtd executed creates an interrupt.

Once all the qTD's has been setup, we can mark the first qTD in the sequence as runnable:

  firstTD->word2 = (firstTD->word2 & (~0x40)) | 0x80;

One final method that should be implemented is calNextPointer:

u32 calNextPointer(u32 currentpointer) {
 currentpointer = currentpointer - 0x300040;
 currentpointer = currentpointer + 0x20;
 if (currentpointer > 0x200)
  currentpointer = 0;
 return currentpointer + (u32)0x300040;
}

This method advances to the next address and return to 0x300040 after a couple of advances, in effect simulating a circular buffer.

Reading a descriptor from USB Device

With the method created in the previous section, we can now use it to read a descriptor from the USB device.

To get the descriptor we need to schedule the transfer with the command request stored in a buffer. The USB 2.0 spec give us an indication on how this looks like on page 250:

Let us have a look at the values. We start with bmRequestType with value 0x80.

For bmRequest we need to use the constant GET_DESCRIPTOR. To get this value scroll down to the next page of the USB 2.0 spec and you will see the value is 6.

Descriptor Type is retrieved from the next table and have value 1 for Descriptor type DEVICE.

Wlength has value 0x12.

We can now modify our state_machine method to send a DEVICE_DESCRIPTOR request:

void state_machine() {
   u32 in2 = Xil_In32(0xE0002144) | (1<<24) | (1<<18);
   Xil_Out32(0xE0002144, in2); //clear

   if (status == 0) {
    set_port_reset_state(1);
    scheduleTimer(12000);
    status = 1;
    return;
   } else if (status == 1) {
    set_port_reset_state(0);
    status = 2;
    //device descriptor
                  Xil_Out32(0x301000, 0x01000680);
                  Xil_Out32(0x301004, 0x00120000);
                  schedTransfer(1,1,0x12, 0x300000);
   } else if (status == 2) {
    printf("\n");
   }
}

As seen, we are writing the request to the buffer at address 0x301000.

The descriptor returned by the USB device will be stored at location 0x302000. The else statement for status 2 can be used to print the contents of this buffer. Let us have a look at the contents of the buffer:

302000:   01100112
302004:   08000000
302008:   0C231A2C
30200C:   02010110
302010:   00000100
302014:   00000000

Since bytes are stored with an ARM core is little endian, for each word you should read the bytes from right to left.

So, we start off with 0x12 which is the length of the Descriptor.

0x1 is the Descriptor typem which in this case is DEVICE.

The next two bytes indicates the USB version which in this case is 1.1.

The next couple of bytes gives information of the Device class which is zero for three bytes. This means more info about the device is provided in the Configuration descriptor.

Following that is the maximum packet size which is 8 bytes.

Then there is a couple of vendor, product versions.

The last number of the descriptor is a 1 meaning that there is only one possible configuration.

That concludes our discussion on getting and reading the DEVICE descriptor.

Reading the Configuration Descriptor

Time for us to read the configuration descriptor. For this wee need to modify our state_machine method again:

void state_machine() {
   u32 in2 = Xil_In32(0xE0002144) | (1<<24) | (1<<18);
   Xil_Out32(0xE0002144, in2); //clear
   if (status == 0) {
     set_port_reset_state(1);
     scheduleTimer(12000);
     status = 1;
     return;
   } else if (status == 1) {
     set_port_reset_state(0);
     status = 2;
     //configuration descriptor
     Xil_Out32(0x301000, 0x02000680);
     Xil_Out32(0x301004, 0x003b0000);
     schedTransfer(1,1,0x3b, 0x300000);
   } else if (status == 2) {
     printf("\n");
   }
}

This time our we get back 0x3b bytes. In effect this data contains a number of descriptors where each one start with the number of bytes:

09 02 3B 00 02 01 00 A0 32
09 04 00 00 01 03 01 01 00
09 21 10 01 00 01 22 36 00
07 05 81 03 08 00 0A
09 04 01 00 01 03 00 00 00
09 21 10 01 00 01 22 32 00
07 05 82 03 08 00 0A

Let us discuss these descriptors.

The first descriptor:

Descriptor type: 2 -> Configuration descriptor
0x3b -> overall length of all desriptors
02 -> number of interfaces
01 -> value to select this configuration
00 -> string index for textual description. In this case none available
0xA0 -> couple of attributes
0x32 -> max power in 2mA units. In this case 100mA

From the configuration descriptor we see that two interfaces are defined. Each interface has the descriptor type number 4. Further more, byte 5 of this interface descriptor specify the interface type. For both these interfaces this is type 3, which is an HID(Human interface device).

The next two bytes of interest of the Interface descriptors is byte 6 and 7. For the first interface descriptor these bytes are 1 and 1, whereas for the second one it is 0 and 0.

For the first interface these two bytes corresponds to the following:

Boot interface Subclass
Keyboard

This type of Interface is a simplified keyboard interface for BIOS's and we will indeed use this Interface for our design.

Let us now go and have a look at the endpoint for this Interface.

07 05 81 03 08 00 0A

0x81 specify that this is an IN endpoint and the address of this endpoint is 1.

The 3 specifies that this in an interrupt endpoint.

The 8 specifies that the maximum packet size for this endpoint is 8 bytes.

The 0x0A specify the polling interval in milliseconds. Thus in this case the polling interval is 10 milliseconds.

In Summary

In this post we managed to read a couple of descriptors from a USB keyboard and isolated an endpoint that we can use to read keystrokes from the keyboard.

In the next post we will attempt to read keystrokes from the USB keyboard.

Till next time!

Friday, 23 November 2018

Getting started with USB Protocols

Foreword

In the previous post we managed to implement the flashing cursor and keyboard interaction.

At this point in time a keypress can only be simulated by running a program on the ARM core that writes a value to a specific register and we are not yet at a point of integrating a physical keyboard to our C64 system.

The goal I have in mind is to integrate with a USB keyboard. To do this there is an easy way and a difficult way.

The easy way is way is to make use of PetaLinux, a Linux distribution provided by Xilinx for the Zynq processor. Going this route will give you some USB and keyboard drivers simplifying our integration with a USB keyboard, avoiding to worry about the technicalities of the USB protocol.

Then there is the difficult route, trying to access the USB keyboard in Standalone mode. In standalone mode you cannot make use of the drivers that comes bundled with Linux, and you sort of need to re-invent the wheel for USB keyboard interaction.

Re-inventing the wheel is not really cool, but it gave me some second thoughts. I have been using USB devices for almost 18 years without knowing how the communication work between the PC and a USB device.

Going the difficult route is actually an opportunity to learn how USB works and in the next couple of posts (two or three?) we are going to do just that.

We are going to start off with a bit of theory on USB protocols and will gradually work our way to a practical implementation.

I am not going to implement a full USB protocol stack, but just bear minimum that is necessary to catch keystrokes from a USB keyboard.

A note about the source code

I have received a couple of requests to publish the full source code for the project in its current state.

I recently done via the following link on Github: https://github.com/ovalcode/c64fpga

Within the Readme.md I am going some instructions on how to create the project files and building the project.

USB Protocol Overview

When you plug a USB keyboard into your PC your PC is known as the USB host and the keyboard as the USB device.

A USB can support one or more functions. Let us have a quick look at an example where a USB device can support more than one function.

Say, for example, a manufacturer brings out a USB-webcam. The manufacturer might also decide to also ship the device drivers on the web cam itself and surface it to your PC as a Mass Storage Device.

Cool solution, but how will your PC differentiate between these two functionalities on the same set of USB wires?

The answer is to give each functionality an endpoint number. When your PC communicates with the USB device, it always need to provide an endpoint number so that the USB device knows for which function the message is intended for.

Let us now move onto the topic on how USB devices are addressed. USB devices are connected to the PC in a star topology.

Star topology is the same topology used on a Commodore 64 to attach multiple drives and printers to the single serial port on the C64.

A quirk with the star topology is that all devices can see all traffic of each other. To avoid confusion a unique address needs to be assigned to each device.

On C64 disk drives you make use of jumpers on each drive to assign the address.

On USB devices you don't have jumpers. So, how are the addresses assigned?

The answer is that you just need to reset a device, then it will be in the default address state and respond on requests on Address zero.

At this point the alert reader will say: 'Aha! You just said USB uses a start topology, so won't reset signal reset all the USB devices?'

Yes it is, but a USB reset signal is one of the signals a USB host has finer control of, and you can limit a reset to a specific to a specific USB port.

Let us conclude our Overview of USB by having a look at how communication is orchestrated between the devices and the Host.

USB communication among the devices and the host is orchestrated by means of Host polling.

This means that the host initiates all communication. Even if a USB device have some information that urgently needs the attention of the host it needs to wait for the host to ask it for the information.

EHCI

Back to the ZYBO board. If you have a look in the ZYNQ 7000 manual, you will see that it provide some information on how to establish communication between your Zybo board and a USB device.

However, when working on your USB implementation, you will probably find that the information provided within this Technical Reference manual is simply not enough to give you a clear direction on where to go.

Doing an Internet search on how to implement USB on the Zybo board will probably also not be fruitful either.

I almost went into despair over this, till I found that the USB specific registers on the Zynq is not specific to the Zynq only, but follows a specific standard known as ECHI (Enhanced Host Controller Interface).

The thought that the USB implementation was not Zynq specific, actually widened my horizon and immediately was able to find more implementation examples. In fact, I could find a nice example within the Linux source tree.

To communicate with USB devices, the EHCI standard defines two schedules into which you can queue USB communication requests: Periodic schedule & Asynchronous schedule.

You make use of the Periodic schedule if you want to poll specific USB devices for information at specific time intervals. This will typically be for USB devices like USB sound cards giving a stream of information at a fixed data rate. On page 446 of the Zynq technical reference, it is explained how the periodic schedule is implemented.

We will cover in more detail how this schedule is setup at a later stage.

There might be cases where you don't want to poll a USB device constantly for information, but in a more adhoc fashion as the need arises. For this you will make use of asynchronous queues. On page 448 of the Zynq reference manual, it is explained how asynchronous queues works.

The interesting part in the diagram is where it mentions Insert and Remove QH's as needed, just reiterating its adhoc nature. When you are at a point of not needing any information from the Async Queue at the moment, you will just have a Queue Head pointing to itself been in one or other Halt state.

When you suddenly need some information again from a particular USB device, you can add a Queue head to the Queue, which will be processed and the Queue will return back to a halt state.

We will cover setup of the Asynchronous queue also at a later stage.

Writing some code

We have covered quite of theory. Let us now see if we can start writing some code.

Programming a USB interface have quite some detail, and one can a bit overwhelmed so that you don't know where to start. But, we can always start with small steps.

Let us start with the following:

Getting Caching right
Enabling Interrupts

Starting with getting the Caching right. In order to set up a periodic queue or an async queue, you need to write some data structures directly to SDRAM. When an ARM core writes these structures, the data-structures might not end up in SDRAM straight away, but will linger for some time in an L1 or an L2 datacache.

There are a couple of ways to deal with this potential caching issue. I am just going to take a simple approach and disable the Data Cache all together:

#include <stdio.h>
#include "xil_exception.h"
#include "xparameters.h"
#include "platform.h"
#include "xil_printf.h"
#include "xil_cache.h"
#include "xil_io.h"
#include "xscugic.h"
#include "xgpiops.h"
#include <unistd.h>

int main()
{
    Xil_DCacheDisable();
    init_platform();
    cleanup_platform();
    return 0;
}

I have added so long most of the common headers we will need over time for our USB exercise. The headers together with the associated libraries is provided by the Xilinx SDK when you compile your program as a standalone.

Let us us now move onto interrupts. The USB module present within the Zynq provides interrupts for two timer expiry events, and USB events like when transfers is completed. These are very useful interrupts indeed which we would like to intercept.

It is therefore necessary to enable the above mentioned interrupts and ensure one of our custom methods gets called when they happen.

To configure interrupts on the Zynq (and probably most ARM based SoC's) is quite a mission.

So, in effect you need to figure how to program the Generic Interrupt controller and then how to enable interrupts on the ARM processor.

Luckily the Xilinx SDK provided some wrappers for shielding most of the complexity for us.

A strip down version for enabling the interrupts will looks as follows:

...
int help;
int myhelp;
XScuGic_Config *IntcConfig;
XScuGic INTCInst;
...
void state_machine();
...
void initint() {

 IntcConfig = XScuGic_LookupConfig(0);
 int status;
 myhelp = 1;

 status = XScuGic_CfgInitialize(&INTCInst, IntcConfig, IntcConfig->CpuBaseAddress);
 Xil_ExceptionRegisterHandler(XIL_EXCEPTION_ID_INT,
         (Xil_ExceptionHandler)XScuGic_InterruptHandler,
      &INTCInst);
 Xil_ExceptionEnable();
 status = XScuGic_Connect(&INTCInst,
         53,
         (Xil_ExceptionHandler)state_machine,
         (void *)myhelp);
 XScuGic_Enable(&INTCInst, 53);

}
...
void state_machine() {
}
...
int main()
{
    Xil_DCacheDisable();
    init_platform();
    initint();
    cleanup_platform();
    return 0;
}

Within the method initint we basically configure the GIC and enable interrupts on the ARM processor.

Also, in the code we are enabling Shared Peripheral Interrupt (SPI) #53. All USB block related interrupts will trigger via this interrupt.

We also configure so that our custom method state_machine will be called each time a SPI interrupt #53 happens. We will fill the method state_machine during the course of time.

You might realise that the method call XScuGic_Connect accepts a fourth parameter, which in this case we just passed a pointer to an integer called myhelp.

Usually for this parameter you will pass a pointer to a driver structure and when an interrupt happens, your interrupt handler (which in this case is state_machine) will receive this pointer as a parameter.

In our case we will not be using this parameter in our interrupt handler. Instead, we be implement a state machine within our interrupt handler and define a global status variable that will keep track of the current state.

One final thing needs to be done with our interrupt initialisation and this is to enable the applicable USB interrupts.

We would like to enable the General Purpose Timer Interrupt 0 (GP0). We also would like to enable Async Interrupts so that an interrupt is triggered when an Asynchronous transfer has completed.

This would be enabled as follows:

void initint() {
...
 u32 in2 = Xil_In32(0xE0002148) | (1<<24) | (1<<18);
 Xil_Out32(0xE0002148, in2); //enable
}

Scheduling Timers

During our journey to create a USB interface one of the things we will often do is to schedule a timer to wait a certain amount of time before performing the next task.

One can certainly use the sleep or usleep function provided by the SDK wrappers, but I am not so sure how accurate those are.

For the purpose of scheding timers, I am going to make use of General purpose timer 0 provided by the USB block.

This timer works in a very similar fashion as timers you find on a CIA 6526. You load a timer value into a load register, and then force this value to load into a running a timer register. The timer will count down from the predetermined value until it reaches zero and cause an interrupt.

On the Zynq, the timer load value register is located at 0xe0002080. This counter clocks at 1MHz (exactly the same as the CIA on C64). This register is 24 bits wide and can thereforebe set to up to 16 seconds.

To read the current value of the timer you need to read memory location 0xe0002084. The current timer value is present in the lower 24 bits. Bit 31 and bit 30 of this register is also of impotance for us:

Bit 31: Timer enable
Bit 30: Timer reset. When setting this bit to a one the timer will be reloaded with the value stored in location 0xe0002080

With this information, it is clear that we should setup the timer using the following steps:

Load the required timer value into 0xe0002080
Reload the timer by writing 1 to bit 30 of 0xe0002084
Start the timer by writing a 1 to bit 31 of 0xe0002084

This translates to the following method:

void scheduleTimer(int usec) {
 //set timer value
 Xil_Out32(0xE0002080, usec);
 //reload timer
 Xil_Out32(0xE0002084, 0x40000000);
 Xil_Out32(0xE0002084, 0x80000000);
}

We can take this method for a test run by making the following changes in code. Let us assume we want to wait 3 seconds:

void state_machine() {
  u32 in2 = Xil_In32(0xE0002144) | (1<<24) | (1<<18);
  Xil_Out32(0xE0002144, in2); //clear

  printf("Timer finished\n");
}

int main()
{
    Xil_DCacheDisable();
    init_platform();
    initint();
    scheduleTimer(3000000);
    usleep(100000000)
    cleanup_platform();
    return 0;
}

I added the usleep in the main method so that our program isn't exited prematurely.

After 3 seconds the message Timer finished will be displayed on the console.

The write to location 0xe0002144 at the beginning of the state machine needs to done to ensure that the interrupt that just happened is cleared. Without this state_machine will be executed in an endless loop.

In Summary

In this post we covered some theory regarding USB.

We also started to write some for disabling data caching and enabling the appropriate USB interrupts.

In the next post we will implement functionality for resetting a USB device and configuration info from it in its default state.

Till next time!

Tuesday, 13 November 2018

Getting the cursor to flash and Keyboard Interaction

Foreword

Welcome back! It has been a while since I played around with my C64 FPGA implementation, so I first had to familiarise myself with where I was and where do I want to go 😊

In the previous post I managed to show the C64 Welcome message on the VGA screen, but without any flashing cursor.

In this post we will be implementing the flashing cursor, as well as implementing keyboard interaction.

Implementing the flashing cursor

To implement the C64 flashing cursor on our FPGA implementation, we should just ensure we interrupt our 6502 at a regular interval of 60 times per second.

To do this, we first need to implement a counter for this:

    always @(posedge clk_1_mhz)
    if (c64_reset | counter_60_hz == 0)
      counter_60_hz <= 16666;
    else
      counter_60_hz <= counter_60_hz - 1;

Here we have created a counter counting down from 16666 to zero and then reloading it with a value of 16666. Since we are clocking it with the 1 MHz signal, the counter will underflow at a rate of 60Hz, which is what we need.

We will use this clock to generate our interrupt signal:

    always @(posedge clk_1_mhz)
    if (c64_reset)
      int_occ <= 0;
    else if (counter_60_hz == 0)
      int_occ <= 1;
    else if (addr == 16'hdc0d & we == 0)               
      int_occ <= 0;

As you can see we are setting int_occ to a one when our counter reaches zero. At this stage we should mimic CIA 6526 behaviour, meaning that once an interrupt happens the interrupt status for this interrupt should remain set until cleared by software.

The interrupt status gets cleared by simply reading the interrupt status register and this is done with the else statement else if (addr == 16'hdc0d & we == 0).

Great, we can now generate an interrupt 60 times per second and all we need to do is hooking up this signal to our 6502 core:

    cpu mycpu ( clk_1_mhz, c64_reset, addr, 
                          combined_d_out, ram_in, we, int_occ, 1'b0, 1'b1 );

That is all there is to get the cursor flashing.

Let us now continue to implement keyboard interaction.

Keyboard Interaction

Within a real C64 the keys of its keyboard is arranged electrically as an 8x8 square matrix.

This 8x8 matrix in turn is hooked up to Port A and Port B of CIA#1.

Port A energises specific rows in the 8x8 matrix and Port B can see which keys within the energised row is either open or closed.

The following diagram gives an idea of how the keys is arranged within the matrix:

On the top right you can get an idea of how the keyboard connector looks like.

I am not in possession of a real C64, so we will need to make use of a USB keyboard and take the keystrokes and emulate C64 keystrokes.

So, how do we go about with this keyboard emulation? Well, firstly if you have a look at the diagram above, you will see all the keys is numbered from 0 to 63.

This gives us 64 possible keys, each one that can be either on or off. Each key can therefore be thought of as a bit.

Thinking of the memory space of our two ARM cores living on the ZYNQ, each memory location is 32 bits wide. Thus, we could fit all the possible C64 keys within two memory locations!

Is is then up to us to write a program running on the ARM processor, fetching keystrokes from the USB keyboard, and toggling the desired bits within above mentioned two memory locations to emulate the desired C64 key presses.

The previous paragraph sounds like a mouthful, so let us try to break it down a little. We need to achieve the following:

Interface with the USB keyboard and interpret keystrokes
Enable our C64 module to receive data from one of the ARM cores, also located on the ZYNQ
Emulate the C64 keypress with the data we received from one of the ARM cores.

To interface with the USB keyboard can be quite challenging task. At this point in time I don't want to elaborate too much, but towards the end of this post I will reveal a plan of action to get to a point of getting input from a USB keyboard 😀

Looking at the second point. In order for our C64 module to receive information from a ARM core, we need to engage the road of AXI Slave interfaces. We will cover this a bit a later in this post.

We will the third point, C64 keypresses emulation in the next section.

C64 Keypress emulation

Let us now continue to implement the C64 key press emulation.

We start off by adding two ports to our C64 module:

module block_test(
  input clk,
  input axi_clk_in,
  input proc_rst,
  output proc_rst_neg,
  output wire [31:0] ip2bus_mst_addr,
  output wire [11:0] ip2bus_mst_length,
  output wire [31:0] ip2bus_mstwr_d,
  output wire [4:0] ip2bus_inputs,
  input wire [5:0] ip2bus_otputs,
  input wire [31:0] slave_0_reg, 
  input wire [31:0] slave_1_reg
    );

These two ports are the two words as mentioned in the previous section, where each bit corresponds to a key on the C64 keyboard.

The following diagram gives an explanation of each bit. The number in each bit position is the relevant C64 scancode.

We will leave the connection of these two ports to the outside world for another section.

Next, we should split these 2 words into separate rows:

...
    wire [7:0] keyboard_row_0;
    wire [7:0] keyboard_row_1;
    wire [7:0] keyboard_row_2;
    wire [7:0] keyboard_row_3;
    wire [7:0] keyboard_row_4;
    wire [7:0] keyboard_row_5;
    wire [7:0] keyboard_row_6;
    wire [7:0] keyboard_row_7;     
...
    assign keyboard_row_0 = slave_0_reg[7:0];
    assign keyboard_row_1 = slave_0_reg[15:8];
    assign keyboard_row_2 = slave_0_reg[23:16];
    assign keyboard_row_3 = slave_0_reg[31:24];
    assign keyboard_row_4 = slave_1_reg[7:0];
    assign keyboard_row_5 = slave_1_reg[15:8];
    assign keyboard_row_6 = slave_1_reg[23:16];
    assign keyboard_row_7 = slave_1_reg[31:24];
...

We should now think how should emulate the keyboard behaviour. Remember, port A on CIA#1 energise the applicable row, and we read the result via port B.

So, for starters we should capture 6502 writes to port A, which is address $DC00:

...
    reg [7:0] keyboard_control_byte;
...
    always @(posedge clk_1_mhz)
    if (addr == 16'hdc00 & we)
      keyboard_control_byte <= ram_in;
...

Next, we should simulate the value for Port B, which is address $DC01:

...
    wire [7:0] keyboard_result_byte;
...
    assign keyboard_result_byte = (~keyboard_control_byte[0] ? keyboard_row_0 : 0) |           
                                  (~keyboard_control_byte[1] ? keyboard_row_1 : 0) |
                                  (~keyboard_control_byte[2] ? keyboard_row_2 : 0) |
                                  (~keyboard_control_byte[3] ? keyboard_row_3 : 0) |
                                  (~keyboard_control_byte[4] ? keyboard_row_4 : 0) |
                                  (~keyboard_control_byte[5] ? keyboard_row_5 : 0) |
                                  (~keyboard_control_byte[6] ? keyboard_row_6 : 0) |
                                  (~keyboard_control_byte[7] ? keyboard_row_7 : 0);
...
    always @*
        casex (addr_delayed)
          16'b101x_xxxx_xxxx_xxxx : combined_d_out = basic_out;
          16'b111x_xxxx_xxxx_xxxx : combined_d_out = kernel_out;
          16'hd012: combined_d_out = line_counter;
          16'hdc01: combined_d_out = ~keyboard_result_byte;
          default: combined_d_out = ram_out;
        endcase
...

I might be worthwhile to mention that we are working here with active when high logic, because it makes live easier. You might recall though that the C64 works with active low logic.

So to work between these two worlds, we negate the value received from port A, and when you send the calculated value back to port B we negate it again.

Connecting to the outside World

With the changes perfoemd to our C64 module, we need a way to interface to an ARM core. This is where we need to work again with AXI's.

In previous posts we worked a couple of times with AXI's. The AXI's we worked with previously were all AXI Masters.

We defined an AXI Master for writing frames produced by our VIC-II to SDRAM. We also defined an AXI Master for reading back these frames from SDRAM by our VGA module and generating a VGA signal for displaying these frames on screen.

An AXI Master can be seen as a source for generating memory requests.

In our case where we need to receive data from an ARM core, we need something the opposite, which is receiving memory orders. An AXI peripheral receiving memory orders, is called an AXI Slave.

You can create an AXI block which contain both an AXI slave and an AXI master. The following is an example:

This is within our existing design.

Marked in green is the new AXI slave port called S00_AXI and in red is our existing AXI Master port called M00_AXI.

You will also see that I have also hooked the two slave port as indicated also in green. To enable these two ports on the AXI block, I had to some custom code changes which I will cover now.

With our AXI open in IP Packager, scroll to the user port section and change it as follows:

 // Users to add ports here
        input wire [31:0] ip2bus_mst_addr,
        input wire [11:0] ip2bus_mst_length,
        input wire [31:0] ip2bus_mstwr_d,
        input wire [4:0] ip2bus_inputs,
        output wire [5:0] ip2bus_otputs,
        output wire [31:0] slave_reg_0,
        output wire [31:0] slave_reg_1,
 // User ports ends

One of the things you will realise when you configure an AXI module to have an AXI slave interface, is that an AXI slave module will automatically be created and an instance be created within the top module. The instance within the top module will look something like the following:

 myip_burst_test_v1_0_S00_AXI # ( 
  .C_S_AXI_DATA_WIDTH(C_S00_AXI_DATA_WIDTH),
  .C_S_AXI_ADDR_WIDTH(C_S00_AXI_ADDR_WIDTH)
 ) myip_burst_test_v1_0_S00_AXI_inst (
  .S_AXI_ACLK(s00_axi_aclk),
  .S_AXI_ARESETN(s00_axi_aresetn),
  .S_AXI_AWADDR(s00_axi_awaddr),
  .S_AXI_AWPROT(s00_axi_awprot),
  .S_AXI_AWVALID(s00_axi_awvalid),
  .S_AXI_AWREADY(s00_axi_awready),
  .S_AXI_WDATA(s00_axi_wdata),
  .S_AXI_WSTRB(s00_axi_wstrb),
  .S_AXI_WVALID(s00_axi_wvalid),
  .S_AXI_WREADY(s00_axi_wready),
  .S_AXI_BRESP(s00_axi_bresp),
  .S_AXI_BVALID(s00_axi_bvalid),
  .S_AXI_BREADY(s00_axi_bready),
  .S_AXI_ARADDR(s00_axi_araddr),
  .S_AXI_ARPROT(s00_axi_arprot),
  .S_AXI_ARVALID(s00_axi_arvalid),
  .S_AXI_ARREADY(s00_axi_arready),
  .S_AXI_RDATA(s00_axi_rdata),
  .S_AXI_RRESP(s00_axi_rresp),
  .S_AXI_RVALID(s00_axi_rvalid),
  .S_AXI_RREADY(s00_axi_rready),
 );

Let us now have a look at the code for this module, looking only at the interesting parts of the code, though.

You will see that there is a couple of slave registers defined within this module:

...
 reg [C_S_AXI_DATA_WIDTH-1:0] slv_reg0;
 reg [C_S_AXI_DATA_WIDTH-1:0] slv_reg1;
 reg [C_S_AXI_DATA_WIDTH-1:0] slv_reg2;
 reg [C_S_AXI_DATA_WIDTH-1:0] slv_reg3;
...

These registers basically forms the heart of the AXI slave interface. In effect it is these registers that will map at a specific address in address space, and if one of the ARM cores to a write to this address range, the contents of the write will end off in one of these slave registers.

It is the content of these registers which we want to propogate to our C64 module to inform it which key was pressed. More on this later.

The following snippet is also interesting:

 always @( posedge S_AXI_ACLK )
 begin
   if ( S_AXI_ARESETN == 1'b0 )
     begin
       slv_reg0 <= 0;
       slv_reg1 <= 0;
       slv_reg2 <= 0;
       slv_reg3 <= 0;
     end 
   else begin
     if (slv_reg_wren)
       begin
         case ( axi_awaddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB] )
           2'h0:
             for ( byte_index = 0; byte_index <= (C_S_AXI_DATA_WIDTH/8)-1; byte_index = byte_index+1 )
               if ( S_AXI_WSTRB[byte_index] == 1 ) begin
                 // Respective byte enables are asserted as per write strobes 
                 // Slave register 0
                 slv_reg0[(byte_index*8) +: 8] <= S_AXI_WDATA[(byte_index*8) +: 8];
               end  
           2'h1:
             for ( byte_index = 0; byte_index <= (C_S_AXI_DATA_WIDTH/8)-1; byte_index = byte_index+1 )
               if ( S_AXI_WSTRB[byte_index] == 1 ) begin
                 // Respective byte enables are asserted as per write strobes 
                 // Slave register 1
                 slv_reg1[(byte_index*8) +: 8] <= S_AXI_WDATA[(byte_index*8) +: 8];
               end  
           2'h2:
             for ( byte_index = 0; byte_index <= (C_S_AXI_DATA_WIDTH/8)-1; byte_index = byte_index+1 )
               if ( S_AXI_WSTRB[byte_index] == 1 ) begin
                 // Respective byte enables are asserted as per write strobes 
                 // Slave register 2
                 slv_reg2[(byte_index*8) +: 8] <= S_AXI_WDATA[(byte_index*8) +: 8];
               end  
           2'h3:
             for ( byte_index = 0; byte_index <= (C_S_AXI_DATA_WIDTH/8)-1; byte_index = byte_index+1 )
               if ( S_AXI_WSTRB[byte_index] == 1 ) begin
                 // Respective byte enables are asserted as per write strobes 
                 // Slave register 3
                 slv_reg3[(byte_index*8) +: 8] <= S_AXI_WDATA[(byte_index*8) +: 8];
               end  
           default : begin
                       slv_reg0 <= slv_reg0;
                       slv_reg1 <= slv_reg1;
                       slv_reg2 <= slv_reg2;
                       slv_reg3 <= slv_reg3;
                     end
         endcase
       end
   end
 end

This code starts off by saying that at a reset, all slave registers is been initialised to a zero.

During a write operation the applicable slave register gets written in the case statement.

What remains to be done is to surface the contents of the first two slave registers as two output ports in this module:

...
 // Users to add ports here
        output wire [31:0] slave_reg_0,
        output wire [31:0] slave_reg_1,
 // User ports ends
...
 // Add user logic here
    assign slave_reg_0 = slv_reg0;
    assign slave_reg_1 = slv_reg1;

 // User logic ends
...

These ports should then be connected all the way to the top module, which we can then connect to our C64 module.

With the ports added and everything hooked up, you will see in the address editor a section in address space is reserved for this slave interface:

With this address map, when have a program running on the ARM core and it writes to either address 0x43c0_0000 or 0x43c0_0004, the content will arrive at our C64 module at the two slave ports.

The Test Program

With our block design completed, we need to write a small C program that will run on one of the ARM cores to test the design.

This test program should basically set one of the bits in the two slave registers to trigger a simulated keypress.

For the program, the following main method will do:

...
int main()
{
    init_platform();
    Xil_Out32(0x43c00000,0x100);
    return 0;
}
...

This program sets bit 8 of the slave register. Bit 8 in this register corresponds to scancode 8, and should therefore type a '3' on the C64 screen.

A Test Run

Time to do a quick Test Run. The following video shows the result:

The video starts off with a flashing cursor and shortly afterwards a '3' gets printed when the program executes.

It works!

In Summary

In this post we managed to implement the flashing cursor as well as implementing key press simulation.

Well, obviously to make our live easier it would be nice to capture keystrokes from a real keyboard, which in our case would be an USB keyboard.

This is where things can really get interesting.

Firstly, we can make our life easy by installing PetaLinux on our Zybo board. This is a version of Linux and have drivers that will take care of all USB communications and detecting keyboard strokes for us.

When running the Zybo board in Standalone mode, you cannot make use of these USB/keyboard drivers and you will need to develop something yourself.

This is where my Hacker instinct starts to kick in and the eagerness to learn how stuff works that we all takes for granted.

This is an excellent opportunity to learn how USB works, so I thought of dedicating a couple of posts on developing a stripped down USB protocol stack.

So, in the next post I will spend some time on a bit of theory on how USB communications work and then take it from there.

Till next time!