Tuesday, 9 April 2019

Creating the Tape Interface on the Zybo Board

Foreword

In the previous post we managed to play sound on the Zybo Board.

In this post we will focus on developing a cassette interface that is driven by a .TAP file stored in main memory.

Once developed, we will test this interface by playing its output through speakers.

High Level Overview

Let us start by looking, at a high level, at what we want to achieve. The following block diagram describes it in a nutshell:


We start off by having a .TAP file stored in the SDRAM of the ZYBO board. The contents of this file get transferred word by word via the AXI protocol to our FPGA logic.

You might recall from previous posts that whenever we access SDRAM via AXI, we make use of two modules that we developed earlier on.

The first module connects to one of the AXI ports of the ZYNQ processor. In this block we also make use of an AXI Burst block, which is an IP provided by Xilinx. The AXI Burst block basically abstracts the technical details of the AXI protocol and provides us with a set of signals that is easier to work with.

The second module takes the simplified set of signals provided by the AXI Burst block and stores the stream of data words in a FIFO. The FIFO basically buffers the information received from the AXI port and absorbs the bursty nature of the AXI protocol. Thus, on the receiving end of the FIFO you get the data words at a constant rate.

You will also recall from previous posts that this FIFO should have sufficient depth to avoid underflow. In our cassette interface, however, it is sufficient to store only a single word at a time, making a FIFO a bit of an overkill.

Why don't we need a FIFO for the cassette interface? That is because we are receiving data words on the AXI bus at 100MHz whereas we will be producing a pulsating data signal with a maximum frequency of about 3KHz. That means that between toggling pulses we will have more than enough time to fetch the next sample from SDRAM.
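To put some numbers to this, here is a quick back-of-the-envelope calculation in Python. It is only a sketch: the 3KHz figure is the rough upper bound mentioned above, and the function name is made up for illustration.

```python
AXI_CLOCK_HZ = 100_000_000  # AXI clock frequency quoted in this post
MAX_PULSE_HZ = 3_000        # rough upper bound on the tape signal frequency


def axi_cycles_per_toggle(axi_hz=AXI_CLOCK_HZ, pulse_hz=MAX_PULSE_HZ):
    """Return the number of AXI clock cycles between two output toggles.

    The output toggles twice per pulse period, so the toggle rate
    is 2 * pulse_hz.
    """
    return axi_hz // (2 * pulse_hz)
```

Even at the fastest pulse rate this leaves more than sixteen thousand AXI clock cycles between two toggles, which is plenty of time to fetch a single word.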

Let us now refer back to our high level block diagram. Our block used for storing a word of AXI data at a time is the READ WORD block.

We receive data from AXI 32 bits at a time, whereas with .TAP file data it is easier for us to inspect data a byte at a time. It is for this reason that we have implemented a BIT SLICER block, which breaks up a word into its individual bytes. This functionality is implemented with a shift register shifting eight bits at a time.

You will see that apart from the data signal between the READ WORD and BIT SLICER block, we have two extra signals: Valid and ACK. This is a pattern you will see quite often in a pipeline architecture.

When the READ WORD block has received a piece of data from the AXI port, it informs the Byte Slicer by asserting the Valid line. On assertion of this line the Byte Slicer stores the data and asserts the ACK line. This in turn informs the READ WORD block that it can go ahead and retrieve the next word from the AXI port.

In this way both the READ WORD and BYTE SLICER blocks are kept busy. This almost reminds us of an assembly line in a factory.
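The Valid/ACK pattern can be illustrated with a small Python model. This is a deliberately simplified, sequential sketch of the idea: it ignores the clock domain crossing and the exact timing of the real modules, and the function name is an assumption of mine.

```python
def run_handshake(words):
    """Move words from a producer to a consumer using valid/ack signalling."""
    pending = list(words)
    valid = False   # driven by the producer (READ WORD)
    ack = False     # driven by the consumer (the slicer)
    data = None     # word currently presented by the producer
    received = []

    while pending or valid:
        # Producer: present a new word when nothing is outstanding.
        if not valid and pending:
            data = pending.pop(0)
            valid = True
        # Consumer: capture the presented word and acknowledge it.
        if valid and not ack:
            received.append(data)
            ack = True
        # Producer: on seeing the acknowledge, drop valid so that the
        # next word can be fetched. The consumer then releases ack.
        if ack:
            valid = False
            ack = False
    return received
```

Each word is presented exactly once and is only replaced after the consumer has acknowledged it, which is the property we rely on in the hardware.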

One more thing I want to highlight between the READ WORD and BYTE SLICER is the dotted line with the caption Cross Clock Domain. This is to highlight that on the left side of the dotted line we are working at the AXI clock frequency of 100MHz, whereas on the right hand side we are working at only 1MHz. To cater for these different clocks, we will again use multi-flop synchronisers in both blocks.

Next, let us have a look at the Sample Assembler block. If you read through the specification for a .TAP file you will see that each pulse width value will be either one byte or four bytes. The rule is simple: If the byte value is zero, the next three bytes will give the absolute pulse width in microseconds. If the byte value is non-zero, the pulse width is contained within that single byte, in units of eight microseconds.

It is thus the purpose of the Sample Assembler to determine the duration of the next pulse from the stream of incoming bytes.
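As a reference for what the Sample Assembler has to compute, here is a minimal Python sketch of the same rule. The function name is hypothetical. The factor of eight for single-byte values and the low-byte-first order of the three-byte value follow the Python code from the earlier post in this series.

```python
def iter_pulse_widths(data):
    """Yield pulse widths in microseconds from raw .TAP pulse bytes."""
    i = 0
    while i < len(data):
        first = data[i]
        if first != 0:
            # Single-byte value: units of eight microseconds.
            yield first * 8
            i += 1
        else:
            if i + 3 >= len(data):
                break  # truncated three-byte value: stop quietly
            # Zero marker followed by a 24-bit value, low byte first.
            yield data[i + 1] | (data[i + 2] << 8) | (data[i + 3] << 16)
            i += 4
```

For example, the bytes 0x30, 0x00, 0x10, 0x27, 0x00 decode to a 384 microsecond pulse followed by a 10000 microsecond pulse.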

Our final block is the PWM block, which a lot of you will recognise as the acronym for Pulse Width Modulation. PWM actually describes the data signal you receive from a Commodore Datasette: A set of pulses of varying length.

Our PWM block is basically implemented as a countdown timer, toggling its output each time an underflow condition occurs.

You will also see that the output of the PWM block is fed back to the Sample Assembler. The Sample Assembler uses the transition of the pulse to low as a cue to start assembling the next pulse duration.

Implementing the READ WORD block

Let us have a look at the code for the READ WORD block.

I will start by showing the complete block of code for this module and then highlighting important snippets from it:

module read_word(
  input wire clk,
  input wire restart,
  input wire reset,
  output reg [12:0] count_in_buf,
  input ack,
  output wire [31:0] ip2bus_mst_addr,
  output reg [11:0] ip2bus_mst_length,
  input wire [31:0] ip2bus_mstrd_d,
  output wire [31:0] axi_d_out,
  output wire [31:0] data_wire_out,
  output wire [4:0] ip2bus_inputs,
  input wire [5:0] ip2bus_otputs,
  output wire empty,
  input wire read,
  output reset_1_mhz,
  output data_valid
    );

reg master_read_dst_rdy; //change to axi name
wire cmd_ack; // change to axi name
wire mstread_req;
wire mst_type;
reg  [31:0] axi_start_address;
reg  [31:0] data_cap;
reg [31:0] reset_1_counter = 50000000;
wire [11:0] burst_len;
(* ASYNC_REG = "TRUE" *) reg sync_ack_0, sync_ack_1, sync_ack_2;
wire master_read_src_rdy;
reg [12:0] bytes_to_receive;
reg [3:0] state;
reg axi_data_loaded = 0;
reg [12:0] axi_data_inc;
wire neg_clk;
wire pos_edge_ack;

assign data_valid = axi_data_loaded;
assign pos_edge_ack = !sync_ack_2 & sync_ack_1;
assign data_wire_out = {data_cap[7:0], data_cap[15:8], data_cap[23:16], data_cap[31:24]};
assign reset_1_mhz = reset_1_counter > 21000000 ? 1 : 0;

parameter
  IDLE = 4'h0,
  INIT_CMD = 4'h1,
  START = 4'h2,
  ACT = 4'h3,
  TRANSMITTING = 4'h4;

parameter BURST_THRES = 124;  

assign neg_clk = ~clk;

always @(posedge clk)
if (reset_1_counter > 20000000)
  reset_1_counter <= reset_1_counter - 1;

always @(posedge clk)
begin
  sync_ack_0 <= ack;
  sync_ack_1 <= sync_ack_0;
  sync_ack_2 <= sync_ack_1;
end

always @(posedge clk)
 if (restart | pos_edge_ack | reset)
  axi_data_loaded <= 0; 
 else if ((state > START) & !master_read_src_rdy & !axi_data_loaded) 
   axi_data_loaded <= 1;
    
always @(posedge clk)
  if (!master_read_src_rdy & !axi_data_loaded)
    data_cap <= ip2bus_mstrd_d;

always @(posedge clk)
if ((reset | restart) & !axi_data_loaded & state == 0)  
  state <= 0;
else
  case( state )
    IDLE: if (!axi_data_loaded) 
            state <= INIT_CMD;
    INIT_CMD: state <= START;             
    START: if (cmd_ack)
             state <= ACT;
    ACT: if (!master_read_src_rdy)
             state <= TRANSMITTING;
    TRANSMITTING: state <= IDLE;    
  
  endcase  
  
always @(negedge clk)
if (restart | reset)
begin
  axi_start_address <= 32'h200000;
  axi_data_inc <= 0;
end
else if (state == INIT_CMD)
begin
  axi_start_address <= axi_start_address + axi_data_inc;
  axi_data_inc <= 4;
end    

always @(negedge clk)
if (state == INIT_CMD)
  ip2bus_mst_length <= 4; 
  
assign mstread_req = (state == START) ? 1 : 0;

assign mst_type = (state == START) ? 1 : 0;

always @*
  if (state == START)
    master_read_dst_rdy = 0;
  else if (state > START & !axi_data_loaded)
    master_read_dst_rdy = 0;
  else
   master_read_dst_rdy = 1;
         
assign master_read_src_rdy = ip2bus_otputs[3];
assign cmd_ack = ip2bus_otputs[0];
assign ip2bus_inputs[0] = mstread_req;
assign ip2bus_inputs[1] = mst_type; 
assign ip2bus_mst_addr = axi_start_address;
assign ip2bus_inputs[2] = master_read_dst_rdy;

assign ip2bus_inputs[3] = 1'b0;
assign ip2bus_inputs[4] = 1'b0;
endmodule



Firstly this code contains some glue logic for interfacing with the AXI Burst block. There is also some reset logic and restart logic. Restart logic is important if you reload SDRAM with a new .TAP file.

We will be receiving the ACK signal from the Byte Slicer, which is another clock domain. For this reason we are defining three synchroniser flip-flops sync_ack_0, sync_ack_1 and sync_ack_2. We are also using these flip-flops to determine the positive edge of the ACK signal, which we use to trigger the loading of the next word from the AXI port.
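The behaviour of the three flip-flops and the edge detector can be modelled in a few lines of Python. This is only a behavioural sketch: a software model cannot show metastability, so it only illustrates the delay through the chain and the single-cycle pulse produced by `!sync_ack_2 & sync_ack_1`. The function name is an assumption of mine.

```python
def detect_rising_edges(ack_samples):
    """Return, per clock tick, the value of the positive-edge pulse.

    ack_samples holds the value of the ACK input at each rising edge
    of the receiving clock.
    """
    sync_0 = sync_1 = sync_2 = 0
    pulses = []
    for ack in ack_samples:
        # Combinational output, based on the current register contents.
        pulses.append(int((not sync_2) and sync_1))
        # All three registers update together on the clock edge.
        sync_0, sync_1, sync_2 = ack, sync_0, sync_1
    return pulses
```

However long ACK stays high, the detector produces exactly one pulse per low-to-high transition, a few clock ticks after the transition itself.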

Bit Slicer

Let us have a look at the implementation for the bit slicer:

module byteslicer(
  input clk,
  input data_valid,
  output [7:0] byte_out,
  output ack,
  input [31:0] data_in,
  input restart,
  input read
    );
    
parameter STATE_INIT = 0;
parameter STATE_LOADED = 1;
parameter STATE_SHIFT_1 = 2;
parameter STATE_SHIFT_2 = 3;
parameter STATE_SHIFT_3 = 4;
    
reg [3:0] state = 0;
reg [31:0] data_reg;
(* ASYNC_REG = "TRUE" *) reg data_valid_0, data_valid_1;

assign ack = state == STATE_INIT & data_valid_1;
assign byte_out = data_reg[31:24]; // the current byte sits in the top eight bits, since we shift left

always @(posedge clk)
begin
  data_valid_0 <= data_valid;
  data_valid_1 <= data_valid_0;
end

always @(posedge clk)
if (state == STATE_INIT & data_valid_1)
  data_reg <= data_in;
else if ((state == STATE_LOADED | state == STATE_SHIFT_1 | state == STATE_SHIFT_2 | state == STATE_SHIFT_3) & read)
  data_reg <= {data_reg[23:0],8'h0};

always @(posedge clk)
if (restart)
  state <= STATE_INIT;
else case (state)
  STATE_INIT: state <= data_valid_1 ? STATE_LOADED : STATE_INIT;
  STATE_LOADED: state <= read ? STATE_SHIFT_1 : STATE_LOADED;
  STATE_SHIFT_1: state <= read ? STATE_SHIFT_2 : STATE_SHIFT_1;
  STATE_SHIFT_2: state <= read ? STATE_SHIFT_3 : STATE_SHIFT_2;
  STATE_SHIFT_3: state <= read ? STATE_INIT : STATE_SHIFT_3;
endcase

endmodule


In this module we again have the scenario where we receive a signal from another clock domain, which in this case is data_valid. For this reason we are creating the synchronisers data_valid_0 and data_valid_1.

As mentioned earlier on, the bit slicer is a shift register shifting eight bits at a time. In our implementation, the shift happens while the read line is asserted, which will be driven by the sample assembler when it needs more bytes.
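The combined effect of the byte swap on data_wire_out in the READ WORD block and the left shift in the slicer can be checked with a small Python model. It is a sketch under two assumptions: the word arrives from AXI in little-endian order, and the byte presented to the next stage is the top byte of the register, which is the one a left shift exposes next. The function names are hypothetical.

```python
def swap_bytes(word):
    """Mirror data_wire_out: reverse the byte order of a 32-bit word."""
    return int.from_bytes(word.to_bytes(4, "little"), "big")


def slice_bytes(word):
    """Return the four bytes of a 32-bit word, top byte first.

    Each step reads the top eight bits and then shifts left by eight,
    like the shift register in the slicer.
    """
    reg = word & 0xFFFFFFFF
    out = []
    for _ in range(4):
        out.append((reg >> 24) & 0xFF)
        reg = (reg << 8) & 0xFFFFFFFF
    return out
```

With bytes 0x11, 0x22, 0x33, 0x44 stored at increasing addresses, the word read is 0x44332211, and `slice_bytes(swap_bytes(0x44332211))` hands the bytes out in their original file order.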

Sample Assembler

The implementation for the sample assembler is as follows:

module sample_assembler(
  input clk,
  input data_valid,
  input [7:0] data,
  output ack,
  input pwm,
  output reg [23:0] timer_val,
  output tape_out,
  input restart
    );
    
parameter STATE_START = 0;
parameter STATE_LOADED = 1;
parameter STATE_LOADED_1 = 2;
parameter STATE_LOADED_2 = 3;
parameter STATE_LOADED_3 = 4;


reg [3:0] state = 0;
reg pwm_0, pwm_1;
reg three_byte_sample = 0;
wire neg_edge;

assign tape_out = pwm;
assign neg_edge = !pwm_0 & pwm_1;

always @(posedge clk)
begin
  pwm_0 <= pwm;
  pwm_1 <= pwm_0;
end

assign ack = state == STATE_START | (state == STATE_LOADED_1 & data_valid) | (state == STATE_LOADED_2 & data_valid) | (state == STATE_LOADED_3 & data_valid);

always @(posedge clk)
if (state == STATE_START & data_valid & data != 0)
  timer_val <= {data, 3'b0};
else if ((state == STATE_LOADED_1 | state == STATE_LOADED_2 | state == STATE_LOADED_3) & data_valid)
  timer_val <= {data, timer_val[23:8]};

always @(posedge clk)
if (restart)
  state <= STATE_START;
else case(state)
  STATE_START: begin
                 three_byte_sample <= 0;
                 if (data_valid & data != 0) 
                   state <= STATE_LOADED;
                 else if (data_valid)
                   state <= STATE_LOADED_1;
               end
  STATE_LOADED_1: if (data_valid)
                  begin
                    //three_byte_sample <= 1;             
                    state <= STATE_LOADED_2;
                  end     
    //state <= data_valid ? STATE_LOADED : STATE_START;
  STATE_LOADED_2: if (data_valid)
                   state <= STATE_LOADED_3;
                       
  STATE_LOADED_3: if (data_valid)
                   state <= STATE_LOADED;    

  
  STATE_LOADED: begin
    state <= neg_edge ? STATE_START : STATE_LOADED;
    three_byte_sample <= 0;
  end
endcase
endmodule


The Sample Assembler starts off by inspecting the first byte that comes in. If it is non-zero, it gets padded with three zero bits (in effect multiplying it by eight) and we have a sample value.

If the first byte is a zero, the Sample Assembler waits patiently for the next three bytes to be clocked in to get the full sample value.

Once a sample value is created, we wait for a negative transition on the PWM output to restart the process.

PWM

Let us have a look at our final module. Here is the implementation:

module tape_pwm(
  input [23:0] time_val,
  input load_timer,
  output pwm,
  input clk
    );
    
  reg polarity = 1;
  reg [23:0] load = 100;
  reg [23:0] timer = 100;
  assign pwm = polarity;
    
  always @(posedge clk)
    if (load_timer)
      load <= {1'b0,time_val[23:1]};
  
  always @(posedge clk)
  if (timer > 0)
    timer <= timer - 1;
  else
    timer <= load;
    
  always @(posedge clk)
    if (timer == 0)
      polarity <= ~polarity;
endmodule


As mentioned, this is just a countdown timer which toggles the output on underflow.

You will also notice that when storing the timer value in the load register we discard the lowest bit. This is because a timer value in a .TAP file is the period between two positive transitions. So, we need to toggle the pulse at intervals of half this value.
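To see the countdown and toggle behaviour in isolation, here is a small Python model of the timer. It is a sketch and not a cycle-accurate copy of the module: it assumes the half period is already loaded at the start and it ignores the load_timer port. The function name is an assumption of mine.

```python
def simulate_pwm(period_us, ticks):
    """Return the output level at each 1MHz tick for one fixed period."""
    load = period_us >> 1  # half the period, lowest bit discarded
    timer = load
    polarity = 1
    trace = []
    for _ in range(ticks):
        trace.append(polarity)
        if timer == 0:
            # Underflow: toggle the output and restart the countdown.
            polarity ^= 1
            timer = load
        else:
            timer -= 1
    return trace
```

One thing the model makes visible is that each half lasts load + 1 ticks, because the counter also spends a tick at zero before reloading.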

Linking everything up

With all the modules created it is a matter of linking everything up.

A port that needs special mention is the restart port you get on most of these modules. This port needs to be assigned to an AXI slave port so the ZYNQ processor can access it. You can then toggle this bit programmatically when you have loaded a .TAP file into SDRAM.

A .TAP file can be loaded into SDRAM by making use of the XSCT command mwr (memory write).

With all the modules linked up you can test the design by integrating it with the sound system we developed in the previous post. The produced sound should be similar to what you hear when you play a C64 tape on a tape deck.

In Summary

In this post we have developed the cassette interface that will take a .TAP file and produce a corresponding signal of variable pulse widths.

In the next post we will start to integrate this cassette interface to our C64 module.

Till next time!



Monday, 18 February 2019

Creating Sound on the Zybo board

Foreword

In the previous post we started going down the alley of Tape Emulation and ended off writing some Python code for converting a .TAP file to sound.

The sound that we generate is basically a set of pulses of varying widths. Outputting these pulse widths as sound is a quick sanity check of whether we implemented tape emulation more or less correctly.

Our next goal is to see if we can implement this sound generation from a .TAP file in real time within the Zybo board's FPGA.

Playing the generated sound on the Zybo is perhaps the most complex part of the exercise, so I have decided to dedicate this post to Zybo sound generation.

Sound on the Zybo board

One of the nice features of the Zybo board is that it supports onboard sound. The Zybo board simply cannot hide away this feature because of the familiar color coded Line In/Out/Mic ports:


These ports are all hooked up to an Audio Codec chip from Analog Devices: The SSM2603.

This Audio Codec has two ports that hook up to the ZYNQ SoC: an I2C port and an I2S port. The I2C port is used to configure Audio Codec settings such as sample rate and volume.

The I2S port is used to transmit digital audio data between the ZYNQ and Audio Codec.

Both the I2S port and I2C port are linked to pins on the ZYNQ to which only the FPGA has access.

Configuring the Audio Codec

As mentioned in the previous section, configuration of the audio codec is done via an I2C port.

Implementing an I2C port in an FPGA can be a daunting task, so one will be pleased to learn that the Zynq has two on-chip I2C peripherals.

Shortly after discovering this, one might feel one's bubble burst on learning that the I2C port of the Audio Codec is hooked up to pins that the on-chip peripherals don't have direct access to.

But fear not! The Zynq allows you to configure the ports of on-chip peripherals to be redirected via EMIO. This basically means that you can make these ports available to the FPGA. Within the FPGA you can then either hook up these ports directly to the output pins or hook them up to custom logic blocks.

The following block diagram within Vivado shows how this is achieved:



I have marked in red the ports of an onchip peripheral that I have exposed to the FPGA.

These ports I have hooked up to two instances of a custom logic block iobuf. This logic block is basically an implementation of a tristate buffer.

We can now proceed and write some code to initialise the Audio Codec. Firstly we need to initialise the onchip i2c peripheral we are going to use:

int main()
{
...
    Xil_Out32(0xe000501c, 0x1f);
//Set divider + addressing mode
    Xil_Out32(0xE0005000, 0x9004);
//master -> ACK -> CLR FIFO -> hold bus
    Xil_Out32(0xE0005000, 0x9004 + 2 + 8);
...

}

I have added a few comments on what is going on during initialisation, but I am not going to go into too much detail here. More details are provided in the Zynq Technical Reference Manual in Chapter 20: I2C Controller as well as in Appendix B, in the register details for the I2C controller.

Let us now write some methods to read and write to the registers of Audio Codec:

...
int readReg(int addr) {
    //master -> ACK -> CLR FIFO -> hold bus
    u32 in2 = Xil_In32(0xE0005000) | 64 | 16;
    in2 = in2 & ~1;
    Xil_Out32(0xE0005000, in2);
    //write data to register
    Xil_Out32(0xE000500c, addr << 1);
    //write address
    Xil_Out32(0xE0005008, 26);
    // Wait for completion
    u32 status = Xil_In32(0xe0005010) & 1;
    do {
        status = Xil_In32(0xe0005010) & 1;
    } while (!status);

    //clear interrupts
    Xil_Out32(0xe0005010, 1);

    //set hold bus -> read -> clear fifo
    in2 = Xil_In32(0xe0005000) | 16 | 1 | 64;
    Xil_Out32(0xe0005000, in2);
    //set transfer size
    Xil_Out32(0xe0005014, 2);
    //set address
    Xil_Out32(0xe0005008, 26);
    //clear hold
    in2 = Xil_In32(0xe0005000) & (~16);
    Xil_Out32(0xe0005000, in2);
    //wait for completion
    do {
        status = Xil_In32(0xe0005010) & 1;
    } while (!status);
    Xil_Out32(0xe0005010, 1);
    u32 byte0 = Xil_In32(0xe000500c);
    u32 byte1 = Xil_In32(0xe000500c);
    return byte0 | (byte1 << 8);
}
...
void writeReg(int addr, int data) {
    //master -> ACK -> CLR FIFO -> hold bus
    u32 in2 = Xil_In32(0xE0005000) | 64 | 16;
    in2 = in2 & ~1;
    Xil_Out32(0xE0005000, in2);
    //write data to register
    Xil_Out32(0xE000500c, (addr << 1) | ((data & 256) ? 1 : 0));
    Xil_Out32(0xE000500c, data & 255);
    //write address
    Xil_Out32(0xE0005008, 26);
    // Wait for completion
    u32 status = Xil_In32(0xe0005010) & 1;
    do {
        status = Xil_In32(0xe0005010) & 1;
    } while (!status);

    //clear interrupts
    Xil_Out32(0xe0005010, 1);

    in2 = Xil_In32(0xe0005000) & (~16);
    Xil_Out32(0xe0005000, in2);
    return;
}
...

Again, there are a lot of things going on here, which can best be understood with the Zynq Technical Reference Manual at hand. It is also handy to have the datasheet for the SSM2603 Audio Codec available to understand the format required for setting and reading registers.
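One detail worth isolating is how writeReg packs a register write into two bytes: the register address is shifted left by one and bit 8 of the data takes the lowest bit of the first byte, while the lower eight data bits form the second byte. The Python helper below reproduces just that packing so it can be checked on a PC. The function name is hypothetical and the helper does not talk to any hardware.

```python
def pack_codec_write(reg_addr, value):
    """Return the two bytes writeReg places in the I2C data register.

    byte0 = register address shifted left by one, with data bit 8 in bit 0
    byte1 = data bits 7..0
    """
    byte0 = ((reg_addr << 1) | ((value >> 8) & 1)) & 0xFF
    byte1 = value & 0xFF
    return byte0, byte1
```

For instance, writing the nine-bit value 0b101111001 to register 2 yields the bytes 5 and 0x79.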

We can now continue and write some code for initialising the Audio Codec:

int main()
{
...
    writeReg(15,0);
    usleep(1000);
    writeReg(6, 16 + 32 + 64);
    writeReg(2, 0b101111001);
    writeReg(3, 0b101111001);
    writeReg(4, 0);
    writeReg(5, 0);
    writeReg(7, 1);
    writeReg(8, 0);
    usleep(1000);
    writeReg(9, 1);
    usleep(1000);
    writeReg(6, 32);
    usleep(1000);
    writeReg(4,16+6);

...
}

Let me give a quick rundown of what is happening here.

The first write to register 15 forces the Audio Codec to write default values to all registers.

The write to register 6 powers up all blocks within the Audio Codec except the Out block. According to the datasheet we can only enable the Out block later in the initialisation process.

The writes to registers 2 and 3 set the volume of the left and right DAC.

Next, let us skip straight to the write to register 7. This write sets the format of the samples that will be presented on the I2S bus, which in this case is 16 bit samples that are left justified.

With the write to register 8 we are setting the actual sample rate, which is 48KHz.

With the write to register 9 we are enabling the digital core. Note that it is preceded by a small delay. According to the datasheet a short delay should be allowed after all blocks are powered up.

With the second write to register 6 we are finally powering up the Out block and with the write to register 4 we are enabling the DAC.

You will also see that between the write to register 6 and the write to register 4 I have added a small delay. Nowhere in the datasheet is it specified that this is necessary. However, through trial and error I have found that if you do not add this delay, no matter what you do, you will not get any sound output to the speaker.

This concludes the configuration of the Audio Codec. In the next section we will discuss how to implement the I2S interface.

Implementing the I2S interface

Implementing an I2S interface is much simpler than implementing an I2C interface.

To start off let us have a look at an I2S timing diagram from the Audio Codec datasheet:

Within the datasheet you will see other timing diagrams for other input modes, but we will only be focusing on Left-Justified mode.

A signal not present in the above diagram is MCLK (i.e. Master Clock), which is 256 times the sampling rate.

Back to the diagram. The first waveform (RECLRC/PBLRC) indicates which channel the current sample applies to.

The BCLK generates a pulse for each bit of data. In our case, where we have 16 bits per channel and two channels, the frequency will be 32 times the sample rate.

Lastly we have the signal RECDAT/PBDAT that is the actual sample data.

All three signals together with MCLK should all be in sync to avoid data corruption. We will see in a moment how this is done.

Now let us calculate the frequencies for the different clocks.

As mentioned earlier on, MCLK is 256 times the sample rate. Thus MCLK should be 12.288MHz.

BCLK is 32 times the sample rate and therefore has a frequency of 1.536MHz.
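These figures are easy to verify with a few lines of Python. The constant names are my own. The last line also gives the ratio between the two clocks, which is the division factor needed to derive BCLK from MCLK.

```python
SAMPLE_RATE_HZ = 48_000             # sample rate configured in the codec

MCLK_HZ = 256 * SAMPLE_RATE_HZ      # master clock
BCLK_HZ = 32 * SAMPLE_RATE_HZ       # bit clock: 2 channels x 16 bits
MCLK_PER_BCLK = MCLK_HZ // BCLK_HZ  # MCLK cycles per BCLK period
```

This gives 12288000Hz for MCLK, 1536000Hz for BCLK and eight MCLK cycles per BCLK period.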

We will generate the 12.288MHz clock with a clock wizard within Vivado. The resulting clock we will need to forward externally from the Zynq to the Audio Codec. Xilinx recommends not to forward a generated clock directly to an output pin, but rather to make use of an ODDR component. The following module definition will take care of this:

module oddr_buf(
  output Mlck_O,
  input clk_in
    );

   ODDR #(
      .DDR_CLK_EDGE("OPPOSITE_EDGE"), // "OPPOSITE_EDGE" or "SAME_EDGE" 
      .INIT(1'b0),    // Initial value of Q: 1'b0 or 1'b1
      .SRTYPE("SYNC") // Set/Reset type: "SYNC" or "ASYNC" 
   ) ODDR_inst (
      .Q(Mlck_O),   // 1-bit DDR output
      .C(clk_in),   // 1-bit clock input
      .CE(1), // 1-bit clock enable input
      .D1(1), // 1-bit data input (positive edge)
      .D2(0), // 1-bit data input (negative edge)
      .R(0),   // 1-bit reset
      .S(0)    // 1-bit set
   );

endmodule


We pass the generated clock to clk_in. The output port Mlck_O is the signal we should assign to an output pin.

Now onto the generation of the rest of the I2S signals. We start by creating an empty module with the required ports:

module i2s(
  input clk,
  output clk_1_5_mhz,
  output channel_enable,
  output out_data
    );

endmodule

For the input port clk we pass the generated 12.288MHz signal. clk_1_5_mhz is our generated bclk signal.

channel_enable is the channel indicator and out_data the actual sample data.

Let us write some code to generate the bclk signal:

...
reg [1:0] clk_div_counter = 0;
reg bclk_int = 0;
...
    always @(posedge clk)
    if (clk_div_counter == 3)
      bclk_int <= ~bclk_int;

    always @(posedge clk)
        clk_div_counter <= clk_div_counter + 1; 
...

So, the bclk clock is generated from the MCLK by means of a clock divider.
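To convince ourselves that this divider gives the right ratio, here is a small Python model of the two registers, stepping one MCLK cycle at a time. It is a sketch for checking the arithmetic only, and the function name is an assumption of mine.

```python
def simulate_bclk(mclk_cycles):
    """Return the value of bclk_int during each MCLK cycle."""
    clk_div_counter = 0
    bclk_int = 0
    trace = []
    for _ in range(mclk_cycles):
        trace.append(bclk_int)
        # Both registers update on the same clock edge, so the toggle
        # decision uses the counter value from before the increment.
        if clk_div_counter == 3:
            bclk_int ^= 1
        clk_div_counter = (clk_div_counter + 1) & 0b11  # 2-bit wrap
    return trace
```

The output stays at each level for four MCLK cycles, so one full BCLK period spans eight MCLK cycles, which matches 12.288MHz divided by 1.536MHz.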

Both the remaining signals transition on the negative edge of BCLK, so let us quickly create a wire signalling this behaviour:

...
    wire neg_edge;
...
    assign neg_edge = (clk_div_counter == 3) & (bclk_int == 1) ? 1 : 0;
...

Next, let us write code for the channel indicator:

...
    assign channel_enable = prclk_int;
...
    reg [3:0] channel_enable_counter = 0;
...
    always @(posedge clk)
    if (neg_edge)
      channel_enable_counter <= channel_enable_counter + 1;

    always @(posedge clk)
    if (neg_edge & channel_enable_counter == 15)
      prclk_int <= ~prclk_int;
...

And now let us write some code for out data:

...
    reg [31:0] shift_reg;
...
    assign out_data = shift_reg[31];
...
    always @(posedge clk)
    if (channel_enable_counter == 15 & neg_edge)
    begin
      shift_reg <= {data_val, data_val};
    end
    else if (neg_edge)
      shift_reg <= {shift_reg[30:0] , 1'b0};
...

As you can see, we have implemented a shift register for shifting out the sample values, which we reload each time the channel indicator signal toggles.
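The order in which the bits leave the shift register can be checked with a short Python model. It loads the register with the sample duplicated in both halves, as the Verilog does, and then shifts out the top bit once per BCLK negative edge. This is only a sketch and the function name is hypothetical.

```python
def shift_out(data_val, n_bits=16):
    """Return the bits presented on out_data, first bit first."""
    sample = data_val & 0xFFFF
    shift_reg = (sample << 16) | sample  # {data_val, data_val}
    bits = []
    for _ in range(n_bits):
        bits.append((shift_reg >> 31) & 1)         # out_data = shift_reg[31]
        shift_reg = (shift_reg << 1) & 0xFFFFFFFF  # shift left by one bit
    return bits
```

For the sample value 30000 (0x7530) the sixteen bits come out most significant bit first: 0111 0101 0011 0000.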

data_val is the actual sample value, which we haven't defined yet. For this we are going to define something very simple, which will be a monotone with a frequency between 2000Hz and 3000Hz. For this we can just alternate the sample value between 30000 and 0 every 6th sample:

...
    reg [15:0] data_val = 0;
...
    always @(posedge clk)
    if (channel_enable_counter == 15 & neg_edge)
    begin
      if (sample_mod_counter == 0)
      begin
        sample_mod_counter <= 6;
        data_val <= (data_val == 0) ? 30000 : 0;        
      end
      else
        sample_mod_counter <= sample_mod_counter - 1;
    end
...

All that remains is to link up the external pins to our audio codec:


This is all there is to creating sound on the Zybo board, which in this case will be a monotone.

In Summary

In this post we played around with sound on the Zybo board and managed to generate a monotone.

This exercise will aid us in the next post to create a cassette interface and verify the design by listening to the produced pulses.

This post will also come in handy in future posts where we implement SID emulation.

Till next time!

Thursday, 24 January 2019

Focusing on Tape Integration

Foreword

In the previous post we managed to interface our C64 FPGA module with a USB keyboard.

In this post we will start to focus on tape integration to our C64 module. Well, not exactly interfacing with a 1530 Datassette, but simulating the tape loading process from a .TAP file.

While wandering down this alley, we might just relive the nostalgia of a couple of decades ago, when we all played a C64 cassette on a normal sound system to hear what it sounds like. For this exercise we will take a .TAP file and see if we can reproduce similar sounds, with the help of Python on a PC.

Once we have successfully reproduced the sound of a C64 tape, we will set forth and see if we can do the same on the Zybo board, with the logic implemented within the FPGA.

I will not be covering all the above mentioned in this post, but rather in several ones, working incrementally towards a solution where we have a fully integrated tape to C64 module solution.

The .TAP file format

Let us start by looking at the .TAP file format. For this exercise let us have a look at a snippet of a .TAP file:


The file header starts with a textual description C64-TAPE-RAW. The actual file data starts at offset 0x14.
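A small Python helper can check the header before we trust the rest of the file. The signature and the data offset of 0x14 come from the description above. The positions of the version byte (offset 0x0C) and of the four-byte data size (offset 0x10, low byte first) are assumptions based on the commonly documented .TAP layout, so verify them against the specification you are using. The function name is hypothetical.

```python
import struct

TAP_SIGNATURE = b"C64-TAPE-RAW"
TAP_DATA_OFFSET = 0x14  # pulse data starts here


def parse_tap_header(blob):
    """Return (version, data_size, pulse_data) for a .TAP file image."""
    if len(blob) < TAP_DATA_OFFSET or not blob.startswith(TAP_SIGNATURE):
        raise ValueError("not a C64 .TAP file")
    version = blob[0x0C]                                # assumed position
    (data_size,) = struct.unpack_from("<I", blob, 0x10)  # assumed position
    return version, data_size, blob[TAP_DATA_OFFSET:]
```

Calling it on the bytes of a file read with `open(name, 'rb').read()` gives you the pulse data without the header.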

The file data is basically a set of pulse widths. In general a pulse width is represented by one byte. Multiply this value by 8, and you have the pulse width in cycles of a 1MHz clock, that is, in microseconds.

Let us have a look at our example snippet. Starting at offset 0x14, we see a series of 30's. Converting this hexadecimal number to decimal (48) and multiplying by 8, we get 384. This gives us a period of 0.000384s.

From this period we can calculate the frequency from the equation f = 1/T. This gives us a frequency of 2604Hz. This is the monotone you hear for the first 10 seconds or so from a C64 tape.
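The same calculation can be written as a tiny Python function, which is handy for trying other byte values. The function name is made up for illustration.

```python
def pulse_frequency_hz(tap_byte):
    """Return the frequency in Hz for a non-zero single-byte pulse value."""
    period_us = tap_byte * 8      # pulse width in microseconds
    return 1_000_000 / period_us  # f = 1/T, with T in microseconds
```

Calling `pulse_frequency_hz(0x30)` gives roughly 2604, matching the figure above.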

Converting a TAP file to sound

With the information from the previous section, let us see if we can take a .TAP file and generate the sound as we remember it from a couple of decades ago.

For this exercise we will be using Python to generate the raw samples. Not many programs can play raw samples, but Audacity can play it.

Within Python we start off by opening the TAP file and moving to the byte position where the actual data starts:

import struct
f = open('Dan Dare.tap', 'rb')
resfile = open('file.dat', 'wb')
timei = 0
f.seek(20)

timei is the current time in millionths of a second. I will show in a moment how this variable gets updated.

The whole sound sample generation is driven by the following loop:

...
while timei < 240000000:
...

This loop will generate 4 minutes worth of sound samples.

Within the loop we start off by reading a pulse width:

...
while timei < 240000000:
  timeval = ord(struct.unpack('c', f.read(1))[0])
...

One thing I didn't mention earlier on is that a pulse byte value of zero is a special exception. A pulse byte value of zero means that an absolute time period value is to follow in the next three bytes. With this information in mind, we add the following code to our loop:

while timei < 240000000:
  timeval = ord(struct.unpack('c', f.read(1))[0])
  if timeval == 0:
    byte1 = ord(struct.unpack('c', f.read(1))[0])
    byte2 = ord(struct.unpack('c', f.read(1))[0])
    byte3 = ord(struct.unpack('c', f.read(1))[0])
    timeval = (byte3 << 16) + (byte2 << 8) + byte1
  else:
    timeval = timeval << 3


So, in this part we cater for both the zero byte case and the single byte case.

We now have a physical time value, and hence we can update timei:

...
while timei < 240000000:
...
  timei = timei + timeval
...

We now have enough information for generating the sound samples. Keep in mind that each time period is broken down in two halves. In the first half our pulse has a positive value and in the second half our pulse has a negative value. For this reason it makes sense to work with half of the time period value, obtained by shifting the time value right by one bit:

...
while timei < 240000000:
...
  timeval = timeval >> 1


We would like to create sound samples at a rate of 48KHz, giving us the following code:

...
  timeval48khzfloat = float(timeval) * 48000/1000000
  timeval48khzint = int(timeval48khzfloat)
  for x in range (timeval48khzint):
    resfile.write(struct.pack('h',32000))
  for x in range (timeval48khzint):
    resfile.write(struct.pack('h',-32000))
...

This code will generate the sound samples for us. Finally, we just need to close the file when we are done:

resfile.close()
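One side effect of the conversion to 48KHz is worth knowing about: the number of samples per half period is truncated to a whole number, so the generated tone is slightly off from the frequency stored in the file. The sketch below repeats the arithmetic of the loop for a single period. The function names are hypothetical.

```python
def samples_per_half(period_us, rate_hz=48000):
    """Return the number of samples written for each half of a pulse."""
    half_us = period_us >> 1
    return int(float(half_us) * rate_hz / 1000000)


def effective_frequency_hz(period_us, rate_hz=48000):
    """Return the frequency actually produced after truncation."""
    n = samples_per_half(period_us, rate_hz)
    if n == 0:
        raise ValueError("pulse too short for this sample rate")
    return rate_hz / (2 * n)
```

For the 384 microsecond lead tone, each half works out to 9.216 samples and gets truncated to 9, so the output tone is about 2667Hz instead of 2604Hz.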

Listening to the result

I took the samples and converted them to an mp3 with the help of Audacity.

Unfortunately, since I use Blogger for hosting my posts, there is no easy way to embed sound clips within posts. So I had to create a video from the mp3 and upload it to YouTube so everyone can listen to the end result.

It is perhaps advisable to turn down the volume when listening to this, since there are some tones that can be annoying to the ear:

It sounds more or less as I remember it from when I listened to a C64 tape on a tape deck a couple of decades ago. Perhaps the leading monotone sounds too pure compared to the tape players of the day.

In Summary

In this post we have started to investigate how to integrate tape loading functionality to our C64 module.

As a nostalgic exercise, we attempted to reproduce the sound of a .TAP file as we remember it from a long time ago.

I performed this exercise on a PC with Python and Audacity.

It would be interesting to see if this exercise can be performed on a Zybo board, taking the .TAP file and generating the sound samples in real time within the FPGA and outputting the sound to a speaker, via the Line Out on the Zybo board.

My goal of generating sound from the .TAP file on a Zybo board perhaps sounds a bit over the top and unnecessary, but it can be an opportunity to learn how to use sound on the Zybo board. This knowledge will be valuable if we later decide to also incorporate a SID within our C64 module.

So, in the next post we will attempt to generate sound on the Zybo board.

Till next time! 

Monday, 17 December 2018

Redirecting USB keystrokes to C64

Foreword

In the previous post we managed to catch the scan codes of keys pressed on a USB keyboard.

In this post we will be redirecting these keystrokes to our C64 module so we can have some meaningful interaction with our C64 module.

The Plan of Action

Let us start by refreshing our minds a bit.

A couple of posts ago we implemented two slave registers which we mapped into memory space at locations 43c0_0000 and 43c0_0004.

Combining these two slave registers we have 64 bits, in which each bit represents a key on the C64 keyboard. The ARM can toggle the bits in these registers and in effect simulate key presses within our C64 module.

All that remains is to take the USB keyboard scan codes we receive from the keyboard and convert them to C64 key scan codes, and we have a working implementation.

Starting simple

Let us start by implementing a mapping between USB and C64 keyboard for just four keys: A, B, C, D.

The USB scan codes for these keys are as follows:


  • A -> 4
  • B -> 5
  • C -> 6
  • D -> 7
The corresponding scan codes for these keys on a C64 are as follows:

  • A -> 0xa
  • B -> 0x1c
  • C -> 0x14
  • D -> 0x12
We can create a quick mapping function for these keys:

u32 mapUsbToC64(int usbCode) {
 if (usbCode == 0x4) {
  return 0xa;
 } else if (usbCode == 0x5) {
  return 0x1c;
 } else if (usbCode == 0x6) {
  return 0x14;
 } else if (usbCode == 0x7) {
  return 0x12;
 }
}


We will invoke this method within our state_machine method, at the point where we are printing the USB scancodes to the console:

...
  if (!(Xil_In32(qTDAddressCheck + 8) & 0x80)) {
   u32 word0 = Xil_In32(0x305000);
   u32 word1 = Xil_In32(0x305004);
   if (word0 == 0)
    Xil_Out32(0x43c00000, 0);
   else {
    u32 bit = mapUsbToC64((word0 >> 16) & 0xff);
    bit = 1 << bit;
    Xil_Out32(0x43c00000, bit);
   }
...

Here we set the corresponding bit in the slave register according to the returned C64 scan code.


Here is a demonstration of our code in action:



Our mapUsbToC64 can now just be extended to cover the other keys.
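
One way to extend the mapping without a growing if-else chain is a lookup table. The following is only a sketch: the names map_usb_to_c64_table, usb_to_c64 and C64_NO_KEY are made up for illustration, and the 0xff sentinel for "no mapping" is an assumption, not something our C64 module defines. Only the four keys covered above are filled in.

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "no C64 key for this USB code". */
#define C64_NO_KEY 0xffu

/* Index = USB scan code, value = C64 scan code.
   Only the four keys from the text are filled in. */
static const uint8_t usb_to_c64[] = {
    C64_NO_KEY, C64_NO_KEY, C64_NO_KEY, C64_NO_KEY, /* 0x00-0x03: no key */
    0x0a, /* 0x04: A */
    0x1c, /* 0x05: B */
    0x14, /* 0x06: C */
    0x12  /* 0x07: D */
};

uint8_t map_usb_to_c64_table(uint8_t usb_code) {
    if (usb_code >= sizeof usb_to_c64) {
        return C64_NO_KEY; /* outside the table: unmapped */
    }
    return usb_to_c64[usb_code];
}
```

A caller would need to check for C64_NO_KEY before shifting a bit into the slave registers, so that an unmapped key never produces a stray key press.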

As it stands, our current implementation only supports the first 32 C64 scancodes. So let us quickly make some changes to cover the full 64 scancodes:

...
  if (!(Xil_In32(qTDAddressCheck + 8) & 0x80)) {
   u32 word0 = Xil_In32(0x305000);
   u32 word1 = Xil_In32(0x305004);
   if (word0 == 0) {
    Xil_Out32(0x43c00000, 0);
    Xil_Out32(0x43c00004, 0);
   } else {
    u32 bit = mapUsbToC64((word0 >> 16) & 0xff);
    //bit = 1 << bit;
    u32 c64Word0 = 0;
    u32 c64Word1 = 0;
    if (bit < 32) {
     c64Word0 = 1 << bit;
    } else {
     c64Word1 = 1 << (bit - 32);
    }

    Xil_Out32(0x43c00000, c64Word0);
    Xil_Out32(0x43c00004, c64Word1);
   }
   printf("%x %x\n",word0, word1);
...

So, if the scancode is less than 32 we set the appropriate bit at address 0x43c0_0000. For scancodes of 32 and above we set the appropriate bit at address 0x43c0_0004.
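
The split over the two registers can be isolated in a small helper. This is a minimal sketch working on plain variables rather than on the slave registers themselves; the name set_c64_key is hypothetical, and ignoring scan codes of 64 and above is my own assumption about what is sensible.

```c
#include <stdint.h>

/* Set the bit for one C64 scan code in a pair of 32-bit words.
   word0 holds scan codes 0..31, word1 holds scan codes 32..63. */
void set_c64_key(uint8_t scan_code, uint32_t *word0, uint32_t *word1) {
    if (scan_code < 32) {
        *word0 |= UINT32_C(1) << scan_code;
    } else if (scan_code < 64) {
        *word1 |= UINT32_C(1) << (scan_code - 32);
    }
    /* scan codes of 64 and above are ignored (assumption) */
}
```

Using an unsigned constant for the shift avoids shifting a signed 1 into the sign bit when the scan code is 31 or 63.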

Implementing simultaneous key presses

Up to this point we are only able to deal with one key press at a time. This becomes an issue when we want to type double quotes (") on the C64, which requires pressing the shift and the 2 key simultaneously.

In this section we will deal with simultaneous key presses.

Luckily, from the USB side we are provided with enough information to determine if more than one key is pressed simultaneously. Each byte of the 8 bytes returned in the USB report represents a key that is currently being pressed. The exception to the rule is the modifier keys, like Shift and Control. The status of all the modifier keys is contained within a single byte, where each bit corresponds to a modifier key.
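
To make the layout of these 8 bytes concrete, here is a small sketch that unpacks the two 32-bit words we read from memory into separate fields. The struct and function names are hypothetical. It assumes byte 0 of the report sits in the lowest bits of the first word, which matches the way we extracted the first key with (word0 >> 16) & 0xff earlier.

```c
#include <stdint.h>

/* Layout of the 8-byte keyboard report. */
struct kbd_report {
    uint8_t modifiers; /* one bit per modifier key */
    uint8_t reserved;
    uint8_t keys[6];   /* codes of the keys currently held, 0 = none */
};

/* Unpack two little-endian 32-bit words (byte 0 in the lowest bits). */
struct kbd_report unpack_report(uint32_t word0, uint32_t word1) {
    struct kbd_report r;
    r.modifiers = word0 & 0xff;
    r.reserved = (word0 >> 8) & 0xff;
    r.keys[0] = (word0 >> 16) & 0xff;
    r.keys[1] = (word0 >> 24) & 0xff;
    for (int i = 0; i < 4; i++) {
        r.keys[2 + i] = (word1 >> (8 * i)) & 0xff;
    }
    return r;
}
```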

We start off by creating a method to which we pass the 8 USB bytes and which returns the values we need to assign to addresses 0x43c0_0000 and 0x43c0_0004 respectively:

void getC64Words(u32 usbWord0, u32 usbWord1, u32 *c64Word0, u32 *c64Word1) {

}

We implement two loops for looping through both USB words:

void getC64Words(u32 usbWord0, u32 usbWord1, u32 *c64Word0, u32 *c64Word1) {
  *c64Word0 = 0;
  *c64Word1 = 0;

  usbWord0 = usbWord0 >> 16;

  for (int i = 0; i < 2; i++) {
   int current = usbWord0 & 0xff;
   if (current != 0) {
     int scanCode = mapUsbToC64(current);
        if (scanCode < 32) {
     *c64Word0 = *c64Word0 | (1 << scanCode);
     } else {
     *c64Word1 = *c64Word1 | (1 << (scanCode - 32));
     }

   }

   usbWord0 = usbWord0 >> 8;
  }

  for (int i = 0; i < 4; i++) {
   int current = usbWord1 & 0xff;
   if (current != 0) {
     int scanCode = mapUsbToC64(current);
        if (scanCode < 32) {
     *c64Word0 = *c64Word0 | (1 << scanCode);
     } else {
     *c64Word1 = *c64Word1 | (1 << (scanCode - 32));
     }

   }

   usbWord1 = usbWord1 >> 8;
  }

}


You will see that for the first USB word we are discarding the first two bytes. This is because the first byte is the bit mask for the modifier keys and the second byte is reserved.

Talking about the modifier keys: it would be nice to implement the shift key in order to type the double quote (") in our C64 module. So let us do that quickly:

void getC64Words(u32 usbWord0, u32 usbWord1, u32 *c64Word0, u32 *c64Word1) {
  *c64Word0 = 0;
  *c64Word1 = 0;

  if (usbWord0 & 2) {
   *c64Word0 = 0x8000;
  }
...
}

In the USB report the left shift key is bit 1 of the modifier byte (mask value 2), which is why we test usbWord0 against 2.
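
For reference, here is a sketch of all eight modifier masks of a USB keyboard report, together with a helper that treats either shift key as the C64 left shift. The constant and function names are my own, and mapping the right shift key onto the same C64 bit is a simplifying assumption rather than what the code above does.

```c
#include <stdint.h>

/* Bit masks within the modifier byte of a USB keyboard report. */
enum {
    MOD_LEFT_CTRL   = 0x01,
    MOD_LEFT_SHIFT  = 0x02,
    MOD_LEFT_ALT    = 0x04,
    MOD_LEFT_GUI    = 0x08,
    MOD_RIGHT_CTRL  = 0x10,
    MOD_RIGHT_SHIFT = 0x20,
    MOD_RIGHT_ALT   = 0x40,
    MOD_RIGHT_GUI   = 0x80
};

#define C64_LEFT_SHIFT_BIT 15 /* matches the 0x8000 written to c64Word0 */

/* Returns the bits to OR into c64Word0 for the shift state. */
uint32_t c64_shift_word0(uint8_t modifiers) {
    if (modifiers & (MOD_LEFT_SHIFT | MOD_RIGHT_SHIFT)) {
        return UINT32_C(1) << C64_LEFT_SHIFT_BIT;
    }
    return 0;
}
```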

Let us now do a test run. In the following video I write a very simple BASIC program and run it:


This concludes this post.

In Summary

In this post we integrated the USB keyboard with our C64 module.

We then tested everything by writing a very simple BASIC program and running it.

Till next time!

Tuesday, 4 December 2018

Catching keystrokes from a USB keyboard

Foreword

In the previous post we managed to read a couple of descriptors from a USB keyboard and identified which endpoint to use for capturing the keystrokes from the keyboard.

In this post we will develop some code for actually retrieving the keystrokes from the keyboard.

Moving to the configured state

For the majority of the previous post we lingered within the default state. Just to refresh our minds again of the other states for a USB device, let us look again at the following diagram:


As you can see, after the Default state there are still two states, Address and Configured, that we need to go through before we can do something useful with the USB device.

Let us start by having a look at the Address state. In this state we assign an address to our USB device so that it stops listening at the default address (i.e. 0).

To set the address, we need to make a few changes to our state_machine method:

void state_machine() {
 //bit 24 bit 18
 u32 in2 = Xil_In32(0xE0002144) | (1<<24) | (1<<18);
 Xil_Out32(0xE0002144, in2); //clear

 if (status == 0) {
  set_port_reset_state(1);
  scheduleTimer(12000);
  status = 1;
  return;
 } else if (status == 1) {
  set_port_reset_state(0);
  status = 2;

  //set address
  Xil_Out32(0x301000, 0x00030500);
  Xil_Out32(0x301004, 0x00000000);
  schedTransfer(1,0,0x0, 0x300000);
  return;
 } else if (status == 2) {
  scheduleTimer(3000);
  status = 3;

  return;

 }
}

Here we are setting up a request with request code 5, which is SET_ADDRESS, and we are setting the device to address 3. We end by waiting 3 milliseconds, just to make sure everything has settled down before we continue.
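
The two words written to 0x301000 and 0x301004 are simply the 8-byte USB setup packet in little-endian order. The sketch below shows how such words can be assembled from the standard setup packet fields. The function name pack_setup and the REQ_ constants are hypothetical, while the field order (bmRequestType, bRequest, wValue, wIndex, wLength) is the one defined by the USB specification.

```c
#include <stdint.h>

#define REQ_SET_ADDRESS       5
#define REQ_SET_CONFIGURATION 9

/* Pack the 8-byte setup packet into two little-endian 32-bit words. */
void pack_setup(uint8_t bm_request_type, uint8_t b_request,
                uint16_t w_value, uint16_t w_index, uint16_t w_length,
                uint32_t *word0, uint32_t *word1) {
    *word0 = (uint32_t)bm_request_type
           | ((uint32_t)b_request << 8)
           | ((uint32_t)w_value << 16);
    *word1 = (uint32_t)w_index | ((uint32_t)w_length << 16);
}
```

Calling pack_setup(0, REQ_SET_ADDRESS, 3, 0, 0, ...) yields 0x00030500 and 0x00000000, the same two values used in the state machine above.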

Let us now configure the device. For this we implement an extra status in our if-else block:

else if (status == 3) {
  in2 = Xil_In32(0x300004) | 3;
  Xil_Out32(0x300004, in2);

  //set configuration
  Xil_Out32(0x301000, 0x00010900);
  Xil_Out32(0x301004, 0x00000000);
  schedTransfer(1,0, 0, 0x300000);
  status = 4;
        return;
 } 

You will see that we first adjust the device address in our QH to 3, because the address was changed in the previous state.

Next we select the appropriate configuration. In the previous post we determined that we should select configuration number 1.

At this stage our USB device is fully configured and ready to use.

A brief pause at periodic schedules

With our USB keyboard ready to use, the obvious next step for us is to read the keystrokes.

My first take on reading these keystrokes was to also implement an asynchronous schedule. However, with this approach I didn't have any luck at all. Things worked better for me using periodic schedules.

So, in this section let us spend some time discussing in more detail how periodic schedules work.

Firstly, let us look again at the diagram of how periodic schedules work:

From the diagram we see that everything is driven off a Periodic Frame List, which is indexed in part by a FrameIndex that is updated at the end of each USB frame.

A USB Frame is basically a time period of 1 millisecond.

If you look in further detail at when the FrameIndex gets updated, you will see that strictly speaking the FrameIndex isn't updated every millisecond, but every 1/8 millisecond. Furthermore, you will see that the bottom 3 bits of the FrameIndex are not used to index the Periodic Frame List; only bit 3 and upwards are used.

At this point you may be wondering why the FrameIndex gets incremented every 1/8 of a millisecond if the rest of the system only works in increments of 1 millisecond.

The answer is to maintain a bit of compatibility between USB 1.1 and USB 2.0. USB 1.1 always had frames of 1 millisecond in duration. USB 2.0 introduced the concept of microframes, breaking a frame down into even smaller durations of 1/8 millisecond.

But, despite my explanation, how can you access 1/8 millisecond frames if the index into the frame list, for all practical purposes, only advances every 1 millisecond? The key to this question lies in the lower 8 bits of word 2 in a QH.

In the EHCI spec these 8 bits are referred to as the Interrupt Schedule Mask. Every bit in this byte corresponds to a specific microframe within the frame. A one in any particular position means that the transaction will take place within that particular microframe.

If only one bit is set within the Interrupt Schedule Mask, only one transaction will execute within the frame. Similarly, if more than one bit is set, more than one transaction will trigger within the frame.
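
The relationship between the FrameIndex, the frame list element and the schedule mask can be sketched with a few small helpers. The function names are hypothetical, and list_size (the number of elements in the Periodic Frame List) is passed in as a parameter and must be non-zero.

```c
#include <stdint.h>

/* Which element of the frame list a given FrameIndex value selects:
   the bottom 3 bits are dropped, the rest wraps around the list. */
uint32_t frame_list_slot(uint32_t frindex, uint32_t list_size) {
    return (frindex >> 3) % list_size;
}

/* The microframe (0..7) within the current frame. */
uint32_t microframe(uint32_t frindex) {
    return frindex & 7;
}

/* Non-zero if a QH with this schedule mask runs in the current microframe. */
int runs_in_microframe(uint8_t s_mask, uint32_t frindex) {
    return (s_mask >> microframe(frindex)) & 1;
}
```

With a schedule mask of 0x01 the transaction only runs in microframe 0 of a frame, so it executes at most once per millisecond.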

Let us now talk a bit about the data structures a Periodic Frame List points to. A Periodic Frame List also points to QH/qTD structures, just as an Asynchronous List does.

In fact, it is very convenient to think of each element in a Periodic Frame List as an Asynchronous list on its own. In this analogy, each element of the Periodic Frame List can be thought of as an ASYNCLISTADDR-register on its own.

There is, however, a small flaw in this analogy. In an Asynchronous schedule the ASYNCLISTADDR-register gets updated during traversal to always point to the next QH in the list. In a periodic schedule, however, each element in the Periodic Frame List always points to the first QH element in the list.

As such, within a periodic schedule a circular QH list doesn't make sense. 

Configuring the Periodic Schedule

Let us now write some code for setting up the periodic schedule.

Firstly we need to specify the number of elements of our Periodic Frame List. We want to poll once every 16 milliseconds. Since each element has a duration of 1 millisecond, it makes sense to have sixteen elements with only one of these elements pointing to a valid QH.

To set the frame list size we make use of three bits of register 0xe0002140: 15, 3 & 2. These three bits get grouped together as [15][3][2] and have the following meaning:

  • 000: List size is 1024 elements
  • 001: List size is 512 elements
  • 010: List size is 256 elements
  • 011: List size is 128 elements
  • 100: List size is 64 elements
  • 101: List size is 32 elements
  • 110: List size is 16 elements
  • 111: List size is 8 elements
From the above list we should use the value 110 which corresponds to the following code:

void setup_periodic() {
  u32 in2 = Xil_In32(0xE0002140) | (1<<15) | 8;
  Xil_Out32(0xE0002140, in2);

}
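
The spreading of the 3-bit size code over bits 15, 3 and 2 can be written as a small helper. This is only a sketch with hypothetical function names, operating on plain values rather than on the register itself. Unlike the code above, with_frame_list_size clears the three bits first, which is my own precaution in case the register does not start out with these bits at zero.

```c
#include <stdint.h>

#define FRAME_LIST_SIZE_MASK ((UINT32_C(1) << 15) | (UINT32_C(3) << 2))

/* Spread a 3-bit size code over register bits 15, 3 and 2. */
uint32_t frame_list_size_bits(uint32_t code) {
    uint32_t bits = 0;
    if (code & 4) {
        bits |= UINT32_C(1) << 15;
    }
    bits |= (code & 3) << 2;
    return bits;
}

/* Number of frame list elements selected by a size code (000 = 1024). */
uint32_t frame_list_elements(uint32_t code) {
    return UINT32_C(1024) >> (code & 7);
}

/* Replace the size field in a register value, leaving other bits alone. */
uint32_t with_frame_list_size(uint32_t usbcmd, uint32_t code) {
    return (usbcmd & ~FRAME_LIST_SIZE_MASK) | frame_list_size_bits(code);
}
```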

Our List will reside at address 0x304000, so we initialise this area and set the Periodic address base register:

void setup_periodic() {
...
 Xil_Out32(0x304000, 1);
 Xil_Out32(0x304004, 1);
 Xil_Out32(0x304008, 1);
 Xil_Out32(0x30400c, 1);
 Xil_Out32(0x304010, 1);
 Xil_Out32(0x304014, 1);
 Xil_Out32(0x304018, 1);
 Xil_Out32(0x30401c, 1);
 Xil_Out32(0x304020, 1);
 Xil_Out32(0x304024, 1);
 Xil_Out32(0x304028, 1);
 Xil_Out32(0x30402c, 1);
 Xil_Out32(0x304030, 1);
 Xil_Out32(0x304034, 1);
 Xil_Out32(0x304038, 1);
 Xil_Out32(0x30403c, 1);

 Xil_Out32(0xE0002154, 0x304000);
}
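
The sixteen writes can also be expressed as a loop. The sketch below fills an ordinary array instead of writing to address 0x304000 directly, so that it can be tried out on a PC. The names init_frame_list, LINK_TERMINATE and LINK_TYPE_QH are hypothetical. It also stores a QH pointer in the first element, which is the step we perform next.

```c
#include <stddef.h>
#include <stdint.h>

#define LINK_TERMINATE 1u /* T bit set: the pointer is not valid */
#define LINK_TYPE_QH   2u /* type field says: this points to a QH */

/* Mark every frame list element invalid, then point element 0 at a QH. */
void init_frame_list(uint32_t *list, size_t count, uint32_t first_qh_addr) {
    for (size_t i = 0; i < count; i++) {
        list[i] = LINK_TERMINATE;
    }
    if (count > 0) {
        list[0] = first_qh_addr | LINK_TYPE_QH;
    }
}
```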

We start off the initialisation by setting all pointers to invalid pointers. We then set one of these pointers to a valid one:

void setup_periodic() {
...
 struct QStruct *qh;
 qh = 0x304040; //first QH: no qTD's, only links to the second QH
 qh->word0 = 0x304082;
 qh->word1 = 0;
 qh->word2 = 0;
 qh->word3 = 0;
 qh->word4 = 1;
 qh->word5 = 1;

 qh = 0x304080; //second QH: carries the qTD
 qh->word0 = 1;
 qh->word1 = 0x00085103;
 qh->word2 = 0x40000001;
 qh->word3 = 0;
 qh->word4 = 0x304100;
 qh->word5 = 1;

 struct QStruct *qTD;
 qTD = 0x304100;
 qTD->word0 = 1;
 qTD->word1 = 1;
 qTD->word2 = 0x00080180;
 qTD->word3 = 0x305000;

 //set first frame to qh
 Xil_Out32(0x304000, 0x304042);
}

You might find it a bit strange that we start with a QH that doesn't contain any qTD's at all, followed by a QH that does have them. I will explain the reasoning behind this a bit later on.

You will also see that the NAK count reload field for the second QH is zero. You might recall that for our asynchronous Schedule this was always 15. Why the difference?

To answer this question let us first look at what a NAK packet is.

When a USB host requests data from a USB device and the device doesn't have any data available, it will respond with a NAK packet. Sometimes you would like to throw an error if a certain number of NAK packets are received in a row. This is the purpose of the NAK reload field.

In our case we would just like to ignore these packets altogether, so we set the RL field to zero. In our schedule, when a NAK packet is encountered, the slot will simply be skipped and the controller will move on to the next slot.

What is left to be done is to enable the periodic schedule by adding another state within state_machine:

 } else if (status == 5) {
  //enable periodic scheduling
  setup_periodic();
  in2 = Xil_In32(0xE0002140) | 16;
  Xil_Out32(0xE0002140, in2);
  status = 6;
  scheduleTimer(10000);
  return;
 }

As can be seen, we schedule a wait of 10 milliseconds before we transition to the next state.

Reading the actual keystrokes

Let us now write some code for capturing keystrokes from the USB keyboard.

The basic idea is to display the keycode each time a key is pressed or released.

In the previous section we set up the periodic schedule with a qTD transfer scheduled in one slot.

We should poll this qTD data structure until the transfer is finished, which happens when bit 7 of word 2 changes to 0. We implement this functionality with an extra state:

 } else if (status == 6) {
  if (!(Xil_In32(0x304100 + 8) & 0x80)) {
   u32 word0 = Xil_In32(0x305000);
   u32 word1 = Xil_In32(0x305004);
   printf("%x %x\n",word0, word1);
  }

  scheduleTimer(10000);
  return;
 }

We are polling the qTD data structure every 10 milliseconds. Once the transfer is finished, we will find the keystroke information in the first eight bytes at location 0x305000.

Once the scheduled transfer is finished, a new transfer would not be automatically scheduled. It is up to you to schedule a new one.

One could probably just reset the values in the qTD to start a new transfer. Doing this, we may end up with a potential cache coherency issue. To explain, look at the following block diagram of the USB block:


Changes you make to QH and qTD structures are written to System Memory. The DMA block within the USB controller reads these changes from System Memory from time to time into internal Dual-Port RAM.

One cannot tell at which stage the USB Controller is reading from System Memory, so half-baked qTD data structures might end up in the Dual-Port RAM.

The solution to this issue is to not modify these structures but to create new structures:

 } else if (status == 6) {
  u32 qTDAddress = currentTD ? 0x304100 : 0x304120;
  u32 qTDAddressCheck = currentTD ? 0x304120 : 0x304100;

  if (!(Xil_In32(qTDAddressCheck + 8) & 0x80)) {
   u32 word0 = Xil_In32(0x305000);
   u32 word1 = Xil_In32(0x305004);
   printf("%x %x\n",word0, word1);
   struct QStruct *qh;
   qh = 0x304040;
   qh->word0 = 1;

   struct QStruct *qh2;
   qh2 = currentTD ? 0x304080 : 0x3040c0;

   qh2->word0 = 1;
   qh2->word1 = 0x00085103; 
   qh2->word2 = 0x40000001;
   qh2->word3 = 0;
   qh2->word4 = qTDAddress;
   qh2->word5 = 1;

   struct QStruct *qTD;
   qTD = qTDAddress;
   qTD->word0 = 1; 
   qTD->word1 = 1; 
   qTD->word2 = 0x00080180;
   qTD->word3 = 0x305000;
   u32 temp = qh2;
   temp = temp | 2;
   qh->word0 = temp;

   currentTD = ~currentTD;
  }

  scheduleTimer(10000);
  return;
 }

As you can see, this is where our QH that contains no qTD's comes in. Once we have created a new QH and qTD, we just change the next pointer of the first-mentioned QH.

One thing to also keep in mind when a transfer is complete is to preserve the Data toggle bit and apply it to the new qTD. This is done as follows:

...
  u32 toggle = Xil_In32(qTDAddressCheck+8) & 0x80000000;
  if (!(Xil_In32(qTDAddressCheck + 8) & 0x80)) {
...
   qTD->word2 = 0x00080180 | toggle;
...
                }
...

All the developed code should now be sufficient for capturing the keystrokes continuously and outputting them to the console.
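
The two bit tests we rely on in this state, "is the transfer finished?" and "carry the data toggle over", can be captured in two small helpers. This is a sketch with hypothetical names, operating on plain token values (word 2 of a qTD) rather than on memory.

```c
#include <stdint.h>

#define QTD_ACTIVE      UINT32_C(0x00000080) /* bit 7 of word 2 */
#define QTD_DATA_TOGGLE UINT32_C(0x80000000) /* bit 31 of word 2 */

/* A transfer is finished once the controller has cleared the Active bit. */
int qtd_is_done(uint32_t token) {
    return (token & QTD_ACTIVE) == 0;
}

/* Build the token for the next qTD, keeping the finished qTD's toggle. */
uint32_t next_token(uint32_t finished_token, uint32_t fresh_token) {
    return (fresh_token & ~QTD_DATA_TOGGLE)
         | (finished_token & QTD_DATA_TOGGLE);
}
```

For example, next_token(old, 0x00080180) gives the same result as the expression 0x00080180 | toggle used above.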

The meaning of USB keycodes

As mentioned in the previous section, each key press will result in 8 bytes being populated at address 0x305000. However, only the last 6 bytes are significant to us.

Each of these six bytes represents the keycode of a key that is currently pressed. This means that up to 6 keys can be pressed simultaneously.

USB key scan codes are a bit different from conventional PS/2 codes in that they are more predictable. For instance, have a look at the USB scan codes for the first couple of letters of the alphabet:


  • Key A: scancode 4
  • Key B: scancode 5
  • Key C: scancode 6
  • Key D: scancode 7
  • Key E: scancode 8
  • Key F: scancode 9
etc.

In summary

In this post we implemented some code for catching keystrokes from the USB keyboard.

In the next post we will integrate the USB keyboard with our C64 module.

Till next time!


Friday, 30 November 2018

Resetting a USB device and reading Config Data

Foreword

In the previous post we discussed a bit of theory surrounding USB communications and started to implement some interrupts from the USB functional block in the Zynq.

In this post we will get a bit more practical and see how to reset a USB device and to read configuration information from it.

In this post we will not be implementing functionality to detect when a device is plugged or unplugged, in order to keep things simple. We will thus assume a USB keyboard is attached on the USB port when we start up.

The Life Cycle of a USB Device

To get a bit of context for this post, let us look at the life cycle of a USB device. This is illustrated by the following state diagram (taken from the official USB 2.0 specification):


The Attached state is the state when you have just plugged a USB device into the USB port.

Provided the USB host has enabled a voltage between the VCC pin and GND pin on the USB port, the USB device will power up and enter the Powered state shortly after attachment.

It should be noted, though, that when a USB Host has just been powered up, no voltage will be present across the VCC and GND pins. It is up to you to configure the USB Hub so that power is enabled over these two pins.

Once a USB device is in the Powered state, it still will not respond to any Host commands over the port. You first need to apply a reset over the USB port so that the device enters the Default state.

When a USB device is in the default state, it will respond to traffic on device address 0 and on endpoint 0.

It should be noted that the USB device will not stay in the default state for long, probably for a couple of tens of milliseconds, at most. It is up to you to get the device to the address state as soon as possible.

In the Address state the USB device will be assigned a non-zero address and all subsequent communication will be directed to this new address.

For the USB device to become fully functional it needs to transition to the configured state.

For the purpose of this post, we will just be moving to the default state and requesting a device descriptor and requesting a configuration descriptor. More on these descriptors later on.

Switching Port power on and resetting device

Let us get to writing some code.

The first thing we should do is to switch the USB module to Host mode. For this we need to use the lower two bits of register 0xe00021a8. The function of these two bits is defined as follows:


  • 00 (default): Idle
  • 01: reserved
  • 10: Controller device mode
  • 11: Controller in host mode
This corresponds to the following code:

void initUsb() {
 Xil_Out32(0xE00021A8, 3);//set to host mode
}

int main()
{
      Xil_DCacheDisable();
      init_platform();
      initint();
      initUsb();
      usleep(100000000);
      cleanup_platform();
      return 0;
}


Next, we should switch on the port power. Bit 12 of register 0xE0002184 performs this task for us. So let us extend our method initUsb:

void initUsb() {
 Xil_Out32(0xE00021A8, 3);//set to host mode
 u32 in2 = Xil_In32(0xE0002184) | 4096;
 Xil_Out32(0xE0002184, in2); //switch port power on
}


The code above will bring our USB device into the Powered state. Next we need to reset the device to bring it into the Default state. Bit 8 of register 0xe0002184 is used to initiate port reset. So, let us create the following method to assert and de-assert the reset:

void set_port_reset_state(int do_reset) {
  u32 in2;
  if (do_reset) {
 in2 = Xil_In32(0xE0002184) | 256;
 Xil_Out32(0xE0002184, in2);
  } else {
 in2 = Xil_In32(0xE0002184) & (~256);
 Xil_Out32(0xE0002184, in2);
  }

}


Now, if one reads through the USB 2.0 specification, it looks like we need to allow at least 12ms for the USB device to come out of reset. Here we will make use of the state_machine method we defined in the previous post to assist in scheduling the 12ms delay:

...
void state_machine() {
   u32 in2 = Xil_In32(0xE0002144) | (1<<24) | (1<<18);
   Xil_Out32(0xE0002144, in2); //clear

   if (status == 0) {
    set_port_reset_state(1);
    scheduleTimer(12000);
    status = 1;
    return;
   } else if (status == 1) {
    set_port_reset_state(0);
    status = 2;
   } else if (status == 2) {
    printf("\n");
   }


}
...
int main()
{
     Xil_DCacheDisable();
     init_platform();
     initint();
     initUsb();
     status = 0;
     state_machine();  
     usleep(100000000);
     cleanup_platform();
     return 0;
}


As you might remember from the previous post, we are using the state_machine method as a callback when an interrupt happens within the USB block of the Zynq. Here, however, we are also calling it from the main method. We do this just to kick off the initial state of our state machine.

In this initial state we assert the port reset and schedule the timer for 12ms. After the 12ms an interrupt will trigger and the state_machine method will be called again. This time around we will de-assert the port reset. At this stage our USB keyboard should be in the default state, listening for USB traffic on address 0.

It is in this default state that we can read the device descriptor and the configuration descriptor from the USB device.

In order to read these descriptors from the device, we need to set up an asynchronous schedule, which we briefly touched on in the previous post. To do this, we need to know more about the following data structures: Queue Heads (QH) and queue element Transfer Descriptors (qTD). We will discuss these in the next section.

Q Heads and Transfer descriptors

Let us have a look at the QH and qTD data structures.

First the QH data structure. This structure is discussed within the Zynq TRM on page 463. The following table summarises this data-structure:

The first word in the structure is a pointer to the next QH. For our Asynchronous schedule, the first QH will just be pointing to itself.

The second word has a couple of fields of importance:

  • RL (NAK counter reload): For our case we will just use a value of 15
  • C (Control endpoint flag): Set this field to one if it is a non-High-Speed control endpoint. We will indeed set this field to one in our case.
  • Maximum Packet length: We will be setting this field to 8.
  • H (Head of reclamation list): Set this value to one, since we will have one, and only one, QH
  • DTC (Data toggle control): Set to one
  • EPS (End Point Speed):
    • 00: Full Speed
    • 01: Low Speed (what we will be using)
    • 10: High Speed
    • 11: Reserved
  • EndPt (End point address): Since we will be using the Control Endpoint, this value will be set to zero
  • I: Set to zero
  • Device Address: Since we will be operating while the device is in the default state, we will use device address zero
For the third word, I will not go into detail. We will just be using the value 0x40000000.
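
To see how these fields combine into a single 32-bit value, here is a sketch that packs them together. The struct and function names are hypothetical. The bit positions are the ones I understand the EHCI layout to use (RL in bits 31:28, C in bit 27, Maximum Packet length in bits 26:16, H in bit 15, DTC in bit 14, EPS in bits 13:12, EndPt in bits 11:8, I in bit 7 and the Device Address in bits 6:0), so please check them against the table in the TRM.

```c
#include <stdint.h>

/* The fields of the second word of a QH, one member per field. */
struct qh_endpoint {
    unsigned rl, c, max_packet, h, dtc, eps, endpt, i, device_addr;
};

uint32_t pack_qh_word1(const struct qh_endpoint *e) {
    return ((uint32_t)(e->rl & 0xf) << 28)
         | ((uint32_t)(e->c & 1) << 27)
         | ((uint32_t)(e->max_packet & 0x7ff) << 16)
         | ((uint32_t)(e->h & 1) << 15)
         | ((uint32_t)(e->dtc & 1) << 14)
         | ((uint32_t)(e->eps & 3) << 12)
         | ((uint32_t)(e->endpt & 0xf) << 8)
         | ((uint32_t)(e->i & 1) << 7)
         | (uint32_t)(e->device_addr & 0x7f);
}
```

With the values from the list above (RL 15, C 1, packet length 8, H 1, DTC 1, EPS 01, and zero for the rest) this produces 0xf808d000.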

You will see that the remaining words are coloured in grey, and according to the legend this means Host Controller Read/Write. We will leave all these zero, except for the Next qTD Pointer, in which we will specify the first qTD.

Let us now move on to the qTD structure. This structure is discussed on page 459 in the Zynq TRM. The following table summarises this data structure:


The first word is a pointer to the next qTD structure.

The third word contains the following fields:
  • DT: Data toggle
  • Total Bytes: Total number of bytes to receive from or send to the USB device
  • IOC: Cause an interrupt when this transfer is finished
  • C_Page (Current Page): Index of the current buffer (i.e. 0 to 4)
  • Cerr (Error counter)
  • PID (PID Code): More on this in the following section
    • 00 Out
    • 01 In
    • 10 Setup (i.e. a value of 2)
  • Status

The remaining words are Buffer Pointers 0 to 4: five pointers, each of which points to a 4KB buffer. These buffers contain the data received from, or to be sent to, the USB device that is associated with this transfer descriptor.
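
As with the QH, it helps to see how these fields end up in one 32-bit word. The following is a sketch with hypothetical names. The bit positions are the ones I understand the EHCI layout to use (DT in bit 31, Total Bytes in bits 30:16, IOC in bit 15, C_Page in bits 14:12, Cerr in bits 11:10, PID in bits 9:8 and Status in bits 7:0), so verify them against the table in the TRM.

```c
#include <stdint.h>

enum { PID_OUT = 0, PID_IN = 1, PID_SETUP = 2 };

#define QTD_STATUS_ACTIVE 0x80u
#define QTD_STATUS_HALTED 0x40u

/* Pack the fields of the third word of a qTD into one 32-bit value. */
uint32_t pack_qtd_token(unsigned dt, unsigned total_bytes, unsigned ioc,
                        unsigned c_page, unsigned cerr, unsigned pid,
                        unsigned status) {
    return ((uint32_t)(dt & 1) << 31)
         | ((uint32_t)(total_bytes & 0x7fff) << 16)
         | ((uint32_t)(ioc & 1) << 15)
         | ((uint32_t)(c_page & 7) << 12)
         | ((uint32_t)(cerr & 3) << 10)
         | ((uint32_t)(pid & 3) << 8)
         | (uint32_t)(status & 0xff);
}
```

An 8-byte IN transfer that is active packs to 0x00080180, and an 8-byte Setup transfer that is still halted packs to 0x00080240.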

Data Transfers from USB devices

One of the data structures we covered in the previous section was the transfer descriptor. A transfer descriptor does what its name implies: it transfers data to or from the USB device.

According to the USB 2.0 specification, there are a couple of different types of transfer, but in this post we will only be focusing on one type: Control Transfers. The following web page does quite a good job of explaining control transfers, together with some diagrams:

https://www.beyondlogic.org/usbnutshell/usb4.shtml#Control

This web page basically states that a Control Transfer can be broken down into a couple of stages. To narrow down the number of scenarios, I am just going to focus on one particular use case: Getting a device descriptor from the USB device via the Control endpoint.

With this use case in mind, let us have a look at the different stages for a Control transfer.

Setup stage

The following diagram, taken from the previously mentioned web page, explains the setup stage:

The setup stage starts by issuing a setup token to the USB device. This indicates to the USB device what kind of data is about to follow.

Next follows a data packet. In our use case, where we want to request a device descriptor, this packet would contain that request to the USB device.

The USB device would acknowledge the whole request with an ACK packet, indicated by the white block in the diagram.

This whole stage would be taken care of by a transfer descriptor as discussed in the previous section. For this stage a PID of two (Setup) would be specified.

The data packet for this request would be contained in a buffer pointed to by buffer pointer 0, contained in word 3 of the relevant qTD.

The Data Stage

The following diagram explains the Data stage:

This diagram from Beyondlogic outlines two scenarios. The first scenario is when we expect data back from the USB device and the second scenario is when we are required to send data to the USB device after the setup stage.

For our use case we are only interested in the first scenario. For this we need to set up a qTD with a PID of one. The buffer pointer in this qTD will point to the buffer that will receive the data from the USB device during the Data Packet phase.

Don't worry about the Handshake packet for now. This will be covered in the next stage.

The Status Stage

With our Use case we are receiving data from the USB Device, so it is up to us to acknowledge this data during the status stage. The following diagram illustrates this:


For this stage we need to create a qTD with a PID of zero. Since we will be sending a data packet of zero length, we don't need to specify a valid buffer pointer in this qTD.
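
The PID to use for each of the three stages can be summarised in a small function. This is a sketch with hypothetical names. The device_to_host flag is non-zero when the data stage reads from the device, as in our descriptor use case. For a request without a data stage, passing zero gives the PID for its status stage.

```c
#include <stdint.h>

enum pid { PID_OUT = 0, PID_IN = 1, PID_SETUP = 2 };
enum stage { STAGE_SETUP, STAGE_DATA, STAGE_STATUS };

/* Pick the PID for a qTD, given the stage of the control transfer. */
enum pid stage_pid(enum stage s, int device_to_host) {
    switch (s) {
    case STAGE_SETUP:
        return PID_SETUP;
    case STAGE_DATA:
        return device_to_host ? PID_IN : PID_OUT;
    case STAGE_STATUS:
        /* the status stage runs opposite to the data stage */
        return device_to_host ? PID_OUT : PID_IN;
    }
    return PID_OUT; /* not reached for valid stages */
}
```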

Initialising the Async queue

Now with a bit of theory behind us, let us write some code again. This time we will initialise the async queue.

Since our application is a bare-metal application, we will not be making use of malloc calls to allocate memory for our data structures. Instead, we will use some specific memory locations for our data-structures.

We start off by clearing the memory region we will be using for our data structures:

void initUsb() {
 Xil_Out32(0xE00021A8, 3);//set to host mode
 u32 in2 = Xil_In32(0xE0002184) | 4096;
 Xil_Out32(0xE0002184, in2); //switch port power on

 for (int i = 0; i < 1000000; i = i + 4) {
  u32 current = 0x300000 + i;
  u32 *currentword;
  currentword = (u32 *) current; //point at the word to clear
  *currentword = 0;
 }
}


This will clear one million bytes to zero, one word at a time, starting at address 0x300000.

Next, we should set up a QH and a couple of qTD's. To assist us with this, we first need to create a helper data structure, making it easy to navigate through the 4-byte-word nature of these data structures:

struct QStruct {
  u32 word0;
  u32 word1;
  u32 word2;
  u32 word3;
  u32 word4;
  u32 word5;
  u32 word6;
  u32 word7;
};


We can now continue and create a QH:

void initUsb() {
...
 struct QStruct *qh;
 qh = 0x300000;
 qh->word0 = 0x300002;
 qh->word1 = 0xf808d000; //enable H bit -> head of reclamation
 qh->word2 = 0x40000000;
 qh->word3 = 0;
 qh->word4 = 0x300040;// pointer to halt qtd
 qh->word5 = 1;// no alternate

}


This QH starts at memory location 0x300000. The next pointer points back to itself (e.g. the first word).

You will also realise that this pointer ends with 2 instead of zero. This is because bits 1 and 2 actually represent the type of the structure being pointed to, which in this case is a QH.
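
A link pointer therefore carries three things: an address, a type and a terminate bit. The sketch below encodes and decodes such a pointer. The names are hypothetical, and the type values (0 for iTD, 1 for QH, 2 for siTD, 3 for FSTN) are the ones I understand the EHCI spec to define. It assumes structures are aligned to 32 bytes, so the lower five bits of the address are free.

```c
#include <stdint.h>

enum link_type { LINK_ITD = 0, LINK_QH = 1, LINK_SITD = 2, LINK_FSTN = 3 };

/* Combine an aligned address, a type and the terminate bit. */
uint32_t make_link(uint32_t addr, enum link_type type, int terminate) {
    return (addr & ~UINT32_C(0x1f))
         | ((uint32_t)type << 1)
         | (terminate ? 1u : 0u);
}

uint32_t link_address(uint32_t link) {
    return link & ~UINT32_C(0x1f);
}

enum link_type link_kind(uint32_t link) {
    return (enum link_type)((link >> 1) & 3);
}

int link_terminates(uint32_t link) {
    return (int)(link & 1);
}
```

Here make_link(0x300000, LINK_QH, 0) gives 0x300002, the value stored in word0 above.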

Word 4 is a pointer to the first qTD of this QH, which starts at address 0x300040.

Let us now have a look at the qTD:

void initUsb() {
...
 struct QStruct *qTD;
 qTD = 0x300040;
 qTD->word0 = 1; //next qtd + terminate
 qTD->word1 = 0; // alternate pointer
 qTD->word2 = 0x40; //halt value// setup packet 80 to activate
}


Word 0 has the value 1, meaning there is no valid next qTD.

For word2 we specify a value of 0x40. This gives us an async schedule in the halt state. We can now enable the async schedule:

void initUsb() {
...
 Xil_Out32(0xE0002158,0x300000); // set async base
 in2 = Xil_In32(0xE0002140) | 0x1;
 Xil_Out32(0xE0002140,in2); //enable rs bit
 in2 = Xil_In32(0xE0002140) | 0x20;
 Xil_Out32(0xE0002140,in2); // enable async processing

}


The async schedule is started by setting bit 5 of register 0xe0002140. Once enabled, the scheduler looks at register 0xe0002158 as the location for the first QH.

As mentioned, this async schedule is now in the halt state. We need to add additional qTD's to make it do something useful.

We will cover this in the next section.

Setting up a Transfer

In the previous section we managed to enable an async schedule, although not a very useful one: everything is in the halted state!

Let us start by creating a method for enabling a useful transfer:

void schedTransfer(int setup, int direction, int size, u32 qh_add) {

}

setup specifies whether we should add a setup qTD.

direction specifies whether we want to receive or send data. For receiving, direction should be 1.

size is the number of bytes we want to send or receive.

qh_add is the address of the QH to which we want to add the qTD's.

If we require a setup token, we convert the halt qTD to a setup qTD:

void schedTransfer(int setup, int direction, int size, u32 qh_add) {
   struct QStruct *qh;
   qh = (struct QStruct *)qh_add;
   u32 first_qtd = qh->word4;
   struct QStruct *firstTD;
   struct QStruct *nextTD;
   firstTD = (struct QStruct *)first_qtd;
   nextTD = (struct QStruct *)first_qtd;
   if (setup) {
     firstTD->word0 = calNextPointer(first_qtd); //next qtd
     firstTD->word1 = 1; // no alternate pointer
     firstTD->word2 = 0x00080240; //8 byte setup token, kept halted till everything is set up
     firstTD->word3 = 0x301000; //buffer for setup command

    }
}

You will see that the lower eight bits of word2 are still 0x40. This means that our queue will remain in the halt state until we change it to another value.
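
To see how a token value like 0x00080240 is put together, here is a small sketch that assembles word2 from its fields. The function name qtd_token and the constant names are hypothetical and only serve as illustration; the field positions are the ones I understand from the EHCI layout, and the error counter and page fields are simply left at zero, as in the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Status bits in the low byte of the qTD token (word2). */
#define QTD_STATUS_ACTIVE 0x80u
#define QTD_STATUS_HALTED 0x40u

/* PID codes held in bits 9:8 of the token. */
enum qtd_pid { QTD_PID_OUT = 0, QTD_PID_IN = 1, QTD_PID_SETUP = 2 };

/* Hypothetical helper: assemble a qTD token from its fields.
 * bit 31     = data toggle
 * bits 30:16 = number of bytes to transfer
 * bit 15     = interrupt on completion
 * bits 9:8   = PID code
 * bits 7:0   = status */
static uint32_t qtd_token(uint32_t bytes, enum qtd_pid pid, int toggle,
                          int ioc, uint32_t status)
{
    assert(bytes <= 0x7FFFu); /* the length field is 15 bits wide */
    assert(status <= 0xFFu);
    return ((toggle ? 1u : 0u) << 31) | (bytes << 16) |
           ((ioc ? 1u : 0u) << 15) | ((uint32_t)pid << 8) | status;
}
```

Calling qtd_token(8, QTD_PID_SETUP, 0, 0, QTD_STATUS_HALTED) reproduces the 0x00080240 used for the setup qTD: eight bytes, a SETUP PID and the halted status.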

Next, we should add the remaining qTD's:

void schedTransfer(int setup, int direction, int size, u32 qh_add) {
...
    if (size > 0) {
       if (setup)
         nextTD = (struct QStruct *)calNextPointer(first_qtd);

       //data qTD
       nextTD->word0 = calNextPointer((u32)nextTD); //next qtd
       nextTD->word1 = 1; // no alternate pointer

       //byte count, PID (0 = OUT, 1 = IN) and data toggle; halted only if this is the first qTD
       nextTD->word2 = (size << 16) | (direction << 8) | (nextTD == firstTD ? 0x40 : 0x80) | 0x80000000;

       if (direction == 0)
         nextTD->word2 = nextTD->word2 | 0x8000; //interrupt on completion
       nextTD->word3 = setup ? 0x302000 : 0x301000; //data buffer

       nextTD = (struct QStruct *)calNextPointer((u32)nextTD);

       if (direction == 1) {
         //zero length OUT qTD acting as the status stage
         nextTD->word0 = calNextPointer((u32)nextTD); //next qtd
         nextTD->word1 = 1; // no alternate pointer
         nextTD->word2 = 0x80008080; //active, interrupt on completion
         nextTD->word3 = 0x301000; //buffer pointer, unused for zero bytes
       }
    } else {
       //size = 0: zero length IN qTD acting as the status stage
       nextTD = (struct QStruct *)calNextPointer(first_qtd);
       nextTD->word0 = calNextPointer((u32)nextTD); //next qtd
       nextTD->word1 = 1; // no alternate pointer
       nextTD->word2 = (0 << 16) | (1 << 8) | (nextTD == firstTD ? 0x40 : 0x80) | 0x80000000 | 0x8000;

    }
    if (nextTD == firstTD)
       nextTD->word2 = nextTD->word2 | 0x8000;
    nextTD = (struct QStruct *)calNextPointer((u32)nextTD);
    //terminating halt qTD
    nextTD->word0 = 1; //next qtd + terminate
    nextTD->word1 = 1; // no alternate pointer
    nextTD->word2 = 0x40; //halted
    nextTD->word3 = 0x301000; //buffer pointer, unused
}

The last qTD is again a halt qTD.

The code is also written so that the last qTD executed raises an interrupt.

Once all the qTD's have been set up, we can mark the first qTD in the sequence as runnable:

  firstTD->word2 = (firstTD->word2 & (~0x40)) | 0x80;

One final method that should be implemented is calNextPointer:

u32 calNextPointer(u32 currentpointer) {
 currentpointer = currentpointer - 0x300040;
 currentpointer = currentpointer + 0x20;
 if (currentpointer > 0x200)
  currentpointer = 0;
 return currentpointer + (u32)0x300040;
}


This method advances to the next 32-byte slot and wraps back to 0x300040 once the offset exceeds 0x200, in effect simulating a circular buffer.
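
To check how many slots this circular buffer really has, the same arithmetic can be restated with named constants and walked once around. This is only a sketch for experimenting on a PC; the names qtd_pool_next and qtd_pool_slots are my own and do not appear in the project.

```c
#include <assert.h>
#include <stdint.h>

#define QTD_POOL_BASE 0x300040u /* first qTD slot, as used in this post */
#define QTD_SLOT_SIZE 0x20u     /* 32 bytes per slot */
#define QTD_LAST_OFF  0x200u    /* highest offset before wrapping */

/* Same arithmetic as calNextPointer, restated with named constants. */
static uint32_t qtd_pool_next(uint32_t current)
{
    assert(current >= QTD_POOL_BASE);
    uint32_t offset = current - QTD_POOL_BASE + QTD_SLOT_SIZE;
    if (offset > QTD_LAST_OFF)
        offset = 0;
    return QTD_POOL_BASE + offset;
}

/* Count the distinct slots visited before arriving back at the base. */
static int qtd_pool_slots(void)
{
    int count = 1;
    for (uint32_t p = qtd_pool_next(QTD_POOL_BASE); p != QTD_POOL_BASE;
         p = qtd_pool_next(p))
        count++;
    return count;
}
```

Walking the pool this way visits 17 slots, from 0x300040 up to and including 0x300240, so the qTD's stay well clear of the buffers at 0x301000 and 0x302000.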

Reading a descriptor from USB Device

With the method created in the previous section, we can now use it to read a descriptor from the USB device.

To get the descriptor we need to schedule the transfer with the request stored in a buffer. The USB 2.0 spec gives us an indication of what this request should look like (on page 250):


Let us have a look at the values. We start with bmRequestType, which has the value 0x80.

For bRequest we need to use the constant GET_DESCRIPTOR. To get this value, scroll down to the next page of the USB 2.0 spec and you will see the value is 6.

The descriptor type is retrieved from the next table and has the value 1 for descriptor type DEVICE.

wLength has the value 0x12.
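
These fields make up the eight byte request that has to be placed in the buffer. As a rough sketch, the request can be described with a struct and packed into two little-endian words. The struct follows the field names of the USB 2.0 spec; the helper get_descriptor_words and the macro names are my own and purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* The 8-byte request of a control transfer (USB 2.0, chapter 9). */
struct usb_setup_packet {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
};

#define USB_REQ_GET_DESCRIPTOR 6u
#define USB_DT_DEVICE          1u
#define USB_DT_CONFIG          2u

/* Hypothetical helper: build a GET_DESCRIPTOR request and pack it into
 * two 32-bit words, lowest byte first, ready to be written to memory. */
static void get_descriptor_words(uint8_t desc_type, uint16_t length,
                                 uint32_t words[2])
{
    struct usb_setup_packet p = {
        .bmRequestType = 0x80, /* device-to-host, standard, device */
        .bRequest      = USB_REQ_GET_DESCRIPTOR,
        .wValue        = (uint16_t)(desc_type << 8), /* type in high byte, index 0 */
        .wIndex        = 0,
        .wLength       = length
    };
    assert(length > 0); /* this sketch always expects a data stage */
    words[0] = p.bmRequestType | ((uint32_t)p.bRequest << 8) |
               ((uint32_t)p.wValue << 16);
    words[1] = p.wIndex | ((uint32_t)p.wLength << 16);
}
```

For descriptor type DEVICE and a length of 0x12 this yields the words 0x01000680 and 0x00120000.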

We can now modify our state_machine method to send a DEVICE_DESCRIPTOR request:

void state_machine() {
   u32 in2 = Xil_In32(0xE0002144) | (1<<24) | (1<<18);
   Xil_Out32(0xE0002144, in2); //clear

   if (status == 0) {
    set_port_reset_state(1);
    scheduleTimer(12000);
    status = 1;
    return;
   } else if (status == 1) {
    set_port_reset_state(0);
    status = 2;
    //device descriptor
    Xil_Out32(0x301000, 0x01000680);
    Xil_Out32(0x301004, 0x00120000);
    schedTransfer(1,1,0x12, 0x300000);
   } else if (status == 2) {
    printf("\n");
   }
}


As seen, we are writing the request to the buffer at address 0x301000.

The descriptor returned by the USB device will be stored at location 0x302000. The else statement for status 2 can be used to print the contents of this buffer. Let us have a look at the contents of the buffer:

302000:   01100112
302004:   08000000
302008:   0C231A2C
30200C:   02010110
302010:   00000100
302014:   00000000

Since the ARM core is little endian, you should read the bytes of each word from right to left.

So, we start off with 0x12, which is the length of the descriptor.

0x1 is the descriptor type, which in this case is DEVICE.

The next two bytes indicate the USB version, which in this case is 1.1.

The next three bytes give the device class, subclass and protocol, which are all zero. This means more info about the device is provided in the Configuration descriptor.

Following that is the maximum packet size, which is 8 bytes.

Then follow the vendor ID (0x1A2C), the product ID (0x0C23) and the device release number, together with the string indexes for the manufacturer, product and serial number.

The last byte of the descriptor is a 1, meaning that there is only one possible configuration.
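
Reading bytes from right to left by hand gets tedious, so here is a sketch of how the dumped words could be decoded in code. The struct uses the field names from the USB 2.0 spec; the helpers byte_at, le16_at and decode_device are hypothetical names of my own.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the 18-byte DEVICE descriptor, in wire order. */
struct usb_device_descriptor {
    uint8_t  bLength, bDescriptorType;
    uint16_t bcdUSB;
    uint8_t  bDeviceClass, bDeviceSubClass, bDeviceProtocol, bMaxPacketSize0;
    uint16_t idVendor, idProduct, bcdDevice;
    uint8_t  iManufacturer, iProduct, iSerialNumber, bNumConfigurations;
};

/* Fetch byte n from an array of little-endian 32-bit words. */
static uint8_t byte_at(const uint32_t *words, unsigned n)
{
    return (uint8_t)(words[n / 4] >> (8 * (n % 4)));
}

/* Fetch a 16-bit value whose low byte sits at byte offset n. */
static uint16_t le16_at(const uint32_t *words, unsigned n)
{
    return (uint16_t)(byte_at(words, n) | (byte_at(words, n + 1) << 8));
}

/* Hypothetical helper: decode the words printed from the receive buffer. */
static struct usb_device_descriptor decode_device(const uint32_t *w)
{
    struct usb_device_descriptor d;
    d.bLength            = byte_at(w, 0);
    d.bDescriptorType    = byte_at(w, 1);
    d.bcdUSB             = le16_at(w, 2);
    d.bDeviceClass       = byte_at(w, 4);
    d.bDeviceSubClass    = byte_at(w, 5);
    d.bDeviceProtocol    = byte_at(w, 6);
    d.bMaxPacketSize0    = byte_at(w, 7);
    d.idVendor           = le16_at(w, 8);
    d.idProduct          = le16_at(w, 10);
    d.bcdDevice          = le16_at(w, 12);
    d.iManufacturer      = byte_at(w, 14);
    d.iProduct           = byte_at(w, 15);
    d.iSerialNumber      = byte_at(w, 16);
    d.bNumConfigurations = byte_at(w, 17);
    assert(d.bLength == 18 && d.bDescriptorType == 1); /* must be a DEVICE descriptor */
    return d;
}
```

Feeding it the five words from the dump above gives a bcdUSB of 0x0110, a maximum packet size of 8 and a single configuration, matching what we read off by hand.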

That concludes our discussion on getting and reading the DEVICE descriptor.

Reading the Configuration Descriptor

Time for us to read the configuration descriptor. For this we need to modify our state_machine method again:

void state_machine() {
   u32 in2 = Xil_In32(0xE0002144) | (1<<24) | (1<<18);
   Xil_Out32(0xE0002144, in2); //clear
   if (status == 0) {
     set_port_reset_state(1);
     scheduleTimer(12000);
     status = 1;
     return;
   } else if (status == 1) {
     set_port_reset_state(0);
     status = 2;
     //configuration descriptor
     Xil_Out32(0x301000, 0x02000680);
     Xil_Out32(0x301004, 0x003b0000);
     schedTransfer(1,1,0x3b, 0x300000);
   } else if (status == 2) {
     printf("\n");
   }
}


This time we get back 0x3b bytes. In effect this data contains a number of descriptors, where each one starts with its own length in bytes:

09 02 3B 00 02 01 00 A0 32
09 04 00 00 01 03 01 01 00
09 21 10 01 00 01 22 36 00
07 05 81 03 08 00 0A
09 04 01 00 01 03 00 00 00
09 21 10 01 00 01 22 32 00
07 05 82 03 08 00 0A

Let us discuss these descriptors.

The first descriptor:

  • Descriptor type: 2 -> Configuration descriptor
  • 0x3b -> overall length of all descriptors
  • 02 -> number of interfaces
  • 01 -> value to select this configuration
  • 00 -> string index for textual description. In this case none available
  • 0xA0 -> attributes. In this case bus powered with remote wakeup support
  • 0x32 -> max power in 2mA units. In this case 100mA
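
The attributes byte and the max power byte can be picked apart with a few one-liners. The following is a small sketch; the macro and function names are local to this example and the bit meanings are the ones I understand from the configuration descriptor table in the USB 2.0 spec.

```c
#include <assert.h>
#include <stdint.h>

/* bmAttributes bits of a configuration descriptor. */
#define USB_CONFIG_ATT_ONE       0x80u /* reserved, always set */
#define USB_CONFIG_ATT_SELFPOWER 0x40u /* device has its own power supply */
#define USB_CONFIG_ATT_WAKEUP    0x20u /* device can wake up the host */

static int config_is_self_powered(uint8_t bmAttributes)
{
    return (bmAttributes & USB_CONFIG_ATT_SELFPOWER) != 0;
}

static int config_has_remote_wakeup(uint8_t bmAttributes)
{
    assert(bmAttributes & USB_CONFIG_ATT_ONE); /* sanity check on the reserved bit */
    return (bmAttributes & USB_CONFIG_ATT_WAKEUP) != 0;
}

/* bMaxPower is expressed in units of 2mA. */
static unsigned config_max_power_ma(uint8_t bMaxPower)
{
    return bMaxPower * 2u;
}
```

For our keyboard, 0xA0 decodes to a bus powered device that supports remote wakeup, and 0x32 works out to 100mA.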

From the configuration descriptor we see that two interfaces are defined. Each interface descriptor has descriptor type 4. Furthermore, byte 5 of the interface descriptor specifies the interface class. For both these interfaces this is class 3, which is HID (Human Interface Device).

The next two bytes of interest in the interface descriptors are bytes 6 and 7. For the first interface descriptor these bytes are 1 and 1, whereas for the second one they are 0 and 0.

For the first interface these two bytes correspond to the following:

  • Boot interface Subclass
  • Keyboard

This type of interface is a simplified keyboard interface intended for BIOSes, and we will indeed use this interface for our design.

Let us now go and have a look at the endpoint for this Interface.

07 05 81 03 08 00 0A

0x81 specifies that this is an IN endpoint and that the address of this endpoint is 1.

The 3 specifies that this is an interrupt endpoint.

The 8 specifies that the maximum packet size for this endpoint is 8 bytes.

The 0x0A specifies the polling interval in milliseconds. Thus, in this case the polling interval is 10 milliseconds.
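
The manual search we just did can also be expressed in code. Below is a minimal sketch that walks the returned bytes descriptor by descriptor, using the length byte at the start of each one, and picks out the interrupt IN endpoint belonging to the boot keyboard interface. The function find_boot_keyboard and the struct kbd_endpoint are hypothetical names, and the sketch assumes the bytes have already been copied into a byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USB_DT_INTERFACE 4u
#define USB_DT_ENDPOINT  5u

struct kbd_endpoint {
    uint8_t  address;     /* bEndpointAddress, bit 7 set means IN */
    uint16_t max_packet;  /* wMaxPacketSize */
    uint8_t  interval_ms; /* bInterval */
};

/* Hypothetical helper: returns 1 and fills *out when an interrupt IN
 * endpoint of a HID boot keyboard interface is found, otherwise 0. */
static int find_boot_keyboard(const uint8_t *buf, size_t len,
                              struct kbd_endpoint *out)
{
    int in_keyboard = 0;
    size_t pos = 0;

    assert(buf != NULL && out != NULL);
    while (pos + 2 <= len) {
        uint8_t dlen = buf[pos];     /* every descriptor starts with its length */
        uint8_t type = buf[pos + 1]; /* followed by its type */
        if (dlen < 2 || pos + dlen > len)
            break; /* malformed or truncated descriptor */
        if (type == USB_DT_INTERFACE && dlen >= 9) {
            /* class 3 = HID, subclass 1 = boot, protocol 1 = keyboard */
            in_keyboard = buf[pos + 5] == 3 && buf[pos + 6] == 1 &&
                          buf[pos + 7] == 1;
        } else if (type == USB_DT_ENDPOINT && dlen >= 7 && in_keyboard) {
            uint8_t attrs = buf[pos + 3];
            if ((buf[pos + 2] & 0x80) && (attrs & 0x03) == 3) {
                out->address = buf[pos + 2];
                out->max_packet = (uint16_t)(buf[pos + 4] | (buf[pos + 5] << 8));
                out->interval_ms = buf[pos + 6];
                return 1;
            }
        }
        pos += dlen;
    }
    return 0;
}
```

Run against the 0x3b bytes listed earlier, this arrives at the same endpoint we isolated by hand: address 0x81, a maximum packet size of 8 bytes and a polling interval of 10 milliseconds.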

In Summary

In this post we managed to read a couple of descriptors from a USB keyboard and isolated an endpoint that we can use to read keystrokes from the keyboard.

In the next post we will attempt to read keystrokes from the USB keyboard.

Till next time!

Friday, 23 November 2018

Getting started with USB Protocols

Foreword

In the previous post we managed to implement the flashing cursor and keyboard interaction.

At this point in time a keypress can only be simulated by running a program on the ARM core that writes a value to a specific register and we are not yet at a point of integrating a physical keyboard to our C64 system.

The goal I have in mind is to integrate with a USB keyboard. To do this there is an easy way and a difficult way.

The easy way is to make use of PetaLinux, a Linux distribution provided by Xilinx for the Zynq processor. Going this route gives you USB and keyboard drivers, simplifying integration with a USB keyboard and sparing us the technicalities of the USB protocol.

Then there is the difficult route: trying to access the USB keyboard in standalone mode. In standalone mode you cannot make use of the drivers that come bundled with Linux, and you more or less need to re-invent the wheel for USB keyboard interaction.

Re-inventing the wheel is not really cool, but it gave me some second thoughts. I have been using USB devices for almost 18 years without knowing how the communication works between the PC and a USB device.

Going the difficult route is actually an opportunity to learn how USB works and in the next couple of posts (two or three?) we are going to do just that.

We are going to start off with a bit of theory on USB protocols and will gradually work our way to a practical implementation.

I am not going to implement a full USB protocol stack, just the bare minimum necessary to catch keystrokes from a USB keyboard.

A note about the source code

I have received a couple of requests to publish the full source code for the project in its current state.

I have recently done so via the following link on GitHub: https://github.com/ovalcode/c64fpga

Within the Readme.md I give some instructions on how to create the project files and build the project.

USB Protocol Overview

When you plug a USB keyboard into your PC your PC is known as the USB host and the keyboard as the USB device. This relationship is also depicted by the block diagram below:


A USB device can support one or more functions, shown by the blue blocks above. Let us have a quick look at an example where a USB device supports more than one function.

Say, for example, a manufacturer brings out a USB webcam. The manufacturer might decide to also ship the device drivers on the webcam itself and surface it to your PC as a Mass Storage Device.

Cool solution, but how will your PC differentiate between these two functionalities on the same set of USB wires?

The answer is to give each functionality an endpoint number. When your PC communicates with the USB device, it always needs to provide an endpoint number so that the USB device knows for which function the message is intended.

Let us now move on to the topic of how USB devices are addressed. USB devices are connected to the PC in a star topology.

The Commodore 64 does something comparable when attaching multiple drives and printers to its single serial port, although there the devices are daisy-chained on one shared bus.

A quirk with this kind of topology is that all devices can see each other's traffic. To avoid confusion a unique address needs to be assigned to each device.

On C64 disk drives you make use of jumpers on each drive to assign the address.

On USB devices you don't have jumpers. So, how are the addresses assigned?

The answer is that you just need to reset a device. It will then be in the default address state and respond to requests on address zero.

At this point the alert reader will say: 'Aha! You just said USB uses a star topology, so won't a reset signal reset all the USB devices?'

Yes, it would, but a USB reset is one of the signals a USB host has finer control over, and you can limit a reset to a specific USB port.

Let us conclude our Overview of USB by having a look at how communication is orchestrated between the devices and the Host.

USB communication among the devices and the host is orchestrated by means of Host polling.

This means that the host initiates all communication. Even if a USB device has some information that urgently needs the attention of the host, it needs to wait for the host to ask for it.

EHCI

Back to the ZYBO board. If you have a look in the ZYNQ 7000 manual, you will see that it provides some information on how to establish communication between your Zybo board and a USB device.

However, when working on your USB implementation, you will probably find that the information provided within this Technical Reference manual is simply not enough to give you a clear direction on where to go.

Doing an Internet search on how to implement USB on the Zybo board will probably not be fruitful either.

I almost went into despair over this, till I found that the USB registers on the Zynq are not specific to the Zynq, but follow a standard known as EHCI (Enhanced Host Controller Interface).

The thought that the USB implementation was not Zynq specific widened my horizon, and I was immediately able to find more implementation examples. In fact, I could find a nice example within the Linux source tree.

To communicate with USB devices, the EHCI standard defines two schedules into which you can queue USB communication requests: Periodic schedule & Asynchronous schedule.

You make use of the Periodic schedule if you want to poll specific USB devices for information at specific time intervals. This will typically be for devices like USB sound cards, which give a stream of information at a fixed data rate. The following diagram, taken from the Zynq technical reference, explains how the periodic schedule is implemented:

We will cover in more detail how this schedule is set up at a later stage.

There might be cases where you don't want to poll a USB device constantly for information, but in a more ad hoc fashion as the need arises. For this you will make use of asynchronous queues. The following diagram, also taken from the Zynq reference manual, explains how asynchronous queues work:

The interesting part in the diagram is where it mentions Insert and Remove QH's as needed, reiterating its ad hoc nature. When you don't need any information from the async queue at the moment, you will just have a Queue Head pointing to itself, sitting in a halt state.

When you suddenly need some information again from a particular USB device, you can add a Queue Head to the queue, which will be processed, after which the queue returns to a halt state.

We will also cover the setup of the Asynchronous queue at a later stage.

Writing some code

We have covered quite a bit of theory. Let us now see if we can start writing some code.

Programming a USB interface involves quite some detail, and one can get so overwhelmed that you don't know where to start. But we can always start with small steps.

Let us start with the following:

  • Getting Caching right
  • Enabling Interrupts

Starting with getting the caching right. In order to set up a periodic queue or an async queue, you need to write some data structures directly to SDRAM. When an ARM core writes these structures, they might not end up in SDRAM straight away, but linger for some time in the L1 or L2 data cache, where the USB controller, which reads them straight from memory, cannot see them.

There are a couple of ways to deal with this potential caching issue. I am just going to take a simple approach and disable the Data Cache altogether:

#include <stdio.h>
#include "xil_exception.h"
#include "xparameters.h"
#include "platform.h"
#include "xil_printf.h"
#include "xil_cache.h"
#include "xil_io.h"
#include "xscugic.h"
#include "xgpiops.h"
#include <unistd.h>

int main()
{
    Xil_DCacheDisable();
    init_platform();
    cleanup_platform();
    return 0;
}


I have already added most of the common headers we will need over time for our USB exercise. The headers, together with the associated libraries, are provided by the Xilinx SDK when you compile your program as a standalone application.

Let us now move on to interrupts. The USB module within the Zynq provides interrupts for two timer expiry events, and for USB events such as the completion of a transfer. These are very useful interrupts indeed, which we would like to intercept.

It is therefore necessary to enable the above mentioned interrupts and ensure one of our custom methods gets called when they happen.

Configuring interrupts on the Zynq (and probably most ARM based SoC's) is quite a mission. The complexity is illustrated by the following block diagram:

So, in effect you need to figure out how to program the Generic Interrupt Controller and then how to enable interrupts on the ARM processor.

Luckily the Xilinx SDK provides some wrappers that shield us from most of the complexity.

A stripped-down version for enabling the interrupts looks as follows:

...
int help;
int myhelp;
XScuGic_Config *IntcConfig;
XScuGic INTCInst;
...
void state_machine();
...
void initint() {

 IntcConfig = XScuGic_LookupConfig(0);
 int status;
 myhelp = 1;

 status = XScuGic_CfgInitialize(&INTCInst, IntcConfig, IntcConfig->CpuBaseAddress);
 Xil_ExceptionRegisterHandler(XIL_EXCEPTION_ID_INT,
         (Xil_ExceptionHandler)XScuGic_InterruptHandler,
      &INTCInst);
 Xil_ExceptionEnable();
 status = XScuGic_Connect(&INTCInst,
         53,
         (Xil_ExceptionHandler)state_machine,
         (void *)&myhelp);
 XScuGic_Enable(&INTCInst, 53);

}
...
void state_machine() {
}
...
int main()
{
    Xil_DCacheDisable();
    init_platform();
    initint();
    cleanup_platform();
    return 0;
}


Within the method initint we basically configure the GIC and enable interrupts on the ARM processor.

Also, in the code we are enabling Shared Peripheral Interrupt (SPI) #53. All USB block related interrupts will trigger via this interrupt.

We also configure things so that our custom method state_machine is called each time SPI interrupt #53 happens. We will fill in the method state_machine over the course of time.

You might notice that the method call XScuGic_Connect accepts a fourth parameter, for which in this case we just pass a pointer to an integer called myhelp.

Usually for this parameter you will pass a pointer to a driver structure and when an interrupt happens, your interrupt handler (which in this case is state_machine) will receive this pointer as a parameter.

In our case we will not be using this parameter in our interrupt handler. Instead, we will implement a state machine within our interrupt handler and define a global status variable that keeps track of the current state.

One final thing needs to be done with our interrupt initialisation, and that is to enable the applicable USB interrupts.

We would like to enable the General Purpose Timer Interrupt 0 (GP0). We would also like to enable Async Interrupts so that an interrupt is triggered when an Asynchronous transfer has completed.

This is done as follows:

void initint() {
...
 u32 in2 = Xil_In32(0xE0002148) | (1<<24) | (1<<18);
 Xil_Out32(0xE0002148, in2); //enable
}
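
The two shifted values are easier to keep apart when they get names. Here is a small sketch of the same read-modify-write step with named bits, written as a pure function so it can be tried out on a PC. The macro names and the function usb_int_enable are my own inventions; the bit positions are simply the ones used in the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as used in this post; the names are local to this sketch. */
#define USB_INT_ASYNC_DONE (1u << 18) /* asynchronous transfer completed */
#define USB_INT_GP_TIMER0  (1u << 24) /* general purpose timer 0 expired */

/* Return the register value with both interrupt sources enabled,
 * leaving every bit that was already set untouched. */
static uint32_t usb_int_enable(uint32_t current)
{
    uint32_t updated = current | USB_INT_ASYNC_DONE | USB_INT_GP_TIMER0;
    assert((updated & current) == current); /* nothing previously enabled is lost */
    return updated;
}
```

Starting from a register value of zero, this results in 0x01040000.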


Scheduling Timers

During our journey to create a USB interface one of the things we will often do is to schedule a timer to wait a certain amount of time before performing the next task.

One can certainly use the sleep or usleep function provided by the SDK wrappers, but I am not so sure how accurate those are.

For the purpose of scheduling timers, I am going to make use of General Purpose Timer 0 provided by the USB block.

This timer works in a very similar fashion to the timers you find on a CIA 6526. You load a timer value into a load register, and then force this value to load into a running timer register. The timer counts down from the predetermined value until it reaches zero and then causes an interrupt.

On the Zynq, the timer load value register is located at 0xe0002080. This counter clocks at 1MHz (about the same as the CIA on a C64). This register is 24 bits wide and can therefore be set to up to about 16 seconds.
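
The limit of about 16 seconds follows directly from the numbers: 24 bits give at most 16,777,215 ticks, and at 1MHz each tick is one microsecond. A quick sketch of that arithmetic follows; the names gpt_fits and gpt_runs_needed are hypothetical helpers of my own and not part of the project.

```c
#include <assert.h>
#include <stdint.h>

#define GPT_MAX_TICKS 0x00FFFFFFu /* largest value a 24-bit field can hold */
#define GPT_TICK_HZ   1000000u    /* 1MHz, so one tick per microsecond */

/* Does a delay in microseconds fit into a single run of the timer? */
static int gpt_fits(uint32_t usec)
{
    return usec <= GPT_MAX_TICKS;
}

/* Number of back-to-back timer runs needed for a longer delay. */
static uint32_t gpt_runs_needed(uint32_t usec)
{
    assert(usec > 0);
    return usec / GPT_MAX_TICKS + (usec % GPT_MAX_TICKS != 0 ? 1u : 0u);
}
```

A delay of 3 seconds fits comfortably in one run, whereas something like 20 seconds would need the timer to be scheduled twice.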

To read the current value of the timer you need to read memory location 0xe0002084. The current timer value is present in the lower 24 bits. Bit 31 and bit 30 of this register are also of importance to us:

  • Bit 31: Timer enable
  • Bit 30: Timer reset. When this bit is set to one, the timer is reloaded with the value stored in location 0xe0002080

With this information, it is clear that we should set up the timer using the following steps:

  • Load the required timer value into 0xe0002080
  • Reload the timer by writing 1 to bit 30 of 0xe0002084
  • Start the timer by writing a 1 to bit 31 of 0xe0002084

This translates to the following method:

void scheduleTimer(int usec) {
 //set timer value
 Xil_Out32(0xE0002080, usec);
 //reload timer
 Xil_Out32(0xE0002084, 0x40000000);
 //start timer
 Xil_Out32(0xE0002084, 0x80000000);
}

We can take this method for a test run by making the following changes in code. Let us assume we want to wait 3 seconds:

void state_machine() {
  u32 in2 = Xil_In32(0xE0002144) | (1<<24) | (1<<18);
  Xil_Out32(0xE0002144, in2); //clear

  printf("Timer finished\n");
}

int main()
{
    Xil_DCacheDisable();
    init_platform();
    initint();
    scheduleTimer(3000000);
    usleep(100000000);
    cleanup_platform();
    return 0;
}


I added the usleep in the main method so that our program isn't exited prematurely.

After 3 seconds the message Timer finished will be displayed on the console.

The write to location 0xe0002144 at the beginning of the state machine needs to be done to ensure that the interrupt that just happened is cleared. Without this, state_machine will be executed in an endless loop.

In Summary

In this post we covered some theory regarding USB.

We also started to write some code for disabling data caching and enabling the appropriate USB interrupts.

In the next post we will implement functionality for resetting a USB device and reading configuration info from it in its default state.

Till next time!