Tuesday, 9 April 2019

Creating the Tape Interface on the Zybo Board


In the previous post we managed to play sound on the Zybo Board.

In this post we will focus on developing a cassette interface that is driven by a .TAP file stored in main memory.

Once developed, we will test this interface by playing its output through speakers.

High Level Overview

Let us start with a high-level look at what we want to achieve. The following block diagram describes it in a nutshell:

We start off by having a .TAP file stored in the SDRAM of the ZYBO board. The contents of this file get transferred word by word via the AXI protocol to our FPGA logic.

You might recall from previous posts that whenever we access SDRAM via AXI, we make use of two modules that we developed earlier on.

The first module connects to one of the AXI ports of the ZYNQ processor. In this block we also make use of an AXI Burst block, which is an IP provided by Xilinx. The AXI Burst block abstracts the technical details of the AXI protocol and provides us with a set of signals that is easier to work with.

The second module takes the simplified set of signals provided by the AXI Burst block and stores the stream of data words in a FIFO. The FIFO buffers the information received from the AXI port and absorbs the bursty nature of the AXI protocol. Thus, on the receiving end of the FIFO you get the data words at a constant rate.

You will also recall from previous posts that this FIFO needs sufficient depth to avoid underflow. In our cassette interface, however, it is sufficient to store only a single word at a time, which makes a FIFO a bit of overkill.

Why don't we need a FIFO for the cassette interface? Because we receive data words on the AXI bus at 100 MHz, whereas we will be producing a pulsed data signal with a maximum frequency of about 3 kHz. At 3 kHz the output toggles roughly every 167 microseconds, which is more than 16,000 cycles of the 100 MHz clock. That means that between pulse toggles we have more than enough time to fetch the next word from SDRAM.

Let us now refer back to our high level block diagram. Our block used for storing a word of AXI data at a time is the READ WORD block.

We receive data from AXI 32 bits at a time, whereas with .TAP file data it is easier for us to inspect the data a byte at a time. It is for this reason that we have implemented a BYTE SLICER block, which breaks up a word into its individual bytes. This functionality is implemented with a shift register shifting eight bits at a time.

You will see that apart from the data signal between the READ WORD and BYTE SLICER blocks, we have two extra signals: Valid and ACK. This is a pattern you will see quite often in a pipelined architecture.

When the READ WORD block has received a piece of data from the AXI port, it informs the BYTE SLICER by asserting the Valid line. On assertion of this line the BYTE SLICER stores the data and asserts the ACK line. This in turn informs the READ WORD block that it can go ahead and retrieve the next word from the AXI port.

In this way both the READ WORD and BYTE SLICER blocks are kept busy. This almost reminds us of an assembly line in a factory.

One more thing I want to highlight between the READ WORD and BYTE SLICER blocks is the dotted line with the caption Cross Clock Domain. This highlights that on the left side of the dotted line we are working at the AXI clock frequency of 100 MHz, whereas on the right-hand side we are working at only 1 MHz. To cater for these different clocks, we will again use multi-flop synchronisers in both blocks.
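To make the idea of a multi-flop synchroniser concrete, here is a minimal sketch of one that also detects a rising edge. It is only an illustration: the module name sync_rising_edge and its port names are made up for this example and are not part of the design in this post.

```verilog
// Hypothetical helper (names invented for illustration): brings a signal
// from another clock domain into the clk domain and produces a
// one-cycle pulse on its rising edge.
module sync_rising_edge(
  input  wire clk,       // destination clock, e.g. the 100 MHz AXI clock
  input  wire async_in,  // signal generated in the other clock domain
  output wire rising     // high for one clk cycle after async_in goes high
);

  // The first two flops give a metastable value time to settle.
  (* ASYNC_REG = "TRUE" *) reg sync_0 = 0, sync_1 = 0;
  // The third flop only remembers the previous synchronised value.
  reg sync_2 = 0;

  always @(posedge clk) begin
    sync_0 <= async_in;
    sync_1 <= sync_0;
    sync_2 <= sync_1;
  end

  // Rising edge: high now, low one cycle ago.
  assign rising = sync_1 & ~sync_2;

endmodule
```

Because the source signal here changes far more slowly than the destination clock, every transition is guaranteed to be sampled, so a plain flop chain like this is sufficient for single-bit control signals such as Valid and ACK.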

Next, let us have a look at the Sample Assembler block. If you read through the specification for a .TAP file you will see that each pulse width value occupies either one byte or four bytes. The rule is simple: If the byte value is zero, the next three bytes give the absolute pulse width in microseconds. If the byte value is non-zero, the pulse width is contained within that single byte.

It is thus the purpose of the Sample Assembler to determine the duration of the next pulse from the stream of incoming bytes.
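The rule can be tried out in a small simulation-only sketch. The byte values below are made up for the example. The times-eight scaling of the one-byte form and the low-byte-first order of the three-byte form are taken from the sample_assembler code shown later in this post, so treat them as assumptions about the file format rather than a statement of the specification.

```verilog
// Simulation-only sketch of the pulse width rule (not synthesisable).
module tap_rule_demo;

  reg [7:0]  bytes [0:4];   // made-up example stream
  reg [23:0] width;
  integer    i;

  initial begin
    bytes[0] = 8'h30;       // non-zero: one-byte form
    bytes[1] = 8'h00;       // zero: three more bytes follow
    bytes[2] = 8'h10;       // low byte
    bytes[3] = 8'h27;       // middle byte
    bytes[4] = 8'h00;       // high byte

    i = 0;
    while (i < 5) begin
      if (bytes[i] != 0) begin
        // One-byte form: byte value padded with three zero bits
        width = {13'b0, bytes[i], 3'b0};
        i = i + 1;
      end else begin
        // Four-byte form: skip the zero marker, low byte first
        width = {bytes[i+3], bytes[i+2], bytes[i+1]};
        i = i + 4;
      end
      $display("pulse width = %0d", width);
    end
  end

endmodule
```

Run in a simulator, this prints a pulse width of 384 (0x30 times eight) followed by 10000 (0x002710).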

Our final block is the PWM block, which a lot of you will recognise as the acronym for Pulse Width Modulation. PWM actually describes the data signal you receive from a Commodore Datasette: A set of pulses of varying length.

Our PWM block is basically implemented as a countdown timer, toggling its output each time an underflow condition occurs.

You will also see that the output of the PWM block is fed back to the Sample Assembler. The Sample Assembler uses the pulse transition to low as a cue to start assembling the next pulse duration.

Implementing the READ WORD block

Let us have a look at the code for the READ WORD block.

I will start by showing the complete block of code for this module and then highlight important snippets from it:

module read_word(
  input wire clk,
  input wire restart,
  input wire reset,
  output reg [12:0] count_in_buf,
  input ack,
  output wire [31:0] ip2bus_mst_addr,
  output reg [11:0] ip2bus_mst_length,
  input wire [31:0] ip2bus_mstrd_d,
  output wire [31:0] axi_d_out,
  output wire [31:0] data_wire_out,
  output wire [4:0] ip2bus_inputs,
  input wire [5:0] ip2bus_otputs,
  output wire empty,
  input wire read,
  output reset_1_mhz,
  output data_valid
);

reg master_read_dst_rdy; //change to axi name
wire cmd_ack; // change to axi name
wire mstread_req;
wire mst_type;
reg  [31:0] axi_start_address;
reg  [31:0] data_cap;
reg [31:0] reset_1_counter = 50000000;
wire [11:0] burst_len;
(* ASYNC_REG = "TRUE" *) reg sync_ack_0, sync_ack_1, sync_ack_2;
wire master_read_src_rdy;
reg [12:0] bytes_to_receive;
reg [3:0] state;
reg axi_data_loaded = 0;
reg [12:0] axi_data_inc;
wire neg_clk;
wire pos_edge_ack;

assign data_valid = axi_data_loaded;
assign pos_edge_ack = !sync_ack_2 & sync_ack_1;
assign data_wire_out = {data_cap[7:0], data_cap[15:8], data_cap[23:16], data_cap[31:24]};
assign reset_1_mhz = reset_1_counter > 21000000 ? 1 : 0;

parameter
  IDLE = 4'h0,
  INIT_CMD = 4'h1,
  START = 4'h2,
  ACT = 4'h3,
  TRANSMITTING = 4'h4;

parameter BURST_THRES = 124;

assign neg_clk = ~clk;

always @(posedge clk)
if (reset_1_counter > 20000000)
  reset_1_counter <= reset_1_counter - 1;

// Synchronise ACK from the 1 MHz domain into the AXI clock domain
always @(posedge clk) begin
  sync_ack_0 <= ack;
  sync_ack_1 <= sync_ack_0;
  sync_ack_2 <= sync_ack_1;
end

always @(posedge clk)
 if (restart | pos_edge_ack | reset)
  axi_data_loaded <= 0; 
 else if ((state > START) & !master_read_src_rdy & !axi_data_loaded) 
   axi_data_loaded <= 1;
always @(posedge clk)
  if (!master_read_src_rdy & !axi_data_loaded)
    data_cap <= ip2bus_mstrd_d;

always @(posedge clk)
if ((reset | restart) & !axi_data_loaded & state == 0)
  state <= 0;
else
  case( state )
    IDLE: if (!axi_data_loaded)
            state <= INIT_CMD;
    INIT_CMD: state <= START;
    START: if (cmd_ack)
             state <= ACT;
    ACT: if (!master_read_src_rdy)
             state <= TRANSMITTING;
    TRANSMITTING: state <= IDLE;
  endcase

// Advance the read address by one 32-bit word per fetch
always @(negedge clk)
if (restart | reset) begin
  axi_start_address <= 32'h200000;
  axi_data_inc <= 0;
end else if (state == INIT_CMD) begin
  axi_start_address <= axi_start_address + axi_data_inc;
  axi_data_inc <= 4;
end

always @(negedge clk)
if (state == INIT_CMD)
  ip2bus_mst_length <= 4; 
assign mstread_req = (state == START) ? 1 : 0;

assign mst_type = (state == START) ? 1 : 0;

always @*
  if (state == START)
    master_read_dst_rdy = 0;
  else if (state > START & !axi_data_loaded)
    master_read_dst_rdy = 0;
  else
    master_read_dst_rdy = 1;

assign master_read_src_rdy = ip2bus_otputs[3];
assign cmd_ack = ip2bus_otputs[0];
assign ip2bus_inputs[0] = mstread_req;
assign ip2bus_inputs[1] = mst_type;
assign ip2bus_mst_addr = axi_start_address;
assign ip2bus_inputs[2] = master_read_dst_rdy;

assign ip2bus_inputs[3] = 1'b0;
assign ip2bus_inputs[4] = 1'b0;

endmodule

Firstly, this code contains some glue logic for interfacing with the AXI Burst block. There is also some reset logic and restart logic. The restart logic is important if you reload SDRAM with a new .TAP file.

We will be receiving the ACK signal from the Byte Slicer, which runs in another clock domain. For this reason we define three synchroniser flip-flops: sync_ack_0, sync_ack_1 and sync_ack_2. We also use these flip-flops to detect the positive edge of the ACK signal, which we use to trigger the loading of the next word from the AXI port.

Byte Slicer

Let us have a look at the implementation of the byte slicer:

module byteslicer(
  input clk,
  input data_valid,
  output [7:0] byte_out,
  output ack,
  input [31:0] data_in,
  input restart,
  input read
);

parameter STATE_INIT = 0;
parameter STATE_LOADED = 1;
parameter STATE_SHIFT_1 = 2;
parameter STATE_SHIFT_2 = 3;
parameter STATE_SHIFT_3 = 4;
reg [3:0] state = 0;
reg [31:0] data_reg;
(* ASYNC_REG = "TRUE" *) reg data_valid_0, data_valid_1;

assign ack = state == STATE_INIT & data_valid_1;
// The register shifts left, so the current byte sits in the top eight bits
assign byte_out = data_reg[31:24];

always @(posedge clk) begin
  data_valid_0 <= data_valid;
  data_valid_1 <= data_valid_0;
end

always @(posedge clk)
if (state == STATE_INIT & data_valid_1)
  data_reg <= data_in;
else if ((state == STATE_LOADED | state == STATE_SHIFT_1 | state == STATE_SHIFT_2 | state == STATE_SHIFT_3) & read)
  data_reg <= {data_reg[23:0],8'h0};

always @(posedge clk)
if (restart)
  state <= STATE_INIT;
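else case (state)
  STATE_INIT: state <= data_valid_1 ? STATE_LOADED : STATE_INIT;
  // First byte is presented; move on once it has been read
  STATE_LOADED: state <= read ? STATE_SHIFT_1 : STATE_LOADED;
  STATE_SHIFT_1: state <= read ? STATE_SHIFT_2 : STATE_SHIFT_1;
  STATE_SHIFT_2: state <= read ? STATE_SHIFT_3 : STATE_SHIFT_2;
  STATE_SHIFT_3: state <= read ? STATE_INIT : STATE_SHIFT_3;
endcase

endmodule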


In this module we again have the scenario where we receive a signal from another clock domain, which in this case is data_valid. For this reason we create the synchronisers data_valid_0 and data_valid_1.

As mentioned earlier on, the byte slicer is a shift register shifting eight bits at a time. In our implementation, the shift happens while the read line is asserted, which will be driven by the sample assembler when it needs more bytes.
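The order in which the bytes come out can be checked with a small simulation-only sketch. The example word is made up, and the sketch assumes that the first byte of the file lands in bits [7:0] of the AXI word, which is what the byte swap on data_wire_out in read_word suggests. Each byte is read from the top eight bits, which is where a left shift delivers the next one.

```verilog
// Simulation-only sketch of the byte order through the slicer.
module slice_order_demo;

  reg [31:0] axi_word;    // word as it arrives from the AXI port
  reg [31:0] shift_reg;   // stands in for data_reg in byteslicer
  integer    n;

  initial begin
    // Made-up example: file bytes 11 22 33 44, first byte in bits [7:0]
    axi_word = 32'h44332211;

    // Same byte swap as data_wire_out in read_word
    shift_reg = {axi_word[7:0], axi_word[15:8],
                 axi_word[23:16], axi_word[31:24]};

    for (n = 0; n < 4; n = n + 1) begin
      $display("byte %0d = %h", n, shift_reg[31:24]);
      shift_reg = {shift_reg[23:0], 8'h00};   // shift eight bits left
    end
  end

endmodule
```

In a simulator this prints the bytes 11, 22, 33 and 44 in that order, so the bytes reach the sample assembler in file order.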

Sample Assembler

The implementation of the sample assembler is as follows:

module sample_assembler(
  input clk,
  input data_valid,
  input [7:0] data,
  output ack,
  input pwm,
  output reg [23:0] timer_val,
  output tape_out,
  input restart
);

parameter STATE_START = 0;
parameter STATE_LOADED = 1;
parameter STATE_LOADED_1 = 2;
parameter STATE_LOADED_2 = 3;
parameter STATE_LOADED_3 = 4;

reg [3:0] state = 0;
reg pwm_0, pwm_1;
reg three_byte_sample = 0;
wire neg_edge;

assign tape_out = pwm;
assign neg_edge = !pwm_0 & pwm_1;

always @(posedge clk) begin
  pwm_0 <= pwm;
  pwm_1 <= pwm_0;
end

assign ack = state == STATE_START | (state == STATE_LOADED_1 & data_valid) | (state == STATE_LOADED_2 & data_valid) | (state == STATE_LOADED_3 & data_valid);

always @(posedge clk)
if (state == STATE_START & data_valid & data != 0)
  timer_val <= {data, 3'b0};
else if ((state == STATE_LOADED_1 | state == STATE_LOADED_2 | state == STATE_LOADED_3) & data_valid)
  timer_val <= {data, timer_val[23:8]};

always @(posedge clk)
if (restart)
  state <= STATE_START;
else case(state)
  STATE_START: begin
                 three_byte_sample <= 0;
                 if (data_valid & data != 0)
                   state <= STATE_LOADED;
                 else if (data_valid)
                   state <= STATE_LOADED_1;
               end
  STATE_LOADED_1: if (data_valid)
                    state <= STATE_LOADED_2;
  STATE_LOADED_2: if (data_valid)
                    state <= STATE_LOADED_3;
  STATE_LOADED_3: if (data_valid)
                    state <= STATE_LOADED;
  // Sample complete: wait for the falling edge of the PWM output
  STATE_LOADED: begin
                  state <= neg_edge ? STATE_START : STATE_LOADED;
                  three_byte_sample <= 0;
                end
endcase

endmodule

The Sample Assembler starts off by inspecting the first byte that comes in. If it is non-zero, it gets padded with three zero bits (that is, multiplied by eight) and we have a sample value.

If the first byte is a zero, the Sample Assembler waits patiently for the next three bytes to be clocked in to get the full sample value.

Once a sample value has been created, we wait for a negative transition on the PWM output before restarting the process.


Let us have a look at our final module, the PWM block. Here is the implementation:

module tape_pwm(
  input [23:0] time_val,
  input load_timer,
  output pwm,
  input clk
);

  reg polarity = 1;
  reg [23:0] load = 100;
  reg [23:0] timer = 100;
  assign pwm = polarity;
  always @(posedge clk)
    if (load_timer)
      load <= {1'b0,time_val[23:1]};
  // Count down to zero, then reload with the half-period value
  always @(posedge clk)
    if (timer > 0)
      timer <= timer - 1;
    else
      timer <= load;

  always @(posedge clk)
    if (timer == 0)
      polarity <= ~polarity;

endmodule

As mentioned, this is just a countdown timer that toggles its output on underflow.

You will also notice that when storing the timer value in the load register we discard the lowest bit. This is because the timer values in a .TAP file give the period between two positive transitions. So, we need to toggle the pulse at half this period.

Linking everything up

With all the modules created it is a matter of linking everything up.

A port that needs special mention is the restart port you get on most of these modules. This port needs to be assigned to an AXI slave port so the ZYNQ processor can access it. You can then toggle this bit programmatically once you have loaded a .TAP file into SDRAM.

A .TAP file can be loaded into SDRAM by making use of the XSCT command mwr (memory write).

With all the modules linked up you can test the design by integrating it with the sound system we developed in the previous post. The sound produced should be similar to what you hear when you play a C64 tape on a tape deck.

In Summary

In this post we have developed the cassette interface that will take a .TAP file and produce a corresponding signal of variable pulse widths.

In the next post we will start to integrate this cassette interface with our C64 module.

Till next time!