Tuesday, 9 April 2019

Creating the Tape Interface on the Zybo Board

Foreword

In the previous post we managed to play sound on the Zybo Board.

In this post we will turn our focus back to developing a cassette interface that is driven by a .TAP file stored in main memory.

Once developed, we will test this interface by playing its output through speakers.

High Level Overview

Let us start by looking, at a high level, at what we want to achieve. The following block diagram describes this in a nutshell:


We start off by having a .TAP file stored in the SDRAM of the ZYBO board. The contents of this file get transferred word by word via the AXI protocol to our FPGA logic.

You might recall from previous posts that whenever we access SDRAM via AXI, we make use of two modules that we developed earlier on.

We use the first module to connect to one of the AXI ports of the ZYNQ processor. In this block we also make use of an AXI Burst block, which is an IP provided by Xilinx. The AXI Burst block abstracts the technical details of the AXI protocol and provides us with a set of signals that are easier to work with.

In the second module we take the simplified set of signals provided by the AXI Burst block and store the stream of data words in a FIFO. The FIFO buffers the information received from the AXI port and absorbs the bursty nature of the AXI protocol. Thus, on the receiving end of the FIFO you get the data words at a constant rate.

You will also recall from previous posts that this FIFO should have sufficient depth to avoid underflow. In our cassette interface, however, it is sufficient to store only a single word at a time, making a FIFO a bit of an overkill.

Why don't we need a FIFO for the cassette interface? That is because we are receiving data words on the AXI bus at 100 MHz, whereas we will be producing a pulsating data signal with a maximum frequency of about 3 kHz. That means that between pulse toggles we will have more than enough time to fetch the next word from SDRAM.
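To put rough numbers on this, here is a small Python sketch of the timing budget. It only uses the approximate figures quoted above (100 MHz and about 3 kHz); the variable names are purely illustrative.

```python
# Rough timing budget, using the approximate figures from the text.
AXI_CLOCK_HZ = 100_000_000   # AXI bus clock
MAX_PULSE_HZ = 3_000         # approximate highest pulse frequency of the tape signal

# The output toggles twice per pulse period (once high, once low).
toggles_per_second = 2 * MAX_PULSE_HZ
axi_cycles_per_toggle = AXI_CLOCK_HZ // toggles_per_second

print(axi_cycles_per_toggle)
```

Even in the fastest case there are thousands of AXI clock cycles between two toggles, which is far more than a single word fetch should need.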

Let us now refer back to our high level block diagram. Our block used for storing a word of AXI data at a time is the READ WORD block.

We receive data from AXI 32 bits at a time, whereas with .TAP file data it is easier for us to inspect the data a byte at a time. It is for this reason that we have implemented a BYTE SLICER block, which breaks up a word into its individual bytes. This functionality is implemented with a shift register shifting eight bits at a time.

You will see that apart from the data signal between the READ WORD and BYTE SLICER blocks, we have two extra signals: Valid and ACK. This is a pattern you will see quite often in a pipeline architecture.

When the READ WORD block has received a piece of data from the AXI port, it informs the Byte Slicer by asserting the Valid line. On assertion of this line the Byte Slicer stores the data and asserts the ACK line. This in turn informs the READ WORD block that it can go ahead and retrieve the next word from the AXI port.
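The Valid/ACK handshake described above can be sketched as a tiny cycle-by-cycle model in Python. This is a simplification under stated assumptions: both sides run on one clock, the consumer is always ready, and the function name run_handshake is made up for illustration. The real design also has to cross clock domains.

```python
def run_handshake(words, cycles):
    """Cycle-by-cycle model of a Valid/ACK handshake between two blocks."""
    pending = list(words)   # words the producer still has to fetch
    valid = False           # producer: "data is ready"
    data = None
    received = []           # words accepted by the consumer
    for _ in range(cycles):
        ack = valid         # consumer acknowledges as soon as it sees Valid
        if ack:
            received.append(data)   # consumer stores the word
            valid = False           # producer drops Valid, free to fetch again
        elif pending:
            data = pending.pop(0)   # producer fetches the next word
            valid = True
    return received


print(run_handshake([0x11, 0x22, 0x33], 10))
```

In this model each word takes two cycles: one to present it with Valid, and one for the consumer to take it and acknowledge.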

In this way both the READ WORD and BYTE SLICER blocks are kept busy. This almost reminds us of an assembly line in a factory.

One more thing I want to highlight between the READ WORD and BYTE SLICER blocks is the dotted line with the caption Cross Clock Domain. This is to highlight that on the left side of the dotted line we are working at the AXI clock frequency of 100 MHz, while on the right-hand side we are working at only 1 MHz. To cater for these different clocks, we will again use multi-flop synchronisers in both blocks.

Next, let us have a look at the Sample Assembler block. If you read through the specification for a .TAP file you will see that each pulse width value is either one byte or four bytes long. The rule is simple: if the byte value is zero, the next three bytes give the absolute pulse width in microseconds. If the byte value is non-zero, the pulse width is contained within that single byte.

It is thus the purpose of the Sample Assembler to determine the duration of the next pulse from the stream of incoming bytes.
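As a software illustration of this rule, the following Python sketch decodes a stream of .TAP data bytes into pulse widths. It assumes a 1 MHz time base, a multiplication by eight for single-byte values and little-endian ordering of the three-byte values, as used elsewhere in this series. The function name pulse_widths is made up for this example.

```python
def pulse_widths(tap_data):
    """Yield pulse widths in microseconds from raw .TAP data bytes."""
    i = 0
    while i < len(tap_data):
        first = tap_data[i]
        if first != 0:
            # Short form: a single byte, in units of eight microseconds.
            yield first * 8
            i += 1
        else:
            # Long form: a zero byte followed by three bytes, lowest byte first.
            if i + 3 >= len(tap_data):
                break  # truncated entry at the end of the data
            b0, b1, b2 = tap_data[i + 1:i + 4]
            yield b0 | (b1 << 8) | (b2 << 16)
            i += 4


print(list(pulse_widths(bytes([0x30, 0x00, 0x10, 0x27, 0x00, 0x42]))))
```

The hardware Sample Assembler does the same job, except that it receives the bytes one at a time from the Byte Slicer.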

Our final block is the PWM block, which a lot of you will recognise as the acronym for Pulse Width Modulation. PWM actually describes the data signal you receive from a Commodore Datasette: A set of pulses of varying length.

Our PWM block is basically implemented as a countdown timer, toggling its output each time an underflow condition occurs.

You will also see that the output of the PWM block is fed back to the Sample Assembler. The Sample Assembler uses the pulse transition to low as a cue to start assembling the next pulse duration.
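The countdown-timer behaviour of the PWM block can be modelled in a few lines of Python. This is only a behavioural sketch; the function name and the choice to record the cycle numbers of the toggles are made up for illustration.

```python
def pwm_toggle_times(half_period, cycles):
    """Return the clock cycles at which a countdown-timer PWM toggles."""
    timer = half_period
    toggles = []
    for cycle in range(cycles):
        if timer > 0:
            timer -= 1              # keep counting down
        else:
            timer = half_period     # underflow: reload the timer...
            toggles.append(cycle)   # ...and toggle the output
    return toggles


print(pwm_toggle_times(3, 10))
```

Note that in this model the output toggles every half_period + 1 cycles, because the cycle spent at zero is counted as well.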

Implementing the READ WORD block

Let us have a look at the code for the READ WORD block.

I will start by showing the complete block of code for this module and then highlight important snippets from it:

module read_word(
  input wire clk,
  input wire restart,
  input wire reset,
  output reg [12:0] count_in_buf,
  input ack,
  output wire [31:0] ip2bus_mst_addr,
  output reg [11:0] ip2bus_mst_length,
  input wire [31:0] ip2bus_mstrd_d,
  output wire [31:0] axi_d_out,
  output wire [31:0] data_wire_out,
  output wire [4:0] ip2bus_inputs,
  input wire [5:0] ip2bus_otputs,
  output wire empty,
  input wire read,
  output reset_1_mhz,
  output data_valid
    );

reg master_read_dst_rdy; //change to axi name
wire cmd_ack; // change to axi name
wire mstread_req;
wire mst_type;
reg  [31:0] axi_start_address;
reg  [31:0] data_cap;
reg [31:0] reset_1_counter = 50000000;
wire [11:0] burst_len;
(* ASYNC_REG = "TRUE" *) reg sync_ack_0, sync_ack_1, sync_ack_2;
wire master_read_src_rdy;
reg [12:0] bytes_to_receive;
reg [3:0] state;
reg axi_data_loaded = 0;
reg [12:0] axi_data_inc;
wire neg_clk;
wire pos_edge_ack;

assign data_valid = axi_data_loaded;
assign pos_edge_ack = !sync_ack_2 & sync_ack_1;
assign data_wire_out = {data_cap[7:0], data_cap[15:8], data_cap[23:16], data_cap[31:24]};
assign reset_1_mhz = reset_1_counter > 21000000 ? 1 : 0;

parameter
  IDLE = 4'h0,
  INIT_CMD = 4'h1,
  START = 4'h2,
  ACT = 4'h3,
  TRANSMITTING = 4'h4;

parameter BURST_THRES = 124;  

assign neg_clk = ~clk;

always @(posedge clk)
if (reset_1_counter > 20000000)
  reset_1_counter <= reset_1_counter - 1;

always @(posedge clk)
begin
  sync_ack_0 <= ack;
  sync_ack_1 <= sync_ack_0;
  sync_ack_2 <= sync_ack_1;
end

always @(posedge clk)
 if (restart | pos_edge_ack | reset)
  axi_data_loaded <= 0; 
 else if ((state > START) & !master_read_src_rdy & !axi_data_loaded) 
   axi_data_loaded <= 1;
    
always @(posedge clk)
  if (!master_read_src_rdy & !axi_data_loaded)
    data_cap <= ip2bus_mstrd_d;

always @(posedge clk)
if ((reset | restart) & !axi_data_loaded & state == 0)  
  state <= 0;
else
  case( state )
    IDLE: if (!axi_data_loaded) 
            state <= INIT_CMD;
    INIT_CMD: state <= START;             
    START: if (cmd_ack)
             state <= ACT;
    ACT: if (!master_read_src_rdy)
             state <= TRANSMITTING;
    TRANSMITTING: state <= IDLE;    
  
  endcase  
  
always @(negedge clk)
if (restart | reset)
begin
  axi_start_address <= 32'h200000;
  axi_data_inc <= 0;
end
else if (state == INIT_CMD)
begin
  axi_start_address <= axi_start_address + axi_data_inc;
  axi_data_inc <= 4;
end    

always @(negedge clk)
if (state == INIT_CMD)
  ip2bus_mst_length <= 4; 
  
assign mstread_req = (state == START) ? 1 : 0;

assign mst_type = (state == START) ? 1 : 0;

always @*
  if (state == START)
    master_read_dst_rdy = 0;
  else if (state > START & !axi_data_loaded)
    master_read_dst_rdy = 0;
  else
   master_read_dst_rdy = 1;
         
assign master_read_src_rdy = ip2bus_otputs[3];
assign cmd_ack = ip2bus_otputs[0];
assign ip2bus_inputs[0] = mstread_req;
assign ip2bus_inputs[1] = mst_type; 
assign ip2bus_mst_addr = axi_start_address;
assign ip2bus_inputs[2] = master_read_dst_rdy;

assign ip2bus_inputs[3] = 1'b0;
assign ip2bus_inputs[4] = 1'b0;
endmodule



Firstly, this code contains some glue logic for interfacing with the AXI Burst block. There is also some reset and restart logic. The restart logic is important if you reload SDRAM with a new .TAP file.

We will be receiving the ACK signal from the Byte Slicer, which sits in another clock domain. For this reason we define three synchroniser flip-flops: sync_ack_0, sync_ack_1 and sync_ack_2. We also use these flip-flops to detect the positive edge of the ACK signal, which we use to trigger the loading of the next word from the AXI port.
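The synchroniser chain and edge detector can be modelled in Python as follows. This is a behavioural sketch only: it ignores metastability, which is the actual reason for the extra flip-flops, and the function name rising_edges is made up for illustration.

```python
def rising_edges(ack_samples):
    """Pass a signal through three flip-flops and flag its rising edges."""
    s0 = s1 = s2 = 0
    edges = []
    for ack in ack_samples:
        # Combinational output, based on the current register values:
        # high when the middle flop is set but the last one is not yet.
        edges.append(int(s1 and not s2))
        # Register update on the clock edge: all three flops shift at once.
        s0, s1, s2 = ack, s0, s1
    return edges


print(rising_edges([0, 1, 1, 1, 0, 0, 0]))
```

The edge pulse appears for exactly one clock cycle, a couple of cycles after the input went high, which is the delay introduced by the synchroniser.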

Byte Slicer

Let us have a look at the implementation of the Byte Slicer:

module byteslicer(
  input clk,
  input data_valid,
  output [7:0] byte_out,
  output ack,
  input [31:0] data_in,
  input restart,
  input read
    );
    
parameter STATE_INIT = 0;
parameter STATE_LOADED = 1;
parameter STATE_SHIFT_1 = 2;
parameter STATE_SHIFT_2 = 3;
parameter STATE_SHIFT_3 = 4;
    
reg [3:0] state = 0;
reg [31:0] data_reg;
(* ASYNC_REG = "TRUE" *) reg data_valid_0, data_valid_1;

assign ack = state == STATE_INIT & data_valid_1;
assign byte_out = data_reg[31:24]; // top byte; the register shifts left by eight bits per read

always @(posedge clk)
begin
  data_valid_0 <= data_valid;
  data_valid_1 <= data_valid_0;
end

always @(posedge clk)
if (state == STATE_INIT & data_valid_1)
  data_reg <= data_in;
else if ((state == STATE_LOADED | state == STATE_SHIFT_1 | state == STATE_SHIFT_2 | state == STATE_SHIFT_3) & read)
  data_reg <= {data_reg[23:0],8'h0};

always @(posedge clk)
if (restart)
  state <= STATE_INIT;
else case (state)
  STATE_INIT: state <= data_valid_1 ? STATE_LOADED : STATE_INIT;
  STATE_LOADED: state <= read ? STATE_SHIFT_1 : STATE_LOADED;
  STATE_SHIFT_1: state <= read ? STATE_SHIFT_2 : STATE_SHIFT_1;
  STATE_SHIFT_2: state <= read ? STATE_SHIFT_3 : STATE_SHIFT_2;
  STATE_SHIFT_3: state <= read ? STATE_INIT : STATE_SHIFT_3;
endcase

endmodule


In this module we again have the scenario where we receive a signal from another clock domain, which in this case is data_valid. For this reason we create the synchronisers data_valid_0 and data_valid_1.

As mentioned earlier on, the Byte Slicer is a shift register shifting eight bits at a time. In our implementation, the shift happens while the read line is asserted, which will be driven by the Sample Assembler when it needs more bytes.
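The byte ordering is easy to get wrong, so here is a small Python model of the slicing. It assumes that the first byte of the file sits in the lowest byte of the 32-bit AXI word (little-endian), that the word is byte-swapped as data_wire_out does in read_word, and that the top eight bits of the shift register are the ones presented as output. The function name slice_word is made up for illustration.

```python
def slice_word(axi_word):
    """Split a 32-bit AXI word into four bytes, in file order."""
    # Byte swap: the lowest byte of the AXI word ends up on top.
    reg = int.from_bytes(axi_word.to_bytes(4, "little"), "big")
    out = []
    for _ in range(4):
        out.append((reg >> 24) & 0xFF)   # present the top byte
        reg = (reg << 8) & 0xFFFFFFFF    # shift left by eight bits
    return out


print([hex(b) for b in slice_word(0x44332211)])
```

With these assumptions the bytes come out in the same order in which they appear in the .TAP file.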

Sample Assembler

The implementation of the Sample Assembler is as follows:

module sample_assembler(
  input clk,
  input data_valid,
  input [7:0] data,
  output ack,
  input pwm,
  output reg [23:0] timer_val,
  output tape_out,
  input restart
    );
    
parameter STATE_START = 0;
parameter STATE_LOADED = 1;
parameter STATE_LOADED_1 = 2;
parameter STATE_LOADED_2 = 3;
parameter STATE_LOADED_3 = 4;


reg [3:0] state = 0;
reg pwm_0, pwm_1;
reg three_byte_sample = 0;
wire neg_edge;

assign tape_out = pwm;
assign neg_edge = !pwm_0 & pwm_1;

always @(posedge clk)
begin
  pwm_0 <= pwm;
  pwm_1 <= pwm_0;
end

assign ack = state == STATE_START | (state == STATE_LOADED_1 & data_valid) | (state == STATE_LOADED_2 & data_valid) | (state == STATE_LOADED_3 & data_valid);

always @(posedge clk)
if (state == STATE_START & data_valid & data != 0)
  timer_val <= {data, 3'b0};
else if ((state == STATE_LOADED_1 | state == STATE_LOADED_2 | state == STATE_LOADED_3) & data_valid)
  timer_val <= {data, timer_val[23:8]};

always @(posedge clk)
if (restart)
  state <= STATE_START;
else case(state)
  STATE_START: begin
                 three_byte_sample <= 0;
                 if (data_valid & data != 0) 
                   state <= STATE_LOADED;
                 else if (data_valid)
                   state <= STATE_LOADED_1;
               end
  STATE_LOADED_1: if (data_valid)
                  begin
                    //three_byte_sample <= 1;             
                    state <= STATE_LOADED_2;
                  end     
    //state <= data_valid ? STATE_LOADED : STATE_START;
  STATE_LOADED_2: if (data_valid)
                   state <= STATE_LOADED_3;
                       
  STATE_LOADED_3: if (data_valid)
                   state <= STATE_LOADED;    

  
  STATE_LOADED: begin
    state <= neg_edge ? STATE_START : STATE_LOADED;
    three_byte_sample <= 0;
  end
endcase
endmodule


The Sample Assembler starts off by inspecting the first byte that comes in. If it is non-zero, it gets padded with three zero bits, which multiplies it by eight, and we have a sample value.

If the first byte is zero, the Sample Assembler waits patiently for the next three bytes to be clocked in to get the full sample value.

Once a sample value has been assembled, we wait for a negative transition on the PWM output to restart the process.

PWM

Let us have a look at our final module. Here is the implementation:

module tape_pwm(
  input [23:0] time_val,
  input load_timer,
  output pwm,
  input clk
    );
    
  reg polarity = 1;
  reg [23:0] load = 100;
  reg [23:0] timer = 100;
  assign pwm = polarity;
    
  always @(posedge clk)
    if (load_timer)
      load <= {1'b0,time_val[23:1]};
  
  always @(posedge clk)
  if (timer > 0)
    timer <= timer - 1;
  else
    timer <= load;
    
  always @(posedge clk)
    if (timer == 0)
      polarity <= ~polarity;
endmodule


As mentioned, this is just a countdown timer which toggles the output on underflow.

You will also notice that when storing the timer value in the load register, we shift it right by one bit, discarding the lowest bit. This is because a timer value in a .TAP file is the period between two positive transitions, so we need to toggle the output after half of this period.

Linking everything up

With all the modules created it is a matter of linking everything up.

A port that needs special mention is the restart port you get on most of these modules. This port needs to be assigned to an AXI slave port so the ZYNQ processor can access it. You can then toggle this bit programmatically when you have loaded a .TAP file into SDRAM.

A .TAP file can be loaded into SDRAM by making use of the XSCT command mwr (memory write).

With all the modules linked up you can test the design by integrating it with the sound system we developed in the previous post. The produced sound should be similar to what you hear when you play a C64 tape on a tape deck.

In Summary

In this post we have developed the cassette interface that will take a .TAP file and produce a corresponding signal of variable pulse widths.

In the next post we will start to integrate this cassette interface with our C64 module.

Till next time!



Monday, 18 February 2019

Creating Sound on the Zybo board

Foreword

In the previous post we started going down the road of tape emulation and ended off by writing some Python code for converting a .TAP file to sound.

The sound that we generate is basically a set of pulses of varying widths. Outputting these pulses as sound is a quick sanity check of whether we have implemented tape emulation more or less correctly.

Our next goal is to see if we can implement this sound generation from a .TAP file in real time within the Zybo board's FPGA.

Playing the generated sound on the Zybo is perhaps the most complex part of the exercise, so I have decided to dedicate this post to Zybo sound generation.

Sound on the Zybo board

One of the nice features of the Zybo board is that it supports onboard sound. The Zybo board simply cannot hide away this feature because of the familiar color coded Line In/Out/Mic ports:


These ports are all hooked up to an Audio Codec chip from Analog Devices: the SSM2603.

This Audio Codec has two ports that hook up to the ZYNQ SoC: an I2C port and an I2S port. The I2C port is used to configure Audio Codec settings such as sample rate and volume.

The I2S port is used to transmit digital audio data between the ZYNQ and the Audio Codec.

Both the I2S port and the I2C port are linked to pins on the ZYNQ that only the FPGA has access to.

Configuring the Audio Codec

As mentioned in the previous section, configuration of the Audio Codec is done via an I2C port.

Implementing an I2C port in an FPGA can be a daunting task, so one will be pleased to learn that the Zynq has two on-chip I2C peripherals.

Shortly after discovering this, however, the bubble bursts: the I2C port of the Audio Codec is hooked up to pins that the on-chip peripherals don't have direct access to.

But fear not! The Zynq allows you to configure the ports of on-chip peripherals to be redirected via EMIO. This basically means that you can make these ports available to the FPGA. Within the FPGA you can then either hook these ports up directly to the output pins or hook them up to custom logic blocks.

The following block diagram within Vivado shows how this is achieved:



I have marked in red the ports of an onchip peripheral that I have exposed to the FPGA.

These ports I have hooked up to two instances of a custom logic block iobuf. This logic block is basically an implementation of a tristate buffer.

We can now proceed and write some code to initialise the Audio Codec. Firstly we need to initialise the onchip i2c peripheral we are going to use:

int main()
{
...
    Xil_Out32(0xe000501c, 0x1f);
//Set divider + addressing mode
    Xil_Out32(0xE0005000, 0x9004);
//master -> ACK -> CLR FIFO -> hold bus
    Xil_Out32(0xE0005000, 0x9004 + 2 + 8);
...

}

I have added a few comments on what is going on during initialisation, but I am not going to go into too much detail here. More details are provided in the Zynq Technical Reference Manual in Chapter 20: I2C Controller, as well as in Appendix B, in the register details for the I2C controller.

Let us now write some methods to read and write the registers of the Audio Codec:

...
int readReg(int addr) {
 //master -> ACK -> CLR FIFO -> hold bus
     u32 in2 = Xil_In32(0xE0005000) | 64 | 16;
     in2 = in2 & ~1;
     Xil_Out32(0xE0005000, in2);
     //write data to register
         Xil_Out32(0xE000500c, addr << 1);
     //write address
         Xil_Out32(0xE0005008, 26);
     // Wait for completion
         u32 status = Xil_In32(0xe0005010) & 1;
         do {
          status = Xil_In32(0xe0005010) & 1;
         } while (!status);

         //clear interrupts
         Xil_Out32(0xe0005010, 1);

         //set hold bus -> read -> clear fifo
         in2 = Xil_In32(0xe0005000) | 16 | 1 | 64;
         Xil_Out32(0xe0005000, in2);
         //set transfer size
         Xil_Out32(0xe0005014, 2);
         //set address
         Xil_Out32(0xe0005008, 26);
         //clear hold
         in2 = Xil_In32(0xe0005000) & (~16);
         Xil_Out32(0xe0005000, in2);
         //wait for completion
         do {
          status = Xil_In32(0xe0005010) & 1;
         } while (!status);
         Xil_Out32(0xe0005010, 1);
         u32 byte0 = Xil_In32(0xe000500c);
         u32 byte1 = Xil_In32(0xe000500c);
         return byte0 | (byte1 << 8);

}
...
void writeReg(int addr, int data) {
 //master -> ACK -> CLR FIFO -> hold bus
     u32 in2 = Xil_In32(0xE0005000) | 64 | 16;
     in2 = in2 & ~1;
     Xil_Out32(0xE0005000, in2);
     //write data to register
         Xil_Out32(0xE000500c, (addr << 1) | ((data & 256) ? 1 : 0));
         Xil_Out32(0xE000500c, data & 255);
     //write address
         Xil_Out32(0xE0005008, 26);
     // Wait for completion
         u32 status = Xil_In32(0xe0005010) & 1;
         do {
          status = Xil_In32(0xe0005010) & 1;
         } while (!status);

         //clear interrupts
         Xil_Out32(0xe0005010, 1);

         in2 = Xil_In32(0xe0005000) & (~16);
         Xil_Out32(0xe0005000, in2);
       return;
}
...

Again, there are a lot of things going on here, which can best be understood with the help of the Zynq Technical Reference Manual. It is also handy to have the datasheet for the SSM2603 Audio Codec available to understand the format required for setting and reading registers.
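To make the register format a bit more concrete, here is a Python sketch of how writeReg above packs a register address and a value into the two bytes it pushes into the I2C data register. It mirrors the C code: a 7-bit register address followed by a 9-bit value. The function name codec_frame is made up for illustration.

```python
def codec_frame(reg_addr, value):
    """Pack a 7-bit register address and a 9-bit value into two bytes."""
    if not 0 <= reg_addr < 128:
        raise ValueError("register address must fit in 7 bits")
    if not 0 <= value < 512:
        raise ValueError("register value must fit in 9 bits")
    first = (reg_addr << 1) | (value >> 8)   # address plus bit 8 of the value
    second = value & 0xFF                    # lower eight bits of the value
    return bytes([first, second])


print(codec_frame(2, 0b101111001).hex())
```

For example, a write of 0b101111001 to register 2 results in the bytes 0x05 and 0x79.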

We can now continue and write some code for initialising the Audio Codec:

int main()
{
...
    writeReg(15,0);
    usleep(1000);
    writeReg(6, 16 + 32 + 64);
    writeReg(2, 0b101111001);
    writeReg(3, 0b101111001);
    writeReg(4, 0);
    writeReg(5, 0);
    writeReg(7, 1);
    writeReg(8, 0);
    usleep(1000);
    writeReg(9, 1);
    usleep(1000);
    writeReg(6, 32);
    usleep(1000);
    writeReg(4,16+6);

...
}

Let me give a quick rundown of what is happening here.

The first write to register 15 forces the Audio Codec to write default values to all registers.

The write to register 6 powers up all blocks within the Audio Codec except the Out block. According to the datasheet we can only enable the Out block later in the initialisation process.

The writes to registers 2 and 3 set the volume of the left and right DAC.

Next, let us skip straight to the write to register 7. This write sets the format of the samples that will be presented on the I2S bus, which in this case is 16-bit, left-justified samples.

With the write to register 8 we are setting the actual sample rate, which is 48 kHz.

With the write to register 9 we are enabling the digital core. Note that it is preceded by a small delay. According to the datasheet a short delay should be allowed after all blocks are powered up.

With the second write to register 6 we are finally powering up the Out block, and with the final write to register 4 we are enabling the DAC.

You will also see that between the write to register 6 and the write to register 4 I have added a small delay. Nowhere in the datasheet is it specified that this is necessary. However, through trial and error I have found that without this delay you will not get any sound output to the speaker, no matter what else you do.

This concludes the configuration of the Audio Codec. In the next section we will discuss how to implement the I2S interface.

Implementing the I2S interface

Implementing an I2S interface is much simpler than implementing an I2C interface.

To start off, let us have a look at an I2S timing diagram from the Audio Codec datasheet:

Within the datasheet you will see other timing diagrams for other Input modes, but we will only be focusing on Left-Justified mode.

A signal not present in the above diagram is MCLK (i.e. Master Clock), which is 256 times the sampling rate.

Back to the diagram. The first waveform (RECLRC/PBLRC) indicates which channel the current sample belongs to.

The BCLK generates a pulse for each bit of data. In our case where we have 16 bits per channel, the frequency will be 32 times the sample rate.

Lastly we have the signal RECDAT/PBDAT that is the actual sample data.

All three signals together with MCLK should all be in sync to avoid data corruption. We will see in a moment how this is done.

Now let us calculate the frequencies for the different clocks.

As mentioned earlier on, MCLK is 256 times the sample rate. Thus MCLK should be 12.288 MHz.

BCLK is 32 times the sample rate and therefore has a frequency of 1.536 MHz.
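These clock figures are simple multiples of the 48 kHz sample rate, as the following Python snippet shows. It also works out the ratio between the two clocks, which is what a clock divider deriving BCLK from MCLK has to implement. The variable names are only illustrative.

```python
SAMPLE_RATE_HZ = 48_000

mclk_hz = 256 * SAMPLE_RATE_HZ      # master clock
bclk_hz = 32 * SAMPLE_RATE_HZ       # bit clock: 2 channels x 16 bits per sample
mclk_per_bclk = mclk_hz // bclk_hz  # MCLK cycles per BCLK period

print(mclk_hz, bclk_hz, mclk_per_bclk)
```

A ratio of 8 means that BCLK has to toggle once every 4 MCLK cycles.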

We will generate the 12.288MHz clock with a clock wizard within Vivado. The resulting clock we will need to forward externally from the Zynq to the Audio Codec. Xilinx recommends not to forward a generated clock directly to an output pin, but rather to make use of an ODDR component. The following module definition will take care of this:

module oddr_buf(
  output Mlck_O,
  input clk_in
    );

   ODDR #(
      .DDR_CLK_EDGE("OPPOSITE_EDGE"), // "OPPOSITE_EDGE" or "SAME_EDGE" 
      .INIT(1'b0),    // Initial value of Q: 1'b0 or 1'b1
      .SRTYPE("SYNC") // Set/Reset type: "SYNC" or "ASYNC" 
   ) ODDR_inst (
      .Q(Mlck_O),   // 1-bit DDR output
      .C(clk_in),   // 1-bit clock input
      .CE(1), // 1-bit clock enable input
      .D1(1), // 1-bit data input (positive edge)
      .D2(0), // 1-bit data input (negative edge)
      .R(0),   // 1-bit reset
      .S(0)    // 1-bit set
   );

endmodule


We pass the generated clock to clk_in. The output port Mlck_O is the signal we should assign to an output pin.

Now onto the generation of the rest of the I2S signals. We start by creating an empty module with the required ports:

module i2s(
  input clk,
  output clk_1_5_mhz,
  output channel_enable,
  output out_data
    );

endmodule

For the input port clk we pass the generated 12.288MHz signal. clk_1_5_mhz is our generated bclk signal.

channel_enable is the channel indicator and out_data the actual sample data.

Let us write some code to generate the bclk signal:

...
reg [1:0] clk_div_counter = 0;
reg bclk_int = 0;
...
    always @(posedge clk)
    if (clk_div_counter == 3)
      bclk_int <= ~bclk_int;

    always @(posedge clk)
        clk_div_counter <= clk_div_counter + 1; 
...

So, the bclk clock is generated from the MCLK by means of a clock divider.

Both of the remaining signals transition on the negative edge of BCLK, so let us quickly create a wire signalling this event:

...
    wire neg_edge;
...
    assign neg_edge = (clk_div_counter == 3) & (bclk_int == 1) ? 1 : 0;
...

Next, let us write code for the channel indicator:

...
    assign channel_enable = prclk_int;
...
    reg prclk_int = 0; // internal left/right channel indicator
    reg [3:0] channel_enable_counter = 0;
...
    always @(posedge clk)
    if (neg_edge)
      channel_enable_counter <= channel_enable_counter + 1;

    always @(posedge clk)
    if (neg_edge & channel_enable_counter == 15)
      prclk_int <= ~prclk_int;
...

And now let us write some code for out_data:

...
    reg [31:0] shift_reg;
...
    assign out_data = shift_reg[31];
...
    always @(posedge clk)
    if (channel_enable_counter == 15 & neg_edge)
    begin
      shift_reg <= {data_val, data_val};
    end
    else if (neg_edge)
      shift_reg <= {shift_reg[30:0] , 1'b0};
...

As you can see, we have implemented a shift register for shifting out the sample values, which we reload each time the channel indicator signal toggles.
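The order in which the bits leave the shift register can be illustrated with a short Python sketch. It serialises a single 16-bit sample with the most significant bit first, the same order in which out_data presents the top bit of the register. The function name is made up for illustration, and the model ignores the clocking details.

```python
def serialise_msb_first(sample, bits=16):
    """Return the bits of a sample in the order a left-shifting register emits them."""
    sample &= (1 << bits) - 1   # two's complement view of negative samples
    return [(sample >> (bits - 1 - i)) & 1 for i in range(bits)]


print(serialise_msb_first(30000))
```

For the sample value 30000 (0x7530) this gives the bit sequence 0111 0101 0011 0000.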

data_val is the actual sample value, which we haven't defined yet. For this we are going to define something very simple, which will be a monotone with a frequency between 2000Hz and 3000Hz. For this we can just alternate the sample value between 30000 and 0 every 6th sample:

...
    reg [15:0] data_val = 0;
    reg [2:0] sample_mod_counter = 0; // counts down between sample value changes
...
    always @(posedge clk)
    if (channel_enable_counter == 15 & neg_edge)
    begin
      if (sample_mod_counter == 0)
      begin
        sample_mod_counter <= 6;
        data_val <= (data_val == 0) ? 30000 : 0;        
      end
      else
        sample_mod_counter <= sample_mod_counter - 1;
    end
...

All that remains is to link up the external pins to our audio codec:


This is all there is to creating sound on the Zybo board, which in this case will be a monotone.

In Summary

In this post we played around with sound on the Zybo board and managed to generate a monotone.

This exercise will aid us in the next post to create a cassette interface and verify the design by listening to the produced pulses.

This post will also come in handy in future posts where we implement SID emulation.

Till next time!

Thursday, 24 January 2019

Focusing on Tape Integration

Foreword

In the previous post we managed to interface our C64 FPGA module with a USB keyboard.

In this post we will start to focus on integrating tape with our C64 module. Well, not exactly interfacing with a 1530 Datassette, but simulating the tape loading process from a .TAP file.

While heading down this road, we might just relive the nostalgia of a couple of decades ago, when we all played a C64 cassette on a normal sound system to hear what it sounds like. For this exercise we will take a .TAP file and see if we can reproduce similar sounds with the help of Python on a PC.

Once we have successfully reproduced the sound of a C64 tape, we will set forth and see if we can do the same on the Zybo board, with the logic implemented within the FPGA.

I will not be covering all the above mentioned in this post, but rather in several ones, working incrementally towards a solution where we have a fully integrated tape to C64 module solution.

The .TAP file format

Let us start by looking at the .TAP file format. For this exercise let us have a look at a snippet of a .TAP file:


The file header starts with a textual description C64-TAPE-RAW. The actual file data starts at offset 0x14.
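As a quick illustration, here is a Python sketch that checks the header of a .TAP file. The signature and the data offset of 0x14 come from the description above. The version byte at offset 12 and the little-endian data size at offset 16 are assumptions based on the commonly documented .TAP layout, so treat them with care. The function name parse_tap_header is made up for illustration.

```python
import struct


def parse_tap_header(blob):
    """Check the .TAP signature and return a few header fields."""
    if blob[0:12] != b"C64-TAPE-RAW":
        raise ValueError("not a C64 .TAP file")
    version = blob[12]                                   # assumed: format version
    (data_size,) = struct.unpack_from("<I", blob, 16)    # assumed: size of the file data
    return {"version": version, "data_size": data_size, "data_offset": 0x14}


# A small hand-made header followed by five data bytes.
example = b"C64-TAPE-RAW" + bytes([1, 0, 0, 0]) + struct.pack("<I", 5) + bytes([0x30] * 5)
print(parse_tap_header(example))
```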

The file data is basically a set of pulse widths. In general a pulse width is represented by one byte. Multiply this value by 8, and you have the pulse width in terms of 1 MHz clock cycles.

Let us have a look at our example snippet. Starting at offset 0x14, we see a series of bytes with the hexadecimal value 30. Converting this number to decimal (48) and multiplying by 8, we get 384. This gives us a period of 0.000384s.

From this period we can calculate the frequency from the equation f = 1/T. This gives us a frequency of 2604Hz. This is the monotone you hear for the first 10 seconds or so from a C64 tape.
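The same calculation in Python, as a small sketch. It assumes the 1 MHz time base used above; the function name is made up for illustration.

```python
def pulse_frequency_hz(tap_byte, clock_hz=1_000_000):
    """Convert a single-byte .TAP pulse value to a frequency in Hz."""
    period_s = (tap_byte * 8) / clock_hz   # pulse width in seconds
    return 1.0 / period_s


print(round(pulse_frequency_hz(0x30)))
```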

Converting a TAP file to sound

With the information from the previous section, let us see if we can take a .TAP file and generate the sound as we remember it from a couple of decades ago.

For this exercise we will be using Python to generate the raw samples. Not many programs can play raw samples, but Audacity can play it.

Within Python we start off by opening the TAP file and moving to the byte position where the actual data starts:

import struct
f = open('Dan Dare.tap', 'rb')
resfile = open('file.dat', 'wb')
timei = 0
f.seek(20)

timei is the current time in millionths of a second. I will show in a moment how this variable gets updated.

The whole sound sample generation is driven by the following loop:

...
while timei < 240000000:
...

This loop will generate 4 minutes worth of sound samples.

Within the loop we start off by reading a pulse width:

...
while timei < 240000000:
  timeval = ord(struct.unpack('c', f.read(1))[0])
...

One thing I didn't mention earlier on is that a pulse byte value of zero is a special exception. A pulse byte value of zero means that an absolute time period value is to follow in the next three bytes. With this information in mind, we add the following code to our loop:

while timei < 240000000:
  timeval = ord(struct.unpack('c', f.read(1))[0])
  if timeval == 0:
    byte1 = ord(struct.unpack('c', f.read(1))[0])
    byte2 = ord(struct.unpack('c', f.read(1))[0])
    byte3 = ord(struct.unpack('c', f.read(1))[0])
    timeval = (byte3 << 16) + (byte2 << 8) + byte1
  else:
    timeval = timeval << 3


So, in this part we cater for both the zero-byte time values and the normal single-byte case.

We now have a physical time value, and hence we can update timei:

...
while timei < 240000000:
...
  timei = timei + timeval
...

We now have enough information for generating the sound samples. Keep in mind that each time period is broken down into two halves. In the first half our pulse has a positive value and in the second half our pulse has a negative value. For this reason it makes sense to work with half of the time period value, obtained by shifting the time value right by one bit:

...
while timei < 240000000:
...
  timeval = timeval >> 1


We would like to create sound samples at a rate of 48 kHz, giving us the following code:

...
  timeval48khzfloat = float(timeval) * 48000/1000000
  timeval48khzint = int(timeval48khzfloat)
  for x in range (timeval48khzint):
    resfile.write(struct.pack('h',32000))
  for x in range (timeval48khzint):
    resfile.write(struct.pack('h',-32000))
...

This code will generate the sound samples for us. Finally, we just need to close the file when we are done:

resfile.close()
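As an alternative to raw samples, the same kind of square wave could be wrapped in a WAV container, which most players can open directly. The following is a minimal sketch using the wave module from the Python standard library. It is not the script used for this post: the function name and its parameters are made up for illustration, and it expects a list of half-period values in microseconds such as the timeval values computed above.

```python
import io
import struct
import wave


def square_wave_wav_bytes(half_periods_us, sample_rate=48000, amplitude=32000):
    """Build a 16-bit mono WAV with one high/low pulse pair per entry."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(sample_rate)
        for half_us in half_periods_us:
            n = int(half_us * sample_rate / 1_000_000)   # samples per half period
            wav.writeframes(struct.pack("<h", amplitude) * n)
            wav.writeframes(struct.pack("<h", -amplitude) * n)
    return buf.getvalue()


# One pulse with a period of 384 microseconds, i.e. two halves of 192 microseconds.
data = square_wave_wav_bytes([192])
print(len(data))
```

The returned bytes can be written to a file with a .wav extension.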

Listening to the result

I took the samples and converted them to an mp3 with the help of Audacity.

Unfortunately, since I use Blogger for hosting my posts, there is no easy way to embed sound clips within posts. So I had to create a video from the mp3 and upload it to YouTube so everyone can listen to the end result.

It is perhaps advisable to turn down the volume when listening to this, since there are some tones that can be annoying to the ear:

It sounds more or less as I remember it from when I listened to a C64 tape on a tape deck a couple of decades ago. Perhaps the leading monotone sounds too pure compared to the tape players of the day.

In Summary

In this post we have started to investigate how to integrate tape loading functionality to our C64 module.

As a nostalgic exercise, we attempted to reproduce the sound of a .TAP file as we remember it from a long time ago.

I performed this exercise on a PC with Python and Audacity.

It would be interesting to see if this exercise can be performed on a Zybo board, taking the .TAP file and generating the sound samples in real time within the FPGA and outputting the sound to a speaker, via the Line Out on the Zybo board.

My goal of generating sound from the .TAP file on a Zybo board perhaps sounds a bit over the top and unnecessary, but it can be an opportunity to learn how to use sound on the Zybo board. This knowledge will be valuable if we later decide to also incorporate a SID within our C64 module.

So, in the next post we will attempt to generate sound on the Zybo board.

Till next time!