C64 on an FPGA: Integrating Tape Interface with C64 Module: Part 2

Foreword

In the previous post we managed to integrate our tape module into our existing C64 design and verified that our Tape module produced pulses of the correct widths given a particular .TAP file stored in SDRAM.

At this point in time, however, our Tape module is not wired to our C64 module so that it can do something useful for us. By useful I am mean loading the .TAP file into C64 so we can play the game these pulse widths represents 😃

Of course, if we had a fully developed C64 module, we could implement the functionality mentioned in the previous paragraph by simply taking the output of our Tape Module and connecting it to the FLAG input of a CIA#1 module and there you go!

But, in our case we don't have a fully developed C64 module. In this series of blog posts we are adding functionality in an incremental fashion and at this point we don't have a fully functional CIA module.

It is therefore the purpose of this post and a one or two subsequent posts to develop a fully functional CIA module. With this module developed, it will just be a matter of taking our Tape module and Plug & Play.

In the process of developing a CIA module, I will be doing a bit of code refactoring, moving existing code from the C64 module to the CIA module and into other new modules.

In this whole refactoring exercise, we will potentially introduce many bugs and it may be worthwhile to make use of simulation again to make it easier ironing out these bugs.

You might remember from some previous posts, the process of simulating the booting of a C64 system within a verilog simulator can be quite a time consuming process. You can wait between 20 and 30 minutes till you get to the point where the welcome message is written to screen memory.

This waiting can be quite a nuisance if you want to fix something small and just would like to test if the fixed worked.

I will show you some techniques on how you can drastically trim down on simulation wait time, to make life less frustrating.

Directives

You might have heard about C pre-processor directives, where you make use of defines that gets expanded before compile time.

In Verilog we also can make use of directives. I will be making use of directives in this post so we configure our C64 module to run in simulation mode and in "real" mode.

Directives just makes live easier so simulation mode code and normal code can live together, and you don't need to make so many changes each time between switching modes.

We will start by adding the following define right in the beginning of our C64 module:

`define SIM

This define signals that we want to run in simulation mode. Should we want to disable this mode, you can just comment this line out.

The first place where we would use this define, would be where we create an instance of our CPU:

`ifdef SIM
    cpu mycpu ( clk_1_mhz, proc_rst, addr, combined_d_out, ram_in, we, int_occ/*0*/, 1'b0, 1'b1 );
`else
    cpu mycpu ( clk_1_mhz, c64_reset, addr, combined_d_out, ram_in, we, int_occ/*0*/, 1'b0, 1'b1 );
`endif

Notice that the only difference between the two declaration is in the second port. In simulation mode we connect this port to proc_reset and in the other mode on c64_reset.

c64_reset is a port we have defined previously on our VIC-II module. The problem with this port is that it is a very time consuming process, cycle wise. Connecting to this port during simulation will indeed cause our simulation to take very long to complete.

For simulation we therefor connect to proc_rst, which is an input port on our C64 module. When simulating we can create a top module and connect the proc_rst port to a simulated reset with a much shorter duration.

To make life easier, we can also disable the creating of a VIC-II module during simulation:

`ifndef SIM
vic_ii vic_inst
    (
        .clk_in(clk),
        .clk_counter(clk_div_counter),
        .clk_2_mhz(clk_2_mhz),
        .blank_signal(blank_signal),
        .frame_sync(frame_sync),
        .data_in({14'd4,vic_combined_d}),
        .c64_reset(c64_reset),
        .addr(vic_addr),
        .out_rgb(out_rgb)
        );
        
    burst_block burst_tst(
            .clk(axi_clk_in),
            .reset(proc_rst),
            .write(write_pin),
            .next_frame(frame_sync),
            .write_data({pixel_16_bit_delay,pixel_16_bit}), 
            .count_in_buf(count_in_buf),
            //output src ready
            //-----------------------------------------
            .ip2bus_mst_addr(ip2bus_mst_addr),
            .ip2bus_mst_length(ip2bus_mst_length),
            .ip2bus_mstwr_d(ip2bus_mstwr_d),
            .ip2bus_inputs(ip2bus_inputs),
            .ip2bus_otputs(ip2bus_otputs),
            .read(read)
              );
`endif

Notice this time we make use of ifndef, meaning if not defined. So these two blocks will only be added if we are not in simulation mode.

You will also see that I am removing the burst_block, required for AXI memory access, when doing simulation.

Moving keyboard functionality into its own module

In our C64 module's current state, we some entangled keyboard functionality and CIA functionality.

It make sense to split this functionality and will also mark the start of our new CIA module.

Let us start with a keyboard module:

`timescale 1ns / 1ps

module keyboard(
  input [31:0] key_matrix_0,
  input [31:0] key_matrix_1,
  input [7:0] keyboard_control_byte,
  output [7:0] keyboard_result_byte
    );
    
    wire [7:0] keyboard_row_0;
    wire [7:0] keyboard_row_1;
    wire [7:0] keyboard_row_2;
    wire [7:0] keyboard_row_3;
    wire [7:0] keyboard_row_4;
    wire [7:0] keyboard_row_5;
    wire [7:0] keyboard_row_6;
    wire [7:0] keyboard_row_7;     

    assign keyboard_row_0 = key_matrix_0[7:0];
    assign keyboard_row_1 = key_matrix_0[15:8];
    assign keyboard_row_2 = key_matrix_0[23:16];
    assign keyboard_row_3 = key_matrix_0[31:24];
    assign keyboard_row_4 = key_matrix_1[7:0];
    assign keyboard_row_5 = key_matrix_1[15:8];
    assign keyboard_row_6 = key_matrix_1[23:16];
    assign keyboard_row_7 = key_matrix_1[31:24];
    
    assign keyboard_result_byte = ~((~keyboard_control_byte[0] ? keyboard_row_0 : 0) |           
                                   (~keyboard_control_byte[1] ? keyboard_row_1 : 0) |
                                   (~keyboard_control_byte[2] ? keyboard_row_2 : 0) |
                                   (~keyboard_control_byte[3] ? keyboard_row_3 : 0) |
                                   (~keyboard_control_byte[4] ? keyboard_row_4 : 0) |
                                   (~keyboard_control_byte[5] ? keyboard_row_5 : 0) |
                                   (~keyboard_control_byte[6] ? keyboard_row_6 : 0) |
                                   (~keyboard_control_byte[7] ? keyboard_row_7 : 0));
    
endmodule

The code almost looks identical to the old code, except that we move the code into its own module.

Let us quickly look at the port into more detail:

key_matrix_0 and key_matrix_1: This corresponds to slv_reg_0 and slv_reg_1 which our ARM processor would set a particular key within the keyboard matrix.
keyboard_control_byte: This byte will be provided by the Port_A output of CIA#1
keyboard_result_byte: This will be fed back to CIA#1 via Port_B

Creating the CIA module

Let us now move unto the creation of the CIA module. Its code look as follows:

`timescale 1ns / 1ps

module cia(
  output [7:0] port_a_out,
  input [7:0] port_b_in,
  input [3:0] addr,
  input we,
  input clk,
  input [7:0] data_in,
  output reg [7:0] data_out
    );
    
  reg [7:0] slave_reg_0 = 0;
  reg [7:0] slave_reg_1 = 0;
  reg [7:0] slave_reg_2 = 0;
  reg [7:0] slave_reg_3 = 0;
  
  assign port_a_out = slave_reg_0;
  
  always @(posedge clk)
  if(!we)
  case (addr)
    0: data_out <= slave_reg_0;
    1: data_out <= port_b_in;
    2: data_out <= slave_reg_2;
    3: data_out <= slave_reg_3;
  endcase
  
  always @(posedge clk)
  if (we)
  case (addr)
    0: slave_reg_0 <= data_in;
    1: slave_reg_1 <= data_in;
    2: slave_reg_2 <= data_in;
    3: slave_reg_3 <= data_in;
  endcase
endmodule

Let us look into the ports:

port_a_out: This is the Port A ouput and will be fed to the keyboard module
port_b_in: This port will receive the keyboard result byte from the keyboard module
addr: Joined to the the address output of the CPU. Note we are only using the lower three bits since the CIA only have 16 registers.
we: Write enable. set by the CPU if it wants to write something to one of the CIA registers
clk: 1 Mhz clock
data_in : data from the cpu
data_out: Data from the CIA to CPU.

You will also see that we define a set of slave registers. The CIA have sixteen, but for now we have only defined 4 of them.

We also defined some functionality for reading and writing to these registers.

We have also linked up Port_A and Port_B.

Wiring everything up

Let us wire our two new modules up in the C64 Module:

...
    keyboard key_inst(
      .key_matrix_0(slave_0_reg),
      .key_matrix_1(slave_1_reg),
      .keyboard_control_byte(keyboard_control),
      .keyboard_result_byte(keyboard_result)
        );
        
    cia cia_1(
          .port_a_out(keyboard_control),
          .port_b_in(keyboard_result),
          .addr(addr[3:0]),
          .we(we & (addr[15:8] == 8'hdc)),
          .clk(clk_1_mhz),
          .data_in(ram_in),
          .data_out(cia_1_data_out)
            );
...
    always @*
        casex (addr_delayed)
          16'b101x_xxxx_xxxx_xxxx : combined_d_out = basic_out;
          16'b111x_xxxx_xxxx_xxxx : combined_d_out = kernel_out;
          16'hd012: combined_d_out = line_counter;
          16'hdcxx: combined_d_out = cia_1_data_out;
          default: combined_d_out = ram_out;
        endcase

The port assignment is pretty straightforward. I just want to mention that the assignment of we on cia_1, we only set if the address starts with DC.

Similarly, combined_d_out, the data to the CPU, we send the data output of CIA_1 if address starts with DC.

Finally, we need to create a top module for testing our C64 module in simulation mode:

module top(

    );
    
reg clk = 0;
reg reset = 1;
    
block_test my_c64(
      .clk(clk),      
      .proc_rst(reset),
      .slave_0_reg(1),
      .slave_1_reg(0)
        );

always #5 clk = ~clk;

initial begin
#100 reset = 0;
end    
endmodule

Here we do a very brief reset.

Also in the slave registers we only asserts one key.

Testing our new modules

Time to test our new modules.

As mentioned earlier, we can test our modules by booting our normals ROM's, but this would be too time consuming in a simulation.

We speed things up by writing a simple 6502 assembly test program. We will put this code in a copy of kernel ROM.

We start off by looking at the end of our kernel.hex file:

05
E5
4C
0A
E5
4C
00
E5
52
52
42
59
43
FE
E2
FC
48
FF

I have highlighted the reset vector, which is currently FCE2. The idea is to put our test program towards the end of the kernel ROM, so we will need to adjust the reset vector accordingly.

Here is the test program:

LDA #$FE
STA $DC00
LDA $DC01

So, we put the value FE on port a of CIA#1. Since we are activating the first key in the key matrix in our top module, we are expecting to read back value FE from port B.

We modify the last part of kernel ROM as follows:

A9
FE
8D
00
DC
AD
01
DC
00
00
00
00
F0
FF
48
FF

Here our program starts at address FFF0 and I have adjusted the reset vector accordingly.

In our c64 module we can again make use of directives to make the switching between simulation and normal mode easier:

...
    `define SIM
    `ifdef SIM
      `define kernel_file "/home/johan/roms/kernel_debug.hex"
    `else
      `define kernel_file "/home/johan/Documents/roms/kernel.hex"
    `endif
...
    rom #(
         .ROM_FILE(`kernel_file)
        ) kernel(
          .clk(clk_1_mhz),
          .addr(addr[12:0]),
          .rom_out(kernel_out)
            );
...

So, for simulation, we use the file kernel_debug.hex, that contain our test program. This avoids copying and pasting roms around everytime when we want to switch to simulation mode.

A simulation run

When we run a simulation the result is the following:

The addr field is the addresses the CPU issues. Below the addr field is the data the CPU receives for the relevant addresses.

From the addresses we can see our program gets eventually executed at FFF0.

Marked by the arrows we see at one stage we issue a read for address DC01 and we get the value FE. This is what we expect.

In Summary

In this post we started to develop a full blown CIA block, so that we, in a later post be able to load a .TAP file and execute within our C64 module.

In this post we implemented enough functionality for the CIA for simulating keyboard access.

In the next post we will implement timers within our CIA.

Till next time!

C64 on an FPGA

Monday, 3 June 2019

Integrating Tape Interface with C64 Module: Part 2