Wednesday, 13 December 2017

Booting the C64 System

Foreword

In the previous post we managed to successfully run Klaus Dormann's Test Suite on the Zybo Board.

In this post we will extend our implementation to boot the C64 system.

At the end of this post we will run the resulting implementation in a simulator and in the next post we will get to running it on the ZYBO board.

Adding the C64 ROMS

In order to boot the C64 system we need to add the two ROMS, e.g. BASIC and KERNEL to our design.

The process would be more or less the same as we did with adding the TestSuite binary in previous posts.

Since we are working with ROMS, however, we will only be adding logic to read data from the Block RAM and no write logic.

Since we are dealing with two ROMS and in later posts three ROMS when adding the Chargen ROM, it make sense to extract the common logic into a module of its own. The signature of this module will look as follows:

module rom#(
 parameter ADDR_WIDTH = 13,
 parameter ROM_FILE = ""

)

(
  input clk,
  input wire [ADDR_WIDTH-1:0] addr,
  output reg [7:0] rom_out
    )


You will notice our signature contains an extra section preceded by a hash, which is a style we haven't use before.

The hash section is basically a parameter section, declaring parameters with default values. The nice thing about these parameters is that you can override these values when you create a module instance with suitable values.

In the parameter section of our rom module we have the parameter ADDR_WIDTH with a a default value of 13.  This means that if you instantiate a rom module instance and you don't override the ADDR_WIDTH parameter, your resulting instance can accept addresses of maximum 13 bits.

13 bits gives us 8KB of addressable space. This default is sufficient for both the BASIC ROM and the KERNEL.

In later posts, however, where we will be adding the CharROM which is only 4KB we will need to override the ADDR_WIDTH with a value of 12.

Let us now look at the meat of our rom module:

reg [7:0] rom[2**ADDR_WIDTH-1:0];

 always @ (posedge clk)
    begin
      rom_out <= rom[addr];
    end 

    
initial begin
      $readmemh(ROM_FILE, rom) ;
    end    


We begin by defining an array that will contain the contents for the applicable ROM. In defining the size of the array we make use of the ADDR_WIDTH parameter defined previously.

We populate the contents of this array with an initial block similarly as we did in a previous post.

We define a always block for pushing the contents for given address to an output register on the positive transition of the clock pulse.

With our rom module defined, we can now create some instances of it in our main module:

rom #(
 .ROM_FILE("/home/johan/Documents/roms/kernel.hex")
) kernel(
  .clk(clk),
  .addr(addr[12:0]),
  .rom_out(kernel_out)
    );

rom #(
 .ROM_FILE("/home/johan/Documents/roms/basic.hex")
) basic(
  .clk(clk),
  .addr(addr[12:0]),
  .rom_out(basic_out)
    );


For both instances we send as paramater the location to a hex formatted file containing the content for applicable ROM.

For the address we send through the least 13 bits of the address bus.

We are missing some arbitration logic that will ensure, depending on the given address whether we return the contents of the BASIC ROM, KERNEL or our 64KB RAM.

Adding Arbitration Logic

The logic for performing arbitration is as follows:

...
reg [7:0] combined_d_out;
...
always @*
  casex (addr)
    16'b101x_xxxx_xxxx_xxxx : combined_d_out = basic_out;
    16'b111x_xxxx_xxxx_xxxx : combined_d_out = kernel_out;
    default: combined_d_out = ram_out;
  endcase
...

The function of this logic can be represented in a diagram as follows:


All our storage elements, BASIC, KERNEL and our RAM gets fed to a multiplexer and we use the address as selector to decide which one gets send to the DI input of the 6502 CPU.

Let us now look at our piece of Verilog code in more detail. This will indeed look familiar to programmers as a case/switch statement.

This case statement, however, starts with casex instead of case. This is a special kind of Verilog statement, where in the selector you can specify Don't care values.

A don't care value you sepcify with an X, and means that this position can be any value.

Strictly speaking, if you look at our case statement, you could have only connected only the most significant three bits to our case statement, since the lower thirteen doesn't serve any purpose. But, as you will see later, we will need to full addresses for a scenario where will check for a specific address.

One thing we haven't consider in our design is the way Block RAMS work. Block RAMS only show the output a clock pulse after the address is asserted. In our design, however, we are multiplexing one clock cycle to early, meaning that by the time the data is ready, we might have switched that block rom out of view with the next address.

The solution would be to delay address input also by one clock cycle. This will result into the following changes:

...
reg [15:0] addr_delayed;
...
 always @ (posedge clk)
    addr_delayed <= addr;
...
always @*
  casex (addr_delayed)
    16'b101x_xxxx_xxxx_xxxx : combined_d_out = basic_out;
    16'b111x_xxxx_xxxx_xxxx : combined_d_out = kernel_out;
    default: combined_d_out = ram_out;
  endcase
...


Preparing for Simulation

All our for our 6502 system is currently wrapped in module called c64_core that is contained in Design sources, used for performing synthesis.

We also have a similar module within our simulation sources containing code for assisting a simulation.

With this current setup you would develop in the copy contained in simulation sources, making it is easy run a simulation now and again to check if you are on the right track.

Once finished with your development though, you would need to copy your changes to c64_core in Design sources.

This copy and pasting can be quite error prone. A better approach would be to let both the design and simulation sources share the same c64_core module. Then, within the simulation sources you create a top module surrounding the c64_core module. This top module would then contain all the simulation specific code.

Let us start with this top module. First, let us look again at the signature of c64_core module:

module c64_core(
  input wire clk_in,
  input wire reset,
  input wire debug_clk,
  input wire debug_mode,
  output wire [15:0] addr_out
    );


The resulting top module is quite simple:

reg clk = 0;
reg reset = 1;
wire [15:0] addr_out;

c64_core my_core(
    .clk_in(clk),
    .rst(reset),
    .debug_clk(1'b0),
    .debug_mode(1'b0),
    .addr_out(addr_out),
        );

always #10
clk <= ~clk;        

initial begin
  #100 reset <= 0;
  #100000000 $finish;
end    


First Simulation Attempt

With our first simulation our Wave output looks as follows:


If you go through the address requests of addr_out, you will see that the last couple of address requests ranges between ff5e-ff63. If you look at Disassembly listing of the kernel, you will see these addresses corresponds to the following:

FF5E   AD 12 D0   LDA $D012
FF61   D0 FB      BNE $FF5E

This loop rings a clear bell from my previous blogs where I wrote emulators for other platforms. Writing a C64 from scratch, you will most probably always got stuck at this loop for the first time.

This signals good news, since we are on the right track.

What we need to do next, is imitate values for register D012 (which is a VIC-II register) , so we can get past above loop, and see if screen memory get populated with the C64 startup message.

Getting past the $FF5E loop

To get past the $FF5E loop we can just link the memory register to a binary counter counting up at each clock cycle.

The implementation of the binary counter is as simple as follows:

...
reg [7:0] line_counter;
...
always @(posedge clk)
  if (rst)
    line_counter <= 0;
  else
    line_counter <= line_counter + 1;
...

And finally we change our arbitration block:

always @*
  casex (addr_delayed)
    16'b101x_xxxx_xxxx_xxxx : combined_d_out = basic_out;
    16'b111x_xxxx_xxxx_xxxx : combined_d_out = kernel_out;
    16'hd012: combined_d_out = line_counter;
    default: combined_d_out = ram_out;
  endcase


When our run simulation again with above changes, our wave output looks as follows:


If you now compare these addresses to a disassembly listing again, you will get to the following section:

; wait for return for keyboard
E5CA   20 16 E7   JSR $E716
E5CD   A5 C6      LDA $C6
E5CF   85 CC      STA $CC
E5D1   8D 92 02   STA $0292
E5D4   F0 F7      BEQ $E5CD
E5D6   78         SEI
E5D7   A5 CF      LDA $CF
E5D9   F0 0C      BEQ $E5E7
E5DB   A5 CE      LDA $CE
E5DD   AE 87 02   LDX $0287

I got this dissasemmbly listing from ffd2.com

From this we can gather that our simulation got to the point where it is waiting for keyboard input, which just after C64 bootup.

Ok, I am pretty convinced the C64 boot process went fine, but I am itching to check one more thing: Checking whether screen memory at memory location 1024 is populated with the Welcome message.

Checking Screen memory for welcome message

As our FPGA implementation is at the moment, we don't really have a way to inspect the contents of our 64KB RAM. We therefore need to modify our debug mode functionality to return the information we want.

Firstly, let us start to modify the header of our c64_core module for returning the relevant information:

module c64_core(
  input wire clk_in,
  input wire reset,
  input wire debug_clk,
  input wire debug_mode,
  input wire [15:0] addr_in
  output wire [7:0] data_out
    )

We have change our addr_out to addr_in and added an output wire returning data for requested address.

Next thing we should do, is to disconnect our cpu from any clock once our core turns into debug mode. We do this by introducing an extra clocking wire for our CPU:

...
wire cpu_clk;
...
assign cpu_clk = debug_mode ? 1b'0 : clk_in;
...
cpu mycpu ( cpu_clk, rst, addr, combined_d_out, ram_in, WE, 1'b0, 1'b0, 1'b1 );
...

Next up, it is important to give our RAM logic the ability to get an address from two sources, depending on whether debug mode is selected:

...
wire [15:0] addr_ram_in;
...
assign addr_ram_in = debug_mode ? addr_in : addr;
...
assign data_out = ram_out;
...
 always @ (posedge clk)
    begin
     if (WE) 
     begin
      ram[addr] <= ram_in;
      ram_out <= ram_in;
     end
     else 
     begin
      ram_out <= ram[addr_ram_in];
     end 
    end 
...

We are done with our changes within c64_core. Next we some make some modifications to the top_module for our simulation.

First some declaration changes:

...
reg [15:0] index;
wire [7:0] d_out;    
..    
c64_core my_core(
    .clk_in(clk),
    .rst(reset),
    .debug_clk(clk),
    .debug_mode(1'b0),
    .addr_in(index),
    .data_out(d_out)

        );


The index register I have defined will updated by a loop which I will discuss shortly.

We end off by modifying our initial block for our simulation:

initial begin
  #100 reset <= 0;
  #100000000 
  #20 debug_mode < 1;
  for (index=1024; index<1500; index = index +1 ) 
  begin
    #20 $display("%d",d_out);    
  end  
  $finish;
end    

We have added a for-loop. For-loops are provided in Verilog to aid in simulation. I have read a couple of sources stating that a for-loop will indeed synthesise to something on an FPGA, but the end result would not be necessary the result that you want. So the golden rule: Only use for-loops in simulations.

In our for-loop we keep increment the register index from 1024 till it reaches 1500. Each time, within the for loop, we wait 20 simulation periods (defined by #20) . This have the effect of executing our for-loop once every clock cycle.

Within our for-loop we have also introduced a new simulation directive called $display. It works very similar to printf in c. In our case we actually outputs the value of d_out at each increment. This loop will in effect output the first half of screen memory to the console.

When running the simulation with our changes, the output of the Tcl console will look as follows:


The output starts with a train of 32's, which is a space if you look at the screencode table. This looks promising. Scrolling down we do eventually see some signs of a message:


Converting these screencodes to the actual characters yield the following:

42 = *
42 = *
42 = *
42 = *
32 = SPACE
3  = C
15 = O
13 = M
13 = M
15 = O
4  = D
15 = O
18 = R
5  = E


This is exactly the first part of the C64 welcome message.

We can conclude our simulation went ok up the point of showing the welcome message.

In Summary

In this post we managed to successfully run a simulation for booting the C64 system and populating screen memory with the welcome message.

In the next post we will attempt to run the C64 boot process on the ZYBO board itself.

Till next time!

2 comments:

  1. This is really a very good article. Thanks for taking the time to discuss with us, I feel happy about learning this topic. keep sharing your information regularly for my future reference.
    Web Development Company Pune

    ReplyDelete