Wednesday, 13 December 2017

Booting the C64 System

Foreword

In the previous post we managed to successfully run Klaus Dormann's Test Suite on the Zybo Board.

In this post we will extend our implementation to boot the C64 system.

At the end of this post we will run the resulting implementation in a simulator and in the next post we will get to running it on the ZYBO board.

Adding the C64 ROMS

In order to boot the C64 system we need to add two ROMs, namely BASIC and KERNEL, to our design.

The process is more or less the same as when we added the Test Suite binary in previous posts.

Since we are working with ROMS, however, we will only be adding logic to read data from the Block RAM and no write logic.

Since we are dealing with two ROMS, and in later posts three when we add the Chargen ROM, it makes sense to extract the common logic into a module of its own. The signature of this module will look as follows:

module rom #(
  parameter ADDR_WIDTH = 13,
  parameter ROM_FILE = ""
)
(
  input clk,
  input wire [ADDR_WIDTH-1:0] addr,
  output reg [7:0] rom_out
);


You will notice our signature contains an extra section preceded by a hash, which is a style we haven't used before.

The hash section is a parameter section, declaring parameters with default values. The nice thing about these parameters is that you can override them with suitable values when you create a module instance.

In the parameter section of our rom module we have the parameter ADDR_WIDTH with a default value of 13. This means that if you instantiate a rom module and don't override the ADDR_WIDTH parameter, the resulting instance accepts addresses of at most 13 bits.

13 bits gives us 8KB of addressable space. This default is sufficient for both the BASIC ROM and the KERNEL.

In later posts, however, when we add the Chargen ROM, which is only 4KB, we will need to override ADDR_WIDTH with a value of 12.
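
As a rough sketch of what such an override could look like, the snippet below repeats the rom module from this post and wraps a 4KB instance around it. The module name chargen_example, the signal chargen_out and the file name chargen.hex are hypothetical placeholders, not names taken from the actual design.

```verilog
// The rom module as described in this post.
module rom #(
  parameter ADDR_WIDTH = 13,
  parameter ROM_FILE = ""
)
(
  input clk,
  input wire [ADDR_WIDTH-1:0] addr,
  output reg [7:0] rom_out
);
  reg [7:0] rom[2**ADDR_WIDTH-1:0];

  always @(posedge clk)
    rom_out <= rom[addr];

  initial
    $readmemh(ROM_FILE, rom);
endmodule

// Hypothetical wrapper showing a parameter override.
module chargen_example(
  input wire clk,
  input wire [15:0] addr,
  output wire [7:0] chargen_out
);
  rom #(
    .ADDR_WIDTH(12),              // 2**12 = 4096 bytes instead of the default 8KB
    .ROM_FILE("chargen.hex")      // placeholder file name
  ) chargen(
    .clk(clk),
    .addr(addr[11:0]),            // only the lower 12 address bits are needed
    .rom_out(chargen_out)
  );
endmodule
```

Any parameter that is not listed in the override section, keeps its default value.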

Let us now look at the meat of our rom module:

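// Storage for the ROM contents: 2**ADDR_WIDTH bytes.
reg [7:0] rom[2**ADDR_WIDTH-1:0];

// Synchronous read: the data appears one clock edge after the address.
always @ (posedge clk)
  begin
    rom_out <= rom[addr];
  end

// Load the ROM image from a hex file.
initial begin
  $readmemh(ROM_FILE, rom);
end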


We begin by defining an array that will contain the contents for the applicable ROM. In defining the size of the array we make use of the ADDR_WIDTH parameter defined previously.

We populate the contents of this array with an initial block, as we did in a previous post.

We define an always block that pushes the contents of the given address to an output register on the positive edge of the clock.

With our rom module defined, we can now create some instances of it in our main module:

rom #(
 .ROM_FILE("/home/johan/Documents/roms/kernel.hex")
) kernel(
  .clk(clk),
  .addr(addr[12:0]),
  .rom_out(kernel_out)
    );

rom #(
 .ROM_FILE("/home/johan/Documents/roms/basic.hex")
) basic(
  .clk(clk),
  .addr(addr[12:0]),
  .rom_out(basic_out)
    );


For both instances we pass as parameter the location of a hex-formatted file containing the contents of the applicable ROM.

For the address we send through the least significant 13 bits of the address bus.

We are still missing some arbitration logic that decides, depending on the given address, whether we return the contents of the BASIC ROM, the KERNEL ROM or our 64KB RAM.

Adding Arbitration Logic

The logic for performing arbitration is as follows:

...
reg [7:0] combined_d_out;
...
always @*
  casex (addr)
    16'b101x_xxxx_xxxx_xxxx : combined_d_out = basic_out;
    16'b111x_xxxx_xxxx_xxxx : combined_d_out = kernel_out;
    default: combined_d_out = ram_out;
  endcase
...

The function of this logic can be represented in a diagram as follows:


All our storage elements (BASIC, KERNEL and our RAM) get fed to a multiplexer, and we use the address as the selector to decide which one gets sent to the DI input of the 6502 CPU.

Let us now look at our piece of Verilog code in more detail. This will indeed look familiar to programmers as a case/switch statement.

This case statement, however, starts with casex instead of case. This is a special kind of Verilog statement, where in the selector you can specify Don't care values.

You specify a don't care value with an X, meaning that this position can be any value.

Strictly speaking, if you look at our case statement, you could have connected only the three most significant bits to it, since the lower thirteen bits don't serve any purpose. But, as you will see later, we will need the full address for a scenario where we check for a specific address.
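
To get a feel for how the don't care patterns carve up the address space, here is a small simulation-only sketch that classifies a few boundary addresses with the same two patterns. The module casex_demo, the function region and the task show are hypothetical names used purely for illustration.

```verilog
module casex_demo;

  // Returns 1 for BASIC, 2 for KERNEL and 0 for RAM.
  function [1:0] region;
    input [15:0] a;
    begin
      casex (a)
        16'b101x_xxxx_xxxx_xxxx: region = 2'd1;  // $A000-$BFFF
        16'b111x_xxxx_xxxx_xxxx: region = 2'd2;  // $E000-$FFFF
        default:                 region = 2'd0;  // everything else
      endcase
    end
  endfunction

  // Prints one address together with the region it falls in.
  task show;
    input [15:0] a;
    begin
      $display("%h -> %0d", a, region(a));
    end
  endtask

  initial begin
    show(16'h9FFF);  // just below BASIC
    show(16'hA000);  // first byte of BASIC
    show(16'hBFFF);  // last byte of BASIC
    show(16'hC000);  // just above BASIC
    show(16'hE000);  // first byte of KERNEL
    show(16'hFFFF);  // last byte of KERNEL
    $finish;
  end
endmodule
```

One thing to keep in mind: casex also treats X or Z bits in the address itself as wildcards, so an undefined address in simulation will silently match the first pattern. If that ever becomes a nuisance, casez (which only treats Z and ? as wildcards) is a commonly suggested alternative.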

One thing we haven't considered in our design is the way Block RAMs work. A Block RAM only shows its output one clock cycle after the address is asserted. In our design, however, we are multiplexing one clock cycle too early, meaning that by the time the data is ready, we might already have switched that Block RAM out of view with the next address.

The solution is to delay the address input to the multiplexer by one clock cycle as well. This results in the following changes:

...
reg [15:0] addr_delayed;
...
 always @ (posedge clk)
    addr_delayed <= addr;
...
always @*
  casex (addr_delayed)
    16'b101x_xxxx_xxxx_xxxx : combined_d_out = basic_out;
    16'b111x_xxxx_xxxx_xxxx : combined_d_out = kernel_out;
    default: combined_d_out = ram_out;
  endcase
...
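
The one-cycle lag is easy to see in a tiny stand-alone testbench. The sketch below is not part of the C64 design: it uses a hypothetical 16-byte memory and an address register that simply counts up as a stand-in for the CPU. After every clock edge it prints the current address, the delayed address and the memory output.

```verilog
module latency_tb;
  reg clk = 0;
  reg [3:0] addr = 0;
  reg [3:0] addr_delayed = 0;
  reg [7:0] mem[0:15];
  reg [7:0] mem_out = 0;
  integer i;

  always #10
    clk = ~clk;

  always @(posedge clk)
    begin
      addr <= addr + 1;        // stand-in for the CPU moving to a new address
      mem_out <= mem[addr];    // synchronous read, as with Block RAM
      addr_delayed <= addr;    // remember which address mem_out belongs to
    end

  initial begin
    // Fill location i with 0x10 + i so the data is easy to recognise.
    for (i = 0; i < 16; i = i + 1)
      mem[i] = 8'h10 + i;

    repeat (4)
      begin
        @(posedge clk);
        #1 $display("addr=%h addr_delayed=%h mem_out=%h",
                    addr, addr_delayed, mem_out);
      end
    $finish;
  end
endmodule
```

In every printed line mem_out should hold the data for addr_delayed, not for addr, which is why the multiplexer needs to be steered by the delayed address.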


Preparing for Simulation

All the logic for our 6502 system is currently wrapped in a module called c64_core, which lives in the Design sources used for synthesis.

We also have a similar module within our simulation sources containing code for assisting a simulation.

With this current setup you would develop in the copy contained in simulation sources, making it easy to run a simulation now and again to check if you are on the right track.

Once finished with your development though, you would need to copy your changes to c64_core in Design sources.

This copy and pasting can be quite error prone. A better approach would be to let both the design and simulation sources share the same c64_core module. Then, within the simulation sources you create a top module surrounding the c64_core module. This top module would then contain all the simulation specific code.

Let us start with this top module. First, let us look again at the signature of c64_core module:

module c64_core(
  input wire clk_in,
  input wire rst,
  input wire debug_clk,
  input wire debug_mode,
  output wire [15:0] addr_out
    );


The resulting top module is quite simple:

reg clk = 0;
reg reset = 1;
wire [15:0] addr_out;

c64_core my_core(
    .clk_in(clk),
    .rst(reset),
    .debug_clk(1'b0),
    .debug_mode(1'b0),
    .addr_out(addr_out)
        );

always #10
clk <= ~clk;        

initial begin
  #100 reset <= 0;
  #100000000 $finish;
end    


First Simulation Attempt

With our first simulation our Wave output looks as follows:


If you go through the address requests on addr_out, you will see that the last couple of address requests range between ff5e and ff63. If you look at a disassembly listing of the kernel, you will see these addresses correspond to the following:

FF5E   AD 12 D0   LDA $D012
FF61   D0 FB      BNE $FF5E

This loop rings a clear bell from my previous blogs, where I wrote emulators for other platforms. When writing a C64 emulator from scratch, you will most probably get stuck at this loop the first time.

This is good news, since it means we are on the right track.

What we need to do next is imitate values for register D012 (the VIC-II raster line register), so we can get past the above loop and see if screen memory gets populated with the C64 startup message.

Getting past the $FF5E loop

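To get past the $FF5E loop we can simply map register $D012 to a binary counter that counts up at each clock cycle. The loop exits as soon as the CPU reads a zero from this register, which a free-running counter will sooner or later supply.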

The implementation of the binary counter is as simple as follows:

...
reg [7:0] line_counter;
...
always @(posedge clk)
  if (rst)
    line_counter <= 0;
  else
    line_counter <= line_counter + 1;
...

And finally we change our arbitration block:

always @*
  casex (addr_delayed)
    16'b101x_xxxx_xxxx_xxxx : combined_d_out = basic_out;
    16'b111x_xxxx_xxxx_xxxx : combined_d_out = kernel_out;
    16'hd012: combined_d_out = line_counter;
    default: combined_d_out = ram_out;
  endcase
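
Why does a free-running counter get us out of the loop? The loop only exits when the LDA reads a zero. On a standard 6502, LDA absolute takes 4 cycles and a taken BNE within the same page takes 3, so the register gets sampled once every 7 clock cycles. Since 7 and 256 share no common factor, a value stepping by 7 modulo 256 has to pass through zero at some point. The sketch below models just that arithmetic. It assumes the CPU core follows the documented cycle counts, and the starting value of 1 is an arbitrary choice.

```verilog
module raster_poll_demo;
  reg [7:0] line_counter;
  integer polls;

  initial begin
    line_counter = 8'd1;   // arbitrary non-zero value at the first read (assumption)
    polls = 0;

    // Each pass through the LDA/BNE loop takes 7 cycles, so the next
    // read sees the counter 7 steps further along, wrapping at 256.
    while (line_counter != 0)
      begin
        line_counter = line_counter + 8'd7;
        polls = polls + 1;
      end

    $display("zero read after %0d further polls", polls);
    $finish;
  end
endmodule
```

The same reasoning holds for any odd number of cycles per poll. With an even number, some starting values would never line up with zero.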


When we run the simulation again with the above changes, our wave output looks as follows:


If you now compare these addresses to a disassembly listing again, you will get to the following section:

; wait for return for keyboard
E5CA   20 16 E7   JSR $E716
E5CD   A5 C6      LDA $C6
E5CF   85 CC      STA $CC
E5D1   8D 92 02   STA $0292
E5D4   F0 F7      BEQ $E5CD
E5D6   78         SEI
E5D7   A5 CF      LDA $CF
E5D9   F0 0C      BEQ $E5E7
E5DB   A5 CE      LDA $CE
E5DD   AE 87 02   LDX $0287

I got this disassembly listing from ffd2.com

From this we can gather that our simulation got to the point where it is waiting for keyboard input, which happens just after C64 bootup.

OK, I am pretty convinced the C64 boot process went fine, but I am itching to check one more thing: whether screen memory at memory location 1024 is populated with the welcome message.

Checking Screen memory for welcome message

As our FPGA implementation is at the moment, we don't really have a way to inspect the contents of our 64KB RAM. We therefore need to modify our debug mode functionality to return the information we want.

Firstly, let us modify the header of our c64_core module to return the relevant information:

module c64_core(
  input wire clk_in,
  input wire rst,
  input wire debug_clk,
  input wire debug_mode,
  input wire [15:0] addr_in,
  output wire [7:0] data_out
    );

We have changed our addr_out to addr_in and added an output wire that returns the data for the requested address.

The next thing we should do is disconnect our CPU from any clock once our core switches into debug mode. We do this by introducing an extra clocking wire for our CPU:

...
wire cpu_clk;
...
assign cpu_clk = debug_mode ? 1'b0 : clk_in;
...
cpu mycpu ( cpu_clk, rst, addr, combined_d_out, ram_in, WE, 1'b0, 1'b0, 1'b1 );
...

Next up, it is important to give our RAM logic the ability to get an address from two sources, depending on whether debug mode is selected:

...
wire [15:0] addr_ram_in;
...
assign addr_ram_in = debug_mode ? addr_in : addr;
...
assign data_out = ram_out;
...
 always @ (posedge clk)
    begin
     if (WE) 
     begin
      ram[addr] <= ram_in;
      ram_out <= ram_in;
     end
     else 
     begin
      ram_out <= ram[addr_ram_in];
     end 
    end 
...

We are done with our changes within c64_core. Next we make some modifications to the top module for our simulation.

First some declaration changes:

...
reg [15:0] index;
reg debug_mode = 0;
wire [7:0] d_out;
...
c64_core my_core(
    .clk_in(clk),
    .rst(reset),
    .debug_clk(clk),
    .debug_mode(debug_mode),
    .addr_in(index),
    .data_out(d_out)
        );


The index register I have defined will be updated by a loop, which I will discuss shortly.

We end off by modifying our initial block for our simulation:

initial begin
  #100 reset <= 0;
  #100000000 
  #20 debug_mode <= 1;
  for (index=1024; index<1500; index = index +1 ) 
  begin
    #20 $display("%d",d_out);    
  end  
  $finish;
end    

We have added a for-loop. For-loops are provided in Verilog to aid in simulation. I have read a couple of sources stating that a for-loop will indeed synthesise to something on an FPGA, but the end result would not necessarily be the result that you want. So the golden rule: Only use for-loops in simulations.

In our for-loop we keep incrementing the register index from 1024 until it reaches 1500. Each time, within the for-loop, we wait 20 simulation time units (defined by #20). Since our clock has a period of 20 time units, this has the effect of executing the loop body once every clock cycle.

Within our for-loop we have also introduced a new system task called $display. It works very similarly to printf in C. In our case we output the value of d_out at each increment. This loop will in effect output roughly the first half of screen memory to the console.
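
For reference, here is a small sketch of a few format specifiers that $display understands, together with $write, which behaves the same but does not append a newline. The module name display_demo and the value 42 are just for illustration.

```verilog
module display_demo;
  reg [7:0] d_out;

  initial begin
    d_out = 8'd42;
    $display("%d", d_out);    // decimal, padded to the widest possible 8-bit value
    $display("%0d", d_out);   // decimal without padding
    $display("%h", d_out);    // hexadecimal
    $display("%b", d_out);    // binary
    $write("%c", d_out);      // as a character, no newline appended
    $write("\n");
    $finish;
  end
endmodule
```

The %0d form is handy when you want to paste the console output into another tool without stray spaces.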

When running the simulation with our changes, the output of the Tcl console will look as follows:


The output starts with a train of 32's, which is a space if you look at the screencode table. This looks promising. Scrolling down we do eventually see some signs of a message:


Converting these screencodes to the actual characters yields the following:

42 = *
42 = *
42 = *
42 = *
32 = SPACE
3  = C
15 = O
13 = M
13 = M
15 = O
4  = D
15 = O
18 = R
5  = E


This is exactly the first part of the C64 welcome message.
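
The conversion above can also be left to the simulator. The sketch below takes the fourteen screencodes listed above and prints them as text. The function to_ascii is a hypothetical helper that only handles codes 0 to 63, and the symbols at codes 27 to 31 are merely approximated by their nearest ASCII neighbours.

```verilog
module screencode_demo;
  // The screencodes from the list above, first code in the top byte.
  reg [8*14-1:0] codes = {8'd42, 8'd42, 8'd42, 8'd42, 8'd32, 8'd3, 8'd15,
                          8'd13, 8'd13, 8'd15, 8'd4, 8'd15, 8'd18, 8'd5};
  integer i;

  // Maps a screencode to an ASCII character (codes 0-63 only).
  function [7:0] to_ascii;
    input [7:0] code;
    begin
      if (code < 8'd32)
        to_ascii = code + 8'd64;   // 0-31: @, A-Z and a few symbols
      else if (code < 8'd64)
        to_ascii = code;           // 32-63: same position as in ASCII
      else
        to_ascii = "?";            // not handled in this sketch
    end
  endfunction

  initial begin
    // Walk from the top byte down to the bottom byte.
    for (i = 13; i >= 0; i = i - 1)
      $write("%c", to_ascii(codes[8*i +: 8]));
    $write("\n");
    $finish;
  end
endmodule
```

Running this should print **** COMMODORE. The same function could be called from the for-loop in the top module, using $write("%c", to_ascii(d_out)) instead of $display("%d", d_out).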

We can conclude our simulation went OK up to the point of showing the welcome message.

In Summary

In this post we managed to successfully run a simulation for booting the C64 system and populating screen memory with the welcome message.

In the next post we will attempt to run the C64 boot process on the ZYBO board itself.

Till next time!

11 comments:

  1. This is really a very good article. Thanks for taking the time to discuss with us, I feel happy about learning this topic. keep sharing your information regularly for my future reference.
  2. Hi, I was hoping to get a little help on this step. I'm at the point in this blog post called "First Simulation Attempt". Unfortunately I can't include a picture of my waveform in this comment. But during execution of the Kernal Rom, The reset vector takes my execution from FFFC, FFFD to FCE2, FCE3, FCE4, FCE5, FCE6, FCE7, FCE8, 01FF, 01FE, FCE9, FD02, FD03, FD04, FD05, FD06, FD14, FD07, FD08, FD09, 8008, FD0A, FD0B, FD0C, FD0D and just stays here endlessly FD0D. I don't get stuck at the FF5E like you do, which makes me believe that the line counter won't work for me.



    I thought maybe that there were different Kernal roms you were using, but I tried 3 different versions all with the same result. I got the binary roms from www.zimmers.net/anonftp/pub/cbm/firmware/computers/c64/ and I tried the 3 most common ones: kernal.901227-03.bin, kernal.901227-02.bin, and kernal.901227-01.bin I took the bin file, opened in HxD hexeditor and copied and pasted the hex values replacing the 'spaces' with /n just like we did with the Klaus Test in previous posts.

    Thanks for the help!

    Replies
    1. Hi Eric

      I would suggest adding the data_in port of the 6502 core to your waveform. Then, check the values that get read back for the applicable addresses starting at address FD02.

      In this exercise have a disassembled version of the KERNEL ROM at hand, typically like this: http://www.ffd2.com/fridge/docs/c64-diss.html

      Address FD0D is the beginning of a branch instruction (BNE with opcode D0). With this the 6502 should at least have read the byte at address FD0E.

      This tells me that maybe for address FD0D some value other than D0 is returned to the CPU.

  3. I found the disassembly listing for the kernel at http://www.ffd2.com/fridge/docs/c64-diss.html
    the site is a bit old and I couldn't find direct links from ffd2.com but I found it indirectly through google

    It looks like I am stuck in the execution at FD0D of:
    FD02 A2 05 LDX #$05
    FD04 BD 0F FD LDA $FD0F,X
    FD07 DD 03 80 CMP $8003,X
    FD0A D0 03 BNE $FD0F
    FD0C CA DEX
    FD0D D0 F5 BNE $FD04
    FD0F 60 RTS

    I also double checked that indeed my rom had the correct data at these address, it definitely does and matches the kernel listing. So I must have another issue. I'll keep plugging away at it =)

    Replies
    1. Just saw your reply after I submitted mine :-)

      As I mentioned in my previous reply, just add data_in to your waveform and confirm that when your waveform gets to FD0D the value D0 is sent to data_in

  4. D0 is available on the data_in (DI) of the processor. This is where I'm stumped. The correct data is present, and my cpu was verified with the klaus suite. Here's a link to my waveform https://imgur.com/9TGaMIY

    I can't find anything in the log file either

  5. Ok, I made some more progress. It turns out that the xilinx simulation/cpu/everything had a heart attack because the ram was initialized with XX (don't cares). I added an initial begin block in c64_core.v to run through a for loop and populate the ram with 0x00s as follows:

    integer i;
    initial begin
    for(i = 0; i < 65536; i = i + 1)
    begin
    ram[i] <= 0;
    // $readmemh("C:/WS_Vivado/6502_funct_test.dat", ram) ;
    end
    end

    This allowed me to eventually end up in the

    ; get character from keyboard buffer section

    E5B4 AC 77 02 LDY $0277
    E5B7 A2 00 LDX #$00
    E5B9 BD 78 02 LDA $0278,X
    E5BC 9D 77 02 STA $0277,X
    E5BF E8 INX
    E5C0 E4 C6 CPX $C6
    E5C2 D0 F5 BNE $E5B9
    E5C4 C6 C6 DEC $C6
    E5C6 98 TYA
    E5C7 58 CLI
    E5C8 18 CLC
    E5C9 60 RTS

    without having to implement the "Getting past the $FF5E loop" you have mentioned in this post.

    I keep running into:

    00cc
    e5d1, e5d2, e5d3, 0292, e5d4, e5d5, e5d6, e5cd, e5ce, 00c6 e5cf, e5d0
    00cc
    e5d1, e5d2, e5d3, 0292, e5d4, e5d5, e5d6, e5cd, e5ce

    that being said, I'm not 100% certain that everything is working well as I did a search in my waveform for address FF53 and the search failed to find this address. I guess for now, I'll just try to implement the rest and see if I can finally get to the welcome message as you have. Wish me luck as I think I'll need it. These posts have definitely re-kindled my xilinx/fpga love/hate relationship =) hahaha.

    Thank You for still checking up on these comments, I really appreciate it!

    Replies
    1. You are welcome :-)

      I must say you are indeed making good progress.

      I see you ended up in the part where the emulator is waiting for the return key to be pressed, which means the boot went ok.

      I wouldn't worry too much about the FF5E loop. It just means that in your setup, for some reason, this location always returns a zero.

      I am sure you will be fine with the welcome message if you got this far.

      Best thing is to not give up. There were many times in this blog series where I felt like throwing in the towel, but then I just take a break from this project for two days, and then everything is fine again :-)

  6. Got it Working!!! Thanks for the help!!
    Documenting all your work is such a huge undertaking.
    I got my first output:
    3-C,15-O,13-M,13-M,15-O,4-D,15-O,18-R,5-E,32-space,54-6,52-4,32-space,2-B,1-A,19-S,9-I,3-C,32-space,22-V,50-2

  7. I just saw your previous comment after I posted mine, thanks for the kind words =)
