Wednesday 14 February 2018

Integrating VIC-II core with C64 core

Foreword

In the previous post we developed the VIC-II core and managed to display some dummy data.

In this post we will integrate this VIC-II core with our C64 core and see whether it can render the C64 startup screen.

This rendering will again be performed within a simulation.

A Global overview

Since more than three weeks have gone by since my previous post, let me start off with a quick refresher on a couple of points.

In the previous post I mentioned that the VIC-II and the 6510 can share access to main memory using a 1MHz clock signal, where the 6510 accesses memory on the rising edge of the clock and the VIC-II on the falling edge of the clock.

Occasionally the VIC-II needs some extra memory cycles to retrieve the character pointer data from screen memory. The VIC-II gets these extra cycles by stealing them from the 6510 processor. During this time the 6510 cannot do anything.

During this "cycle stealing", the VIC-II accesses the contents of main memory at a speed of 2MHz.

The memory model just described will not work for Block RAM on an FPGA. With Block RAM you can access the RAM either on the rising clock edge or on the falling clock edge, but not both!

Luckily, Block RAM does allow simultaneous access via dual port mode. In dual port mode you can connect the CPU and the VIC-II to separate ports of the Block RAM, allowing them to access the Block RAM simultaneously.

We will need to clock our Block RAM at 2MHz to cater for the maximum read speed when the VIC-II steals cycles.

To clock our CPU at 1MHz while still being able to access the 2MHz clocked memory, we should clock the CPU with the following clock signal:


The CPU clock is shown at the bottom, with the memory clock shown for reference at the top. In other words, the 1MHz signal is obtained by skipping every second pulse of the 2MHz memory clock.
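As a rough illustration of the pulse skipping described above, here is a minimal sketch. The module name pulse_skipper and its port names are made up for this example and are not taken from the VIC-II core. The enable flip-flop changes on the falling edge of the memory clock, so it is stable whenever the memory clock is high and the gated output does not glitch.

```verilog
// Sketch only: pulse_skipper and its ports are hypothetical names.
module pulse_skipper (
  input  wire clk_2mhz,   // memory clock
  input  wire reset,      // active high, asynchronous
  output wire clk_1mhz    // every second pulse of clk_2mhz
);
  reg enable;

  // Toggle the enable while clk_2mhz is low, so it never changes
  // in the middle of a high phase.
  always @(negedge clk_2mhz or posedge reset)
    if (reset)
      enable <= 1'b0;
    else
      enable <= ~enable;

  // Let through only the pulses for which enable is set.
  assign clk_1mhz = clk_2mhz & enable;
endmodule
```

This is fine for simulation. On real hardware a gated clock built from ordinary logic is usually discouraged, and a clock enable on the receiving registers is often the safer choice.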

Instantiating a VIC-II instance

Let us start off by creating a VIC-II instance within our C64 core and see which ports we can wire up:

...
wire mem_clk;
wire c64_reset;
wire c64_clk;
...
    cpu mycpu ( c64_clk, c64_reset, addr, combined_d_out, ram_in, WE, 1'b0, 1'b0, 1'b1 );
...
vicii vic_inst(
  .addr(),
  .data(),
  .reset(rst),
  .clk(clk_in),
  .clk_out(mem_clk),
  .c64_reset(c64_reset),
  .clk_out_1_mhz(c64_clk),  
  .first_pixel(),
  .frame_sync(),
  .blank_signal(),
  .out_rgb()
  );
...

mem_clk is the new 2MHz clock that will drive our Block RAM.

c64_clk is the 1MHz signal generated by the VIC-II.

You will notice that a new port called c64_reset has been added to the vicii module.

Why this port? Well, during the reset process our VIC-II core doesn't emit any clock signal on c64_clk. Our CPU, however, needs a couple of clock cycles during reset to initialise itself properly.

To cater for this need of the CPU we have added the c64_reset port. Once the VIC-II is up and running, and c64_clk is therefore pulsing, we apply a reset signal via c64_reset for a couple of clock pulses.

Here is the implementation for the reset signal, within the VIC-II core:

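...
  output wire c64_reset,
...
  reg [7:0] c64_system_reset_counter;
...
  // Keep the C64 in reset while the counter is below 150
  assign c64_reset = c64_system_reset_counter < 150 ? 1 : 0;
...
  // The counter starts at 0 when the VIC-II is reset and stops at 160,
  // so c64_reset stays low once the C64 has been released
  always @(posedge clk)
    if (reset)
      c64_system_reset_counter <= 0;
    else if (c64_system_reset_counter < 160)
      c64_system_reset_counter <= c64_system_reset_counter + 1;
...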

Changes to existing Block RAM instances

As mentioned earlier, there are a couple of changes we need to make to our existing Block RAM instances: they should become dual port and they should be clocked at 2MHz.

Let us first have a look at the 2MHz requirement.

With the CPU clocking at 1MHz, a 2MHz memory clock means that we have an extra memory clock pulse between two CPU cycles. This has the side effect that if the CPU puts out an address for reading during one CPU clock pulse, the data will already be available for consumption at the next CPU clock pulse.

While this sounds like a good thing, this behaviour actually breaks the assumptions about FPGA Block RAM that the Arlet core is built around.

To solve this timing issue, we should add some delay elements in front of the inputs of the Block RAM so that it behaves like a Block RAM clocked at 1MHz.

Our main RAM block element, with the delays added, will look as follows:

...
reg WE_delayed;
reg [7:0] ram_in_delayed;
...
always @(posedge mem_clk)
  WE_delayed <= WE;
...
 always @ (posedge mem_clk)
    addr_synced <= addr;
...
 always @ (posedge mem_clk)
    begin
     if (WE_delayed) 
     begin
      ram[addr_synced] <= ram_in_delayed;
      ram_out <= ram_in_delayed;
     end
     else 
     begin
      ram_out <= ram[addr_synced];
     end 
    end 
...
always @(posedge mem_clk)
  ram_in_delayed <= ram_in;
...

As you can see, we delay the Write Enable signal, the requested address and the data to write by one clock cycle, effectively eliminating the advantage of the 2MHz signal.

Finally, we also need to use the same signals for our ROM elements:

...
rom #(
 .ROM_FILE("/home/johan/Documents/roms/kernel.hex")
) kernel(
  .clk(mem_clk),
  .addr(addr_synced[12:0]),
  .rom_out(kernel_out)
    );
...
rom #(
 .ROM_FILE("/home/johan/Documents/roms/basic.hex")
) basic(
  .clk(mem_clk),
  .addr(addr_synced[12:0]),
  .rom_out(basic_out)
    );
...

Hooking VIC-II to memory

Next up, we should hook up our VIC-II to our memory.

We start by connecting the address port of the VIC-II instance to a set of wires:

...
wire [13:0] vic_addr;
...
vicii vic_inst(
  .addr(vic_addr),
  .data(),
  .reset(rst),
  .clk(clk_in),
  .clk_out(mem_clk),
  .c64_reset(c64_reset),
  .clk_out_1_mhz(c64_clk),  
  .first_pixel(),
  .frame_sync(),
  .blank_signal(),
  .out_rgb()
  );
...

As you can see, the width of the VIC-II address port is 14 bits. This port will be interfacing with the main RAM, which uses a 16-bit address bus. To cater for this difference in width we slap two extra bits in front of vic_addr:

...
wire [15:0] vic_addr_padded;
...
assign vic_addr_padded = {2'b0, vic_addr};
...


We all know that bits 15 and 14 of the VIC-II address are usually provided by peripheral register $DD00, but for now we are just going to leave them hardcoded to 0.
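For reference, once $DD00 is available, the padding could be driven from it roughly as in the sketch below. The module name and the cia2_port_a input are hypothetical and do not exist in the design yet. The two bank bits in $DD00 are active low, which is why they are inverted here.

```verilog
// Sketch only: cia2_port_a is a hypothetical register holding the value
// last written to $DD00. Nothing like it exists in the design yet.
module vic_bank_select (
  input  wire [7:0]  cia2_port_a,
  input  wire [13:0] vic_addr,
  output wire [15:0] vic_addr_padded
);
  // Bits 1:0 of $DD00 select the 16KB bank and are active low,
  // so a value of %11 selects bank 0 ($0000-$3FFF).
  assign vic_addr_padded = {~cia2_port_a[1:0], vic_addr};
endmodule
```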

Next, we should interface to main memory:

...
reg [13:0] vic_addr_delayed;
...
  always @ (posedge mem_clk)     
    vic_addr_delayed <= vic_addr; 
...
 always @ (posedge mem_clk)
    begin
     if (WE_delayed & !WE_color_ram) 
     begin
      ram[addr_ram_in] <= ram_in_delayed;
      ram_out <= ram_in_delayed;
     end
     else 
     begin
      ram_out <= ram[addr_ram_in];
     end 
    end 

 always @ (posedge mem_clk)
    begin
      ram_out_2 <= ram[vic_addr_padded]; 
    end 
...
always @*
  casex (vic_addr_delayed)
    default: vic_combined_d = ram_out_2;
  endcase
...


We have added an extra always block also returning content of the ram array. When performing synthesis in Vivado, the synthesis tool will spot the two always blocks utilising the same ram array and synthesise a dual port block ram instance accordingly.
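Stripped of the C64 specifics, the pattern looks roughly like the sketch below: one always block that reads and writes, and a second always block that only reads, both indexing the same array. The module name, port names and parameters are invented for this example.

```verilog
// Sketch only: a generic version of the two-always-block pattern.
// Module, port and parameter names are hypothetical.
module dual_port_ram_sketch #(
  parameter ADDR_WIDTH = 16,
  parameter DATA_WIDTH = 8
)(
  input  wire                  clk,
  // Port A: read/write (the CPU side in this post)
  input  wire                  we_a,
  input  wire [ADDR_WIDTH-1:0] addr_a,
  input  wire [DATA_WIDTH-1:0] din_a,
  output reg  [DATA_WIDTH-1:0] dout_a,
  // Port B: read only (the VIC-II side in this post)
  input  wire [ADDR_WIDTH-1:0] addr_b,
  output reg  [DATA_WIDTH-1:0] dout_b
);
  reg [DATA_WIDTH-1:0] mem [0:(1 << ADDR_WIDTH) - 1];

  // Port A returns the written value on a write, the stored value otherwise.
  always @(posedge clk)
    if (we_a) begin
      mem[addr_a] <= din_a;
      dout_a      <= din_a;
    end else
      dout_a <= mem[addr_a];

  // Port B is a plain registered read of the same array.
  always @(posedge clk)
    dout_b <= mem[addr_b];
endmodule
```

Both outputs are registered, which is why the data for an address shows up one clock cycle after the address was presented.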

I have also added a combinational block, via the casex statement, whose whole purpose is to select the right block of memory. For now it looks a bit silly, since it only has a default selector, but we will be adding an extra selector soon.

As we did for our CPU interface in a previous post, we also delay our address by one clock cycle. This is again to compensate for the way Block RAMs work.

The next block we should map within our VIC-II address space is the Character ROM. We currently don't have a Block RAM element containing the Character ROM data, so let us start off by creating one:

rom #(
     .ROM_FILE("/home/johan/Documents/roms/chargen.hex"),
     .ADDR_WIDTH(12)
    ) chargen(
      .clk(mem_clk),
      .addr(vic_addr[11:0]),
      .rom_out(chargen_out)
        );
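The rom module itself is not shown in this post. A module with this interface could look roughly like the sketch below. This is an assumption based purely on the parameters and ports used in the instantiations, so the actual module may differ. It is named rom_sketch to make clear that it is not the original.

```verilog
// Sketch only: a guess at a ROM module matching the ports used above.
// The default values for the parameters are assumptions.
module rom_sketch #(
  parameter ROM_FILE   = "rom.hex",
  parameter ADDR_WIDTH = 13
)(
  input  wire                  clk,
  input  wire [ADDR_WIDTH-1:0] addr,
  output reg  [7:0]            rom_out
);
  reg [7:0] mem [0:(1 << ADDR_WIDTH) - 1];

  // Load the ROM contents from a text file with one hex byte per entry.
  initial $readmemh(ROM_FILE, mem);

  // Registered read, so the data appears one clock after the address.
  always @(posedge clk)
    rom_out <= mem[addr];
endmodule
```

With ADDR_WIDTH set to 12 the array holds 4096 bytes, which matches the 4KB Character ROM.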


After this we modify our combinational block for our VIC-II as follows:

always @*
  casex (vic_addr_delayed)
    14'h1xxx : vic_combined_d = chargen_out;
    default: vic_combined_d = ram_out_2;
  endcase


We have added the selector 14'h1xxx because the Character ROM is mapped at addresses $1000-$1FFF in VIC-II address space.
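The pattern 14'h1xxx pins bits 13 and 12 to 01 and treats the low 12 bits as don't care. The same selection can be written without casex, as in the sketch below. The module name and the sel_chargen wire are made up for this example.

```verilog
// Sketch only: an equivalent of the casex selector using a plain compare.
// The module name and sel_chargen are hypothetical.
module vic_data_mux_sketch (
  input  wire [13:0] vic_addr_delayed,
  input  wire [7:0]  chargen_out,
  input  wire [7:0]  ram_out_2,
  output wire [7:0]  vic_combined_d
);
  // $1000-$1FFF: bits 13:12 are 01, bits 11:0 can be anything.
  wire sel_chargen = (vic_addr_delayed[13:12] == 2'b01);

  assign vic_combined_d = sel_chargen ? chargen_out : ram_out_2;
endmodule
```

One difference worth knowing about: during simulation casex also treats unknown bits in vic_addr_delayed itself as wildcards, whereas the compare above does not.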

You will notice that our Character ROM element has only a single port. For now it is only necessary that our VIC-II instance has access to the Character ROM.

One final block we need to map in the address space of the VIC-II is the Color RAM. Currently we also don't have a block to store the contents of the Color RAM, so we will need to create one.

This time around, however, the Color RAM will need to be dual port, as opposed to the Character ROM. We need the CPU to populate the Color RAM with meaningful data so that a meaningful picture can be rendered.

We will therefore start off by first implementing the Color RAM with an interface to our CPU.

The implementation will be as follows:

...
reg [3:0] color_ram [1023:0];
wire WE_color_ram;
...
assign WE_color_ram = WE_delayed & (addr_synced[15:10] == 6'b110110) ? 1 : 0;
...
//Color RAM
 always @ (posedge mem_clk)
    begin
     if (WE_color_ram) 
     begin
      color_ram[addr_synced[9:0]] <= ram_in_delayed;
      color_ram_out <= ram_in_delayed;
     end
     else 
     begin
      color_ram_out <= color_ram[addr_synced[9:0]];
     end 
    end 

We have defined WE_color_ram so that we only write to color RAM if the address is within the color RAM range.
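The pattern 6'b110110 on the top six address bits corresponds to the range $D800-$DBFF. The small test bench below, which is only a sketch with made-up names, checks the addresses just outside and just inside that range.

```verilog
// Sketch only: checks the boundaries of the Color RAM address decode.
// The module name and the hit wire are hypothetical.
module color_ram_decode_tb;
  reg  [15:0] addr;
  wire        hit = (addr[15:10] == 6'b110110);

  initial begin
    addr = 16'hD7FF; #1 $display("%h -> %b", addr, hit); // just below: 0
    addr = 16'hD800; #1 $display("%h -> %b", addr, hit); // first address: 1
    addr = 16'hDBFF; #1 $display("%h -> %b", addr, hit); // last address: 1
    addr = 16'hDC00; #1 $display("%h -> %b", addr, hit); // just above: 0
    $finish;
  end
endmodule
```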

One final thing we should do for our CPU to Color RAM interface is to modify our combinational selector block:

always @*
  casex (addr_delayed)
    16'b101x_xxxx_xxxx_xxxx : combined_d_out = basic_out;
    16'b111x_xxxx_xxxx_xxxx : combined_d_out = kernel_out;
    16'hd012: combined_d_out = line_counter;
    16'b1101_10xx_xxxx_xxxx : combined_d_out = color_ram_out;
    default: combined_d_out = ram_out;
  endcase


These are all the modifications needed to interface the Color RAM with the CPU.

Now, for interfacing the VIC-II to color RAM. We start by defining an additional port to color RAM:

...
reg [7:0] color_ram_vic_out;
...
//Color RAM
 always @ (posedge mem_clk)
    begin
     if (WE_color_ram) 
     begin
      color_ram[addr_ram_in[9:0]] <= ram_in_delayed;
      color_ram_out <= ram_in_delayed;
     end
     else 
     begin
      color_ram_out <= color_ram[addr_ram_in[9:0]];
     end 
    end 

 always @ (posedge mem_clk)
    begin
      color_ram_vic_out <= color_ram[vic_addr[9:0]]; 
    end 
...

The interesting thing is that we don't need an entry in the combinational logic for Color RAM. Color RAM has its own dedicated set of wires going to the data bus of the VIC-II.

This is a reminder that we haven't connected the data port of the VIC-II instance yet! So let us do that quickly:

vicii vic_inst(
  .addr(vic_addr),
  .data({color_ram_vic_out[3:0],vic_combined_d}),
  .reset(rst),
  .clk(clk_in),
  .clk_out(mem_clk),
  .c64_reset(c64_reset),
  .clk_out_1_mhz(c64_clk),  
  .first_pixel(first_pixel),
  .frame_sync(frame_sync),
  .blank_signal(blank_signal),
  .out_rgb(out_rgb)
  );


So, we basically concatenate the output of our color ram together with the output of the combinational logic for our VIC-II.
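Given this concatenation, the Color RAM nibble ends up in bits 11:8 of the 12-bit data port and the byte from RAM or the Character ROM in bits 7:0. Inside the VIC-II the two parts could be separated again along the lines of the sketch below. The module and signal names are invented for this example and are not taken from the VIC-II core.

```verilog
// Sketch only: splitting the 12-bit VIC-II data port back into its parts.
// color_nibble and char_byte are hypothetical names.
module vic_data_split_sketch (
  input  wire [11:0] data,
  output wire [3:0]  color_nibble,
  output wire [7:0]  char_byte
);
  assign color_nibble = data[11:8]; // from color_ram_vic_out[3:0]
  assign char_byte    = data[7:0];  // from vic_combined_d
endmodule
```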

Adding code for testing the VIC-II

Time to write some code to test our C64 system with the new addition of the VIC-II.

First, we need to add some extra ports to our C64 core module for surfacing the VIC-II signals:

module c64_core(
  input wire clk_in,
  input wire rst,
  input wire debug_clk,
  input wire debug_mode,
  input wire [15:0] addr_in,
  output wire [7:0] data_out,
  output wire first_pixel,
  output wire frame_sync,
  output wire blank_signal,
  output wire [23:0] out_rgb

    );


The initial block in our top block will look very similar to the one we used in the previous post for rendering a VIC-II screen with dummy data:

initial begin
  f = $fopen("/home/johan/out.ppm","w");
  $fwrite(f, "P3\n404 284\n255\n");
  #1000 reset <= 0;
  #700000000
  while (first_pixel == 0) begin
    @(negedge clk);
  end
  while (!frame_sync)
  begin
    if (!blank_signal)
      $fwrite(f, "%d %d %d\n", red, green, blue);
    @(negedge clk);
  end
  $fclose(f);
  
  $finish;
end    


The only difference is that we wait longer, giving the CPU a chance to initialise the system and write the welcome message before we take a snapshot of the VIC-II output.

Test Result

Running the test, you wait a couple of minutes, after which an out.ppm file is generated. Open it up and voilà:


The C64 Welcome screen!

It is such a simple thing, but with each flavour of C64 emulator I have played around with, getting to this screen is always an aha moment for me. It just gives me the confidence that I am more or less on the right track.

In Summary

In this post we added the VIC-II core developed in the previous post to our existing c64core design.

We also confirmed that, when running the simulation, our VIC-II did indeed render an image resembling the C64 welcome screen.

In the next post we will add our C64core/VIC-II block to our Vivado block design for running on the ZYBO FPGA. We will also see whether we can write the output of our VIC-II core to the SDRAM of the ZYBO.

Writing the data to SDRAM would enable us to retrieve the image data via the XSCT console, allowing us to verify whether our VIC-II core functions correctly within FPGA hardware.

Till next time!