C64 on an FPGA: Designing the VIC-II core

Foreword

In the previous post we managed to write to SDRAM on the Zybo board from the FPGA.

In this post we will start to develop the VIC-II core and verify the design with a Verilog simulation.

While developing the VIC-II core in this post I referred quite often to Christian Bauer's write-up on the VIC-II, which can be accessed via the following link: http://www.zimmers.net/cbmpics/cbm/c64/vic-ii.txt

Christian Bauer did an excellent job of summarising various people's observations on the VIC-II in a single document. If you are planning to write a C64 emulator or any other kind of implementation, this document is a must-read.

VIC-II Memory access and the FPGA

Within a C64 system, the 6510 CPU and VIC-II can access the main memory at 1MHz without having to wait for the other.

One would think that this dual access to main memory requires a 2MHz clock to provide the necessary bandwidth, but the amazing thing is that only a 1MHz clock is required!

On the C64 this memory access is achieved by only allowing the 6510 memory access when the clock is high and only allowing the VIC-II access when the clock is low.

Of course, the exception to the above is during bad lines, in which the 6510 cannot access memory at all because the VIC-II needs extra memory cycles to get the character codes from screen memory.

At this point the question arises of how to implement this kind of memory access within an FPGA.

For starters, Block RAM can provide data on a rising or a falling edge, but not both.

Fortunately, most Block RAM types on FPGAs can provide dual read/write ports. This means that you can have two components that access the same piece of Block RAM simultaneously, which is exactly what we want.

The situation gets a bit more complicated during bad lines where the VIC-II needs extra clock cycles to access the character codes from screen memory. So in effect our block RAM should be clocking at 2MHz during bad lines.

However, to simplify our design we should always clock our Block RAM at 2MHz, for both the 6510 Block RAM port and the VIC-II block RAM port.
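As a rough illustration of the dual-port idea, here is a minimal sketch of a Block RAM with two ports that share one clock. The module name, port names and the 64KB size are my own assumptions for illustration and are not taken from the actual design; port A is imagined as the 6510 side (read/write) and port B as the VIC-II side (read only).

```verilog
// Hypothetical dual-port RAM sketch: names and sizes are assumptions.
module dual_port_ram (
  input             clk,      // shared 2MHz memory clock
  // Port A: read/write (imagined as the 6510 side)
  input             we_a,
  input      [15:0] addr_a,
  input      [7:0]  din_a,
  output reg [7:0]  dout_a,
  // Port B: read only (imagined as the VIC-II side)
  input      [15:0] addr_b,
  output reg [7:0]  dout_b
);
  reg [7:0] ram [0:65535];

  // Port A: synchronous write and synchronous read
  always @(posedge clk) begin
    if (we_a)
      ram[addr_a] <= din_a;
    dout_a <= ram[addr_a];
  end

  // Port B: synchronous read of an independent address
  always @(posedge clk)
    dout_b <= ram[addr_b];
endmodule
```

Both ports see the same storage, so each side can present its own address on every clock edge without waiting for the other.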

Having both ports clocked at 2MHz implies that the 6510 would also be clocked at 2MHz. To get the 6510 to run at 1MHz we can just mask out every second pulse of the clock signal we send to the 6510. The resulting clock signal will look as follows:

The top signal is the 2MHz and the bottom one is the 1MHz achieved by masking out every second pulse.

OK, I must admit our 1MHz clock doesn't have a 50% duty cycle as on a genuine C64, but in the end we will achieve more or less the same result 😊

Starting with the VIC-II design

Time for us to start designing a VIC-II core.

So, where do we start? Christian Bauer gives us a nice starting point in his document:

As you can see, the "Raster counter X/Y" plays a central role. This is no surprise as the complete screen display and all bus accesses are synchronized by it.

We can start off by implementing the X/Y Raster counter:

module vicii(
  input reset,
  input clk
  );

  reg [8:0] x_pos;
  reg [8:0] y_pos;

  always @(posedge clk)
  if (reset)
  begin
    x_pos <= 0;
    y_pos <= 0;
  end
  else if (x_pos < 503)
    x_pos <= x_pos + 1;
  else
  begin
    x_pos <= 0;
    y_pos <= (y_pos < 311) ? y_pos + 1 : 0; 
  end

endmodule

In this code we are assuming a PAL implementation of the VIC-II chip, which has 312 lines and 504 pixel periods per line (x_pos counts from 0 to 503, i.e. 63 cycles of 8 pixels each).

It should be noted that the input clock is the pixel clock, running at more or less 8MHz. This is the clock our C64 system should synchronise to. So from this signal we need to generate both a 1MHz signal and a 2MHz signal.
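To convince yourself of the frame dimensions, a small self-checking simulation can count how many pixel clocks it takes for the counters to wrap back to (0, 0). The sketch below is a stand-alone testbench of my own (the module name and the clock period are arbitrary assumptions); it copies the counter logic from above rather than instantiating the vicii module.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench: counts pixel clocks per frame for the X/Y counter.
module raster_counter_tb;
  reg clk   = 0;
  reg reset = 1;
  reg [8:0] x_pos;
  reg [8:0] y_pos;
  integer clocks;

  always #1 clk = ~clk;   // arbitrary simulation clock period

  // Same counter logic as in the vicii module above
  always @(posedge clk)
    if (reset) begin
      x_pos <= 0;
      y_pos <= 0;
    end else if (x_pos < 503)
      x_pos <= x_pos + 1;
    else begin
      x_pos <= 0;
      y_pos <= (y_pos < 311) ? y_pos + 1 : 0;
    end

  initial begin
    clocks = 0;
    @(negedge clk);
    @(negedge clk);
    reset = 0;                       // counters are now at (0, 0)
    @(negedge clk);
    clocks = clocks + 1;             // first rising edge after reset
    while (!(x_pos == 0 && y_pos == 0)) begin
      @(negedge clk);
      clocks = clocks + 1;
    end
    $display("pixel clocks per frame = %0d", clocks);
    $finish;
  end
endmodule
```

The count should come out as 504 x 312 = 157248 pixel clocks per frame.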

To aid us in the signal generation we need to isolate the least significant 3 bits of our X counter:

...
  wire [2:0] bit_cycle;
...
  assign bit_cycle = x_pos[2:0];
...

This in effect gives us a counter that counts from 0 to 7. Apart from generating a 2MHz signal, this counter will have other uses as well, which we will see later on.

We can get our 2MHz signal by using bit 1 of bit_cycle:

module vicii(
  input reset,
  input clk,
  output clk_out
  );
...
  wire clk_2_mhz;
...
  assign clk_2_mhz = bit_cycle[1];
  assign clk_out = clk_2_mhz;
...

The 1MHz signal we generate as follows:

module vicii(
  input reset,
  input clk,
  output clk_out,
  output clk_out_1_mhz
  );
...
  assign clk_out_1_mhz = bit_cycle > 4 ? clk_out : 0;
...

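To understand this assignment, remember that while our counter counts from 0 to 7 the 2MHz signal produces two pulses: one at counts 2 and 3 and one at counts 6 and 7. For the 1MHz signal we need to suppress one of them. The comparison bit_cycle > 4 masks the pulse at counts 2 and 3 and lets only the pulse at counts 6 and 7 through.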
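The masking is easy to check in isolation. The following stand-alone sketch (the module name is my own; the two assignments are the ones from the snippets above) steps bit_cycle through 0 to 7 and prints both derived clock signals.

```verilog
// Hypothetical demo module: prints the derived clocks for one 8-pixel cycle.
module clk_mask_demo;
  reg  [2:0] bit_cycle;
  wire clk_2_mhz = bit_cycle[1];
  wire clk_1_mhz = (bit_cycle > 4) ? clk_2_mhz : 1'b0;
  integer i;

  initial begin
    for (i = 0; i < 8; i = i + 1) begin
      bit_cycle = i;   // only the low three bits are kept
      #1 $display("bit_cycle=%0d clk_2_mhz=%b clk_1_mhz=%b",
                  bit_cycle, clk_2_mhz, clk_1_mhz);
    end
    $finish;
  end
endmodule
```

clk_2_mhz is high for counts 2, 3, 6 and 7, while clk_1_mhz is high only for counts 6 and 7.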

Reading Memory

Time to start to think how our VIC-II core will interface with memory.

As we will first only implement text mode in our core, we only need to worry about two types of memory accesses: Reading Screen memory and reading Character ROM.

In the coming subsection we will implement Screen memory read accesses, and in the subsection thereafter we will implement Character ROM read accesses.

Screen Memory Reads

As mentioned earlier, we don't read the Screen Memory on every screen line, but rather only on bad lines. A bad line occurs on every 8th line in the visible character region.

Because we are reading the screen memory only every 8th line, we need to have some kind of internal buffering. Here is the code for implementing the internal buffering:

module vicii(
  output reg [13:0] addr,
  input [11:0] data,
  input reset,
  input clk,
  output clk_out
  );
...
  wire [5:0] line_cycle_num;
  wire visible_vertical;
  wire WE;
  reg [2:0] char_line_num; 
  reg [11:0] char_buffer [39:0];
  reg [11:0] char_buffer_out;
  reg [5:0] char_buf_pos;
...
  assign line_cycle_num = x_pos[8:3];
...
  assign visible_vertical = (y_pos > 55) & (y_pos < 255) ? 1 : 0;
...
  assign WE = (line_cycle_num >= 4) & (line_cycle_num <= 43) & visible_vertical & (bit_cycle == 2)
          & (char_line_num == 0);
...

  always @(posedge clk)
  if (WE)
  begin
    char_buffer[char_buf_pos] <= data;
    char_buffer_out <= data;
  end else
    char_buffer_out <= char_buffer[char_buf_pos];
...
  always @(posedge clk)
    if (!visible_vertical)
      char_line_num <= 0;
    else if (x_pos == 384)
      char_line_num <= char_line_num + 1;
...
   always @(posedge clk)
     if (!visible_vertical)
       char_buf_pos <= 0;
     else if (bit_cycle == 0 & visible_horiz)
     begin
       if (char_buf_pos < 39)
         char_buf_pos <= char_buf_pos + 1;
       else
         char_buf_pos <= 0;
     end
...

As you can see we have added two extra ports to our core, addr and data. These ports are linked to external block memory. addr is the requested address, and the requested data is delivered on the data port.

We will get to how the address is determined in a moment.

You will notice that the data port is 12 bits wide, as is each element of char_buffer. The data bus and internal buffer on a real VIC-II are also 12 bits wide, as explained in Christian Bauer's document.

The reason for the 12-bit width is that when the VIC-II reads from screen memory, data is read simultaneously from color memory for the same character position. The 8-bit screen code and the 4-bit color code are then sent as one 12-bit word over the data bus. This combination simply avoids an extra memory trip.

I have introduced a couple of regs and wires to make life simpler. One is line_cycle_num, which is bits 3-8 of x_pos. This gives the cycle number within a line, which ranges from 0 to 62 since x_pos runs from 0 to 503. Working with the line cycle number just makes the code easier to visualise.

Another reg of interest is char_line_num. This register counts from 0 to 7 within the visible character region and is used to determine when we are at a bad line.

To determine whether we are at a line within the visible character region we make use of visible_vertical.

With all the above defined, we can define write enable (the WE wire), which determines when we should write to our internal character buffer.

This basically concludes how the internal buffer gets populated.

What we still need to discuss is how the address gets calculated. Here is the code:

...
  reg [9:0] screen_mem_pos;
...
   always @(posedge clk)
     if (!visible_vertical)
       screen_mem_pos <= 0;
     else if (bit_cycle == 0 & visible_horiz & char_line_num == 0)
       screen_mem_pos <= screen_mem_pos + 1;
...
   always @*
     if (bit_cycle == 1)
       addr = {4'b1, screen_mem_pos};
...

The screen_mem_pos register is 10 bits wide and counts from 0 to 999. Prepending 4'b1 (binary 0001) sets bit 10 of the 14-bit address, so the addresses start at 400 hex.

Character ROM Reads

In the previous subsection we covered the reading of character codes from screen memory.

As a matter of fact, a character code is an index to a character image in the Character ROM. Since each character image consists of eight bytes, the address of the character image for a given code is calculated by simply multiplying the code by eight. We incorporate this within our addr assignment as follows:

   always @*
     if (bit_cycle == 1)
       addr = {4'b1, screen_mem_pos};
     else
       addr = {3'b010,char_buffer_out[7:0],char_line_num};

The if statement assists us with sequencing, so that for the first pulse we always issue a read for the character code, and for the second pulse we issue a read to Character ROM for the given code.

The address for a read from Character ROM is a concatenation of three values. char_line_num is three bits wide, with the effect that our character code is multiplied by eight (i.e. char_buffer_out gets shifted left by three bits).

Appending char_line_num ensures that we get the correct line from the character image, which depends on the current screen line we are on.

We need to prepend the binary value 3'b010 to the resulting address because the Character ROM is mapped at addresses $1000-$1FFF within the VIC-II address space. Christian Bauer's doc gives more detail on this.
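The two concatenations can be checked with a few concrete values. This stand-alone sketch (the module name and the char_code signal are my own stand-ins; char_code plays the role of char_buffer_out[7:0]) prints the resulting 14-bit addresses for the first and last screen position and for the first and last line of screen code 1.

```verilog
// Hypothetical demo module: evaluates the two address concatenations.
module addr_demo;
  reg  [9:0]  screen_mem_pos;
  reg  [7:0]  char_code;       // stand-in for char_buffer_out[7:0]
  reg  [2:0]  char_line_num;
  wire [13:0] screen_addr = {4'b1, screen_mem_pos};
  wire [13:0] char_addr   = {3'b010, char_code, char_line_num};

  initial begin
    // First screen position, first line of screen code 1
    screen_mem_pos = 10'd0;
    char_code      = 8'd1;
    char_line_num  = 3'd0;
    #1 $display("screen_addr=%h char_addr=%h", screen_addr, char_addr);
    // values: 14'h0400 and 14'h1008

    // Last screen position, last line of screen code 1
    screen_mem_pos = 10'd999;
    char_line_num  = 3'd7;
    #1 $display("screen_addr=%h char_addr=%h", screen_addr, char_addr);
    // values: 14'h07E7 and 14'h100F
    $finish;
  end
endmodule
```

Screen code 1 lands at offset 8 within the Character ROM, which is the code multiplied by eight, and the low three bits select the line within the character image.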

This concludes how the VIC-II accesses data from Character ROM.

Rendering Pixels

We are now ready to consider how rendering of pixels will be implemented.

The basic principle is that we copy a line of the character image into a shift register. We then shift through all the bits in the shift register, outputting the text color if the bit is a one, or the background color if the bit is a zero.

The basic code for the pixel rendering is as follows:

...
  wire [3:0] out_color;
  wire [3:0] out_pixel;
  reg [7:0] pixel_shift_reg;
  reg [3:0] color_buffered_val;
...
  assign out_color = pixel_shift_reg[7] == 1 ? color_buffered_val : 4'd6;
  assign out_pixel = visible_vertical & visible_horiz ? out_color : 4'd14;
...
  always @(posedge clk)
  if (bit_cycle == 7)
    color_buffered_val <= char_buffer_out[11:8];
...
  always @(posedge clk)
  if (bit_cycle == 7)
    pixel_shift_reg <= data[7:0];
  else
    pixel_shift_reg <= {pixel_shift_reg[6:0],1'b0};
...

As you might know, the data for displaying a character is fetched while the previous character position is being displayed.

When we are about to display the new character line, it is important that we save the character pixels and the text color, since both values will be overwritten halfway through the drawing of the character line.

The text color gets stored in color_buffered_val and the pixel data gets stored in pixel_shift_reg, our shift register.

During the rendering of a character line, we keep shifting pixel_shift_reg left by one bit at each clock cycle.

The bit we always keep an eye on is the most significant bit of pixel_shift_reg. If the most significant bit is a one, we output the text color, otherwise we output the background color.
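To see the shift register in action outside the core, here is a small stand-alone sketch. The module name, the fixed data byte and the fixed text color of 14 are assumptions for the demo; in the real core the byte comes from Character ROM and the color from char_buffer_out.

```verilog
// Hypothetical demo module: shifts one character line out MSB first.
module pixel_shift_demo;
  reg        clk             = 0;
  reg  [2:0] bit_cycle       = 3'd7;
  reg  [7:0] pixel_shift_reg = 8'h00;
  reg  [7:0] data            = 8'b0001_1000;  // one line of a character image
  // Text color fixed to 14 for the demo, background color 6 as in the post
  wire [3:0] out_color = pixel_shift_reg[7] ? 4'd14 : 4'd6;

  always #1 clk = ~clk;

  always @(posedge clk) begin
    bit_cycle <= bit_cycle + 1;
    if (bit_cycle == 7)
      pixel_shift_reg <= data;                            // load a new line
    else
      pixel_shift_reg <= {pixel_shift_reg[6:0], 1'b0};    // shift left
  end

  initial begin
    repeat (9) begin
      @(negedge clk);
      $display("bit_cycle=%0d shift_reg=%b out_color=%0d",
               bit_cycle, pixel_shift_reg, out_color);
    end
    $finish;
  end
endmodule
```

Each printed line corresponds to one pixel: the color follows the most significant bit of the shift register as the byte moves left.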

Because I haven't implemented registers within the VIC-II core yet, I am just outputting a hardcoded value for the background color: 6 (blue), which is the standard color at boot-up.

out_color only returns colors within the character region of the screen. I have added the out_pixel wire to return the border color if we are writing pixels within the border region. The border color is likewise hardcoded, to 14 (light blue).

We are just about finished with implementing the rendering of pixels. At this point, however, we are returning palette index values as colors, with values 0 - 15.

We need to map these indexes to RGB values so that they are useful to the rest of the system. This is straightforward:

module vicii(
  output reg [13:0] addr,
  input [11:0] data,
  input reset,
  input clk,
  output clk_out,
  output reg [23:0] out_rgb
  );
...
   always @*
     case (out_pixel)
       4'd0: out_rgb = 24'h000000;
       4'd1: out_rgb = 24'hFFFFFF;
       4'd2: out_rgb = 24'h880000;
       4'd3: out_rgb = 24'hAAFFEE;
       4'd4: out_rgb = 24'hCC44CC;
       4'd5: out_rgb = 24'h00CC55;
       4'd6: out_rgb = 24'h0000AA;
       4'd7: out_rgb = 24'hEEEE77;
       4'd8: out_rgb = 24'hDD8855;
       4'd9: out_rgb = 24'h664400;
       4'd10: out_rgb = 24'hFF7777;
       4'd11: out_rgb = 24'h333333;
       4'd12: out_rgb = 24'h777777;
       4'd13: out_rgb = 24'hAAFF66;
       4'd14: out_rgb = 24'h0088FF;
       4'd15: out_rgb = 24'hBBBBBB;
     endcase
...

Synchronisation

In the current state of our VIC-II core we are outputting the colors for pixels, but we cannot tell which positions on the screen these pixels map to.

We need some kind of synchronisation. For synchronisation we need to add the following output signals to our module:

module vicii(
  output reg [13:0] addr,
  input [11:0] data,
  input reset,
  input clk,
  output clk_out,
  output reg [23:0] out_rgb,
  output wire first_pixel,
  output wire frame_sync,
  output wire blank_signal
  );

The frame_sync signal gets set when the y-counter is at the vertical blank lines towards the end of the frame. If you are populating a buffer with the pixel values, this signal gives you a chance to reset the buffer pointer to position 0 before the pixels of a new frame arrive.

The first_pixel signal indicates that the pixels of a new frame have just started, and that you can start populating your buffer from position zero.

The blank_signal indicates that we are in a horizontal or vertical blanking period and that pixel values should be ignored.

The assignment for these signals is as follows:

  assign first_pixel = (x_pos == 0) & (y_pos == 16) ? 1 : 0;  
  assign frame_sync =  y_pos > 299; 
  assign blank_signal = y_pos < 16 | y_pos > 299 | x_pos > 403 ? 1 : 0;

You will notice that we only indicate the first pixel at y_pos 16. This is because the first 16 lines are also vertical blank lines.
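To make the intended use of these three signals concrete, here is a sketch of how a consumer could turn them into frame buffer writes. Everything in it apart from first_pixel, frame_sync and blank_signal is hypothetical (the module name, the reset input, the rgb input and the wr_* ports); the 17-bit pointer is chosen because 404 x 284 = 114736 visible pixels fit within 2^17.

```verilog
// Hypothetical consumer of the sync signals: generates frame buffer writes.
module frame_capture (
  input             clk,           // 8MHz pixel clock
  input             reset,
  input             first_pixel,
  input             frame_sync,
  input             blank_signal,
  input      [23:0] rgb,           // out_rgb from the VIC-II core
  output            wr_en,
  output reg [16:0] wr_addr,       // 404 * 284 = 114736 pixels < 2^17
  output     [23:0] wr_data
);
  reg capturing;

  // A pixel is stored once the frame has started and we are not blanking
  wire take_pixel = (capturing | first_pixel) & ~blank_signal & ~frame_sync;

  assign wr_en   = take_pixel;
  assign wr_data = rgb;

  always @(posedge clk)
    if (reset | frame_sync) begin
      capturing <= 1'b0;           // wait for the next first_pixel
      wr_addr   <= 17'd0;          // rewind the buffer pointer
    end else begin
      if (first_pixel)
        capturing <= 1'b1;
      if (take_pixel)
        wr_addr <= wr_addr + 17'd1;
    end
endmodule
```

The pointer is rewound during the vertical blank at the end of the frame, so the pixel flagged by first_pixel is always written to position zero.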

Creating the Test Harness

Time has come to create a test harness to test our VIC-II core to see if it works as expected.

Since this Test Harness should only test the functionality of the VIC-II core, we only need to include the interfaces to RAM and the Character ROM that the VIC-II requires. At this point we don't need to worry about implementing dual-port RAMs as discussed earlier in this post.

We start by creating an instance of vicii and wiring up some of the ports:

wire [13:0] vic_addr;

vicii vic_inst(
  .addr(vic_addr),
  .reset(reset),
  .clk(clk),
  .clk_out(clk_out)
  );

For now I have implemented only a handful of the ports. I will connect the remaining ports during the course of this section.

Just a quick refresher on the purpose of these ports.

addr is an output port carrying the address from which the core needs data.

clk is the clock input port. The frequency of this input clock is 8MHz.

clk_out provides a scaled-down clock with a frequency of 2MHz for memory accesses.

Now, let us implement the different memories that the VIC requires. These are main RAM, Character ROM and Color RAM.

Here is the implementation of them:

...
wire [9:0] trunc_screen_addr;
wire [11:0] trunc_char_addr;
...
reg [7:0] char_rom [4095:0];
reg [7:0] screen_ram [1023:0];
reg [3:0] color_ram [1023:0];
reg [7:0] char_rom_out;
reg [7:0] screen_ram_out;
reg [3:0] color_ram_out;
...
assign trunc_screen_addr = vic_addr[9:0];
assign trunc_char_addr = vic_addr[11:0];
...
 always @ (posedge clk_out)
    begin
      char_rom_out <= char_rom[trunc_char_addr];
    end 
...
 always @ (posedge clk_out)
    begin
      color_ram_out <= color_ram[trunc_screen_addr];
    end 
...
 always @ (posedge clk_out)
    begin
      screen_ram_out <= screen_ram[trunc_screen_addr];
    end 
...

At this point we know that we should populate the contents of the Character ROM. However, since we are not wiring up any CPU to our Test Harness, we should also pre-populate the color RAM and screen RAM with test data. Both of them should contain 1024 elements.

The population will again be done in a similar fashion to previous posts:

initial begin
      $readmemh("/home/johan/Documents/roms/chargen.hex", char_rom) ;
      $readmemh("/home/johan/Documents/roms/colorram.hex", color_ram) ;
      $readmemh("/home/johan/Documents/roms/screenram.hex", screen_ram) ;
    end

Next, we should wire up this ROM and these RAMs to the VIC according to the memory map of the VIC-II:

...
reg [13:0] vic_addr_delayed;
reg [7:0] combined_vic_data;
...
 always @(posedge clk_out)
   vic_addr_delayed <= vic_addr;
...
vicii vic_inst(
  .addr(vic_addr),
  .data({color_ram_out,combined_vic_data}),
  .reset(reset),
  .clk(clk),
  .clk_out(clk_out)
  );
...
  always @*
    casex (vic_addr_delayed) 
      14'b00_01xx_xxxx_xxxx: combined_vic_data = screen_ram_out;
      14'b01_xxxx_xxxx_xxxx: combined_vic_data = char_rom_out;
      default: combined_vic_data = 0;
    endcase
...

As also outlined in the VIC-II model described by Christian Bauer, we don't include Color RAM as an entry within the casex statement, but rather prepend it to combined_vic_data.

We are now ready for coding the heart of our Test Harness:

initial begin
  f = $fopen("/home/johan/out.ppm","w");
  $fwrite(f, "P3\n404 284\n255\n");
  #50 reset <=0;
  #90000;
  while (first_pixel == 0) begin
    @(negedge clk);    
  end
  while (!frame_sync)
  begin
    if (!blank_signal)
      $fwrite(f, "%d %d %d\n", red, green, blue);
    @(negedge clk);
  end
  $fclose(f);
  #3000000 $finish;
end

The basic idea is that we create an image from the pixel output. We start this off by opening a new image file for writing. The format of this file is the portable pixmap format (PPM). With this file format the pixel values get written as plain text.

The first thing we write to the file is the header.

P3 identifies the plain-text variant of PPM, in which each pixel is written as three values (red, green and blue).
404 284 means that the image will have a pixel width of 404 and a pixel height of 284.
255 means that each value can have a max value of 255.
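If the PPM format is new to you, it may help to write a tiny image first. The following stand-alone sketch (the module name and the output file name demo.ppm are my own choices) writes a 2 x 2 image using two colors from the palette table above: blue (0000AA) and light blue (0088FF).

```verilog
// Hypothetical demo module: writes a 2x2 plain-text PPM image.
module ppm_demo;
  integer f;

  initial begin
    f = $fopen("demo.ppm", "w");
    // Header: magic number, width and height, maximum channel value
    $fwrite(f, "P3\n2 2\n255\n");
    // Pixels are written row by row as red, green and blue values
    $fwrite(f, "%0d %0d %0d\n", 8'h00, 8'h00, 8'hAA);  // blue
    $fwrite(f, "%0d %0d %0d\n", 8'h00, 8'h88, 8'hFF);  // light blue
    $fwrite(f, "%0d %0d %0d\n", 8'h00, 8'h88, 8'hFF);  // light blue
    $fwrite(f, "%0d %0d %0d\n", 8'h00, 8'h00, 8'hAA);  // blue
    $fclose(f);
    $finish;
  end
endmodule
```

The resulting file is plain text, so you can inspect it in a text editor before opening it in an image viewer that supports PPM.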

As part of our test of the VIC-II core we want to test that the synchronisation functionality works correctly. So, we need to start the test at an arbitrary pixel within a frame. This is the purpose of the #90000 we have added.

We then wait till the first_pixel signal is asserted. An interesting detail here is the use of the event control (@). We usually only use this kind of statement within an always block. We can, however, use this statement within an initial block as well.

We do the check on first_pixel each time on the negative edge of the clock.

When we have finally got an asserted first_pixel, we start writing the pixels to our ppm file. We also advance to the next value on the negative edge.

Within the pixel writing loop we check for two things. The first is not to write a pixel value if blank_signal is asserted. The second is to loop only until the frame_sync signal is asserted.

With the frame_sync signal asserted we can assume the frame is finished, and we can stop our test harness.

Test Result

For my test screen data I have just repeated the screen codes 0 to 4 and ended off the test data with a couple of screen code 0's.

The result of running the Test Harness with this test data is as follows:

Our core is behaving more or less as expected.

In Summary

In this post we have developed the VIC-II core and tested it with a Test Harness.

In the next post we will integrate this VIC-II core with the rest of our C64 core.

Till next time!

6 comments:

TO9XCT, 14 March 2019 at 03:20
There are a lot of talk, about why there is no VIC-II replacement chip, and that once every VIC-II have died, then there are no more real C64 running. Are there someone, somewere, that are making an FPGA version of the actual VIC-II chip as a replacement option for real C64 boards?
TO9XCT, 25 April 2019 at 13:56
Thanks for the info. I think all the stocks will dry up within the next couple of years. And as I have no programming skills, then I can not make any replica. It is just that around on forums, people are beginning to talk about an replica. So the time might be right for someone to start such a project.
Piccolo, 18 June 2019 at 20:20
Try look here for VHDL designs about C64 logic
Put the URL together below and browse:
http s :// www.syntiac.com / fpga64.html
WhoDares, 17 May 2020 at 00:32
I must admit, when I bought my TurboChameleon a good while ago, I had thought about making a customised VIC-II chip. Then I saw the MEGA65 producing a "VIC-IV" chip in their new computer, so I got a Nexys4DDR to play with that.

Several years down the line, and I still haven't invested a lot of time into FPGA, but it's still kind of a pipe-dream (and I guess, brought me to your blog)

Monday, 22 January 2018
