Friday, 29 November 2019

Implementing Sound: Part 2

Foreword

In the previous post we gave some thought to the idea of adding SID sound to our C64 module.

This turned out not to be such a daunting task, since we found an existing SID implementation on GitHub, written in SystemVerilog by Thomas Kindler.

We tested this SID implementation by capturing a couple of seconds' worth of SID register writes from a JavaScript emulator I wrote a couple of years ago. I then supplied these SID register writes to Kindler's SID core and listened to the output.

The result was very pleasing. Initially I spotted a bit of clipping, but subsequently fixed this by reducing the volume of each voice.

In this post we will add Kindler's core to our C64 module and see whether we can play SID sound in real time.

The importance of clock locking

I would like to start off this post by talking about an issue of a different kind I had to solve with the C64 module.

As I kept adding more functionality to the C64 module, I ended up once again with a case where this core didn't want to boot up anymore.

After checking the Verilog code of the C64 module time and again, I couldn't find anything wrong.

At one point I started wondering: In the clock wizard generating the 16MHz signal, I am not using the lock signal at all. Can this be a source of issues?

The following post on the Xilinx support site sheds a bit of light on this issue: https://www.xilinx.com/support/answers/52806.html. As quoted from this post:

Until the LOCKED signal is asserted High, the DCM/DLL output clocks are not valid and can exhibit glitches, spikes, or other spurious movement.
So, it is a very good idea to honour the Locked signal.

Of course we need to use this signal when generating the reset signal for the 6502:

...
assign c64_reset = (reset_counter > 8000000) & (reset_counter < 8000020) ? 1 : 0;
...
    always @(posedge clk_in)
     if ((reset_counter < 9000000) & locked)
       reset_counter <= reset_counter + 1;
...


You might remember that we generate this signal within the VIC-II module, which is clocked at 8MHz. So, in this code we wait about a second after the 8MHz clock generator has locked, after which we assert the reset signal for 19 clock cycles.
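
To get a feel for the numbers, the reset window can be worked out in a few lines of JavaScript. This is only a sketch: the function name resetWindow is made up for illustration, and the constants are simply read off the Verilog above.

```javascript
// Software model of the reset window above (illustrative only).
// c64_reset is high while 8000000 < reset_counter < 8000020.
function resetWindow(clockHz) {
  const firstCount = 8000001;  // first counter value with c64_reset high
  const lastCount = 8000019;   // last counter value with c64_reset high
  return {
    delaySeconds: firstCount / clockHz,     // time from lock to reset going high
    pulseCycles: lastCount - firstCount + 1 // clock cycles that reset stays high
  };
}

const resetInfo = resetWindow(8000000);     // VIC-II clock of 8MHz
console.log(resetInfo.delaySeconds.toFixed(2) + " s, " + resetInfo.pulseCycles + " cycles");
// prints "1.00 s, 19 cycles"
```

Because the counter only advances while locked is high, the one second delay starts counting from the moment the clock is stable, not from power-up.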

Mapping SID into memory

For now, we will only worry about performing writes to the SID, and not any reading. This will result in the following port assignments of the SID module:

MOS6581 sid(
...
    .addr(addr[4:0]),
    .data(ram_in),       
    .n_cs(!(we & io_enabled & (addr[15:8] == 8'hd4))),
    .rw(0),
    .clk(clk_1_mhz), .clk_en(1), .n_reset(!c64_reset)
);

Since the SID only has 29 registers, we only connect the lower 5 bits of the address bus to the SID module.

We permanently wire this module to write mode (i.e. rw is tied to zero).

Also, we enable writing when there is a write within the IO region (i.e. addresses D000 to DFFF) and the upper eight bits of the address equal 0xD4.
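
The decode logic can be mirrored in JavaScript to check which addresses end up selecting the SID. This is a sketch only; the function names sidNotChipSelect and sidRegister are hypothetical and simply restate the n_cs and addr port expressions above.

```javascript
// Model of the n_cs expression: active-low select for CPU writes to D400-D4FF.
function sidNotChipSelect(addr, we, ioEnabled) {
  const inSidPage = ((addr >> 8) & 0xff) === 0xd4; // addr[15:8] == 8'hd4
  return !(we && ioEnabled && inSidPage);          // false means "selected"
}

// Model of addr[4:0]: the register number the SID core sees.
function sidRegister(addr) {
  return addr & 0x1f;
}

console.log(sidNotChipSelect(0xd418, true, true), sidRegister(0xd418)); // false 24
console.log(sidNotChipSelect(0xd438, true, true), sidRegister(0xd438)); // false 24
console.log(sidNotChipSelect(0xd018, true, true));                      // true
```

The second line shows a side effect of only connecting five address bits: within the D400 page the registers repeat every 32 bytes, so D438 lands on the same register as D418.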

Outputting samples to the Sound System

Some time ago we played with sound on the Zybo board. The Zybo board can create high quality sound with the help of the Analog Devices SSM2603 Audio Codec.

This codec receives samples in a serial fashion with the I2S protocol.

We implemented a block that generates a monotone and converted the samples to the I2S protocol so the audio codec can receive it.

In this section we extend this block so that it can receive audio samples from the SID block.

Let us start by having a look at the port definitions of the I2S block:

module i2s(
  input clk,
  input clk_1_mhz,
  input [15:0] audio_in,
  output clk_1_5_mhz,
  output channel_enable,
  output out_data,
  output mute_en
    );


The output ports are basically the I2S ports that we will connect to the Audio codec.

The audio_in port carries the audio samples from the SID module.

We once again have a clock domain crossing that we need to solve. The SID generates samples at a rate of 1MHz, whereas the audio codec needs to receive samples at a rate of 48KHz.

So, apart from solving the clock domain issue, we also need to discard a number of samples from the SID module to get to the 48KHz sample rate.
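
To see how many samples get thrown away, here is an idealised sketch in JavaScript. It assumes exact rates of 1MHz and 48KHz and picks samples by index, whereas the hardware picks them by timing; the function name sidIndexForOutput is made up for this illustration.

```javascript
// For output sample n at the codec rate, which SID sample index is the most recent one?
function sidIndexForOutput(n, sidRateHz, codecRateHz) {
  return Math.floor(n * sidRateHz / codecRateHz);
}

const keptIndices = [];
for (let n = 0; n < 6; n++) {
  keptIndices.push(sidIndexForOutput(n, 1000000, 48000));
}
console.log(keptIndices.join(", ")); // 0, 20, 41, 62, 83, 104
```

So we keep roughly one out of every 21 SID samples, with the gap alternating between 20 and 21 because 1000000/48000 is not a whole number.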

Let us start by having a look at the critical point at which we need to inject a sample from the 1MHz clock domain:

    always @(posedge clk)
    if (channel_enable_counter == 15 & neg_edge)
    begin
      shift_reg <= {data_val, data_val};      
    end
    else if (neg_edge)
      shift_reg <= {shift_reg[30:0] , 1'b0};


This is the logic for the shift register that shifts the data out to the audio codec. Basically we want a fresh sample from the 1MHz domain to be sitting in data_val by the time the shift register gets reloaded.

We wrote the above snippet quite some time ago, so let us familiarise ourselves again with what is going on in it.

Within the world of our audio codec, there are three clock frequencies:
  • The master clock: 12.288MHz
  • The serial data clock: 1.536MHz
  • The sample clock: 48KHz
To avoid multiple cross clock domain issues, we try to clock all our always blocks (except the 1MHz bits) at 12.288MHz.

We want our shift register to clock at 1.536MHz, so we introduce a signal neg_edge that gets asserted when we are at the negative edge of the 1.536MHz signal:

    reg [1:0] clk_div_counter = 0;
    
    assign neg_edge = (clk_div_counter == 3) & (bclk_int == 1) ? 1 : 0;

    always @(posedge clk)
      clk_div_counter <= clk_div_counter + 1; 

    always @(posedge clk)
    if (clk_div_counter == 3)
      bclk_int <= ~bclk_int;


So, bclk_int is our 1.536MHz clock, which is generated by toggling it every four clock cycles.

Although bclk_int is a clock signal, we don't use it to clock any always blocks, so there is no need to worry about any cross clock domain issues here.
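
The divider is easy to check with a cycle-by-cycle model in JavaScript. This is just a sketch: countNegEdges is a made-up name, and the final division by 32 assumes that channel_enable_counter is a 4-bit counter stepping once per neg_edge, which the snippets here do not show explicitly.

```javascript
// Cycle-by-cycle model of clk_div_counter, bclk_int and neg_edge.
function countNegEdges(cycles) {
  let clkDivCounter = 0;
  let bclkInt = 0;
  let negEdges = 0;
  for (let i = 0; i < cycles; i++) {
    // Evaluate using the values from before the clock edge, as in the Verilog.
    if (clkDivCounter === 3 && bclkInt === 1) negEdges++;
    if (clkDivCounter === 3) bclkInt ^= 1;       // toggle every four cycles
    clkDivCounter = (clkDivCounter + 1) & 3;     // 2-bit counter wraps at 4
  }
  return negEdges;
}

const masterClockHz = 12288000;
const bitClockHz = masterClockHz * countNegEdges(8000) / 8000;
console.log(bitClockHz);      // 1536000
console.log(bitClockHz / 32); // 48000
```

One neg_edge pulse every eight master clock cycles gives the 1.536MHz bit clock, and 32 bit clock periods per stereo frame bring us to the 48KHz sample clock.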

Let us similarly bring the 48KHz sample signal into the picture:

    always @(posedge clk)
    if (neg_edge & channel_enable_counter == 15)
      prclk_int <= ~prclk_int;


This looks very similar to the signal that we use to load data into the shift register. The correct instant to load a value from the SID into data_val is a cycle or two after the reload of shift_reg has occurred.

The following snippet accomplishes just that for us:

    (* ASYNC_REG = "TRUE" *) reg sig_48_khz_0, sig_48_khz_1, sig_48_khz_2;

    always @(posedge clk_1_mhz)
    begin
       sig_48_khz_0 <= prclk_int;
       sig_48_khz_1 <= sig_48_khz_0;
       sig_48_khz_2 <= sig_48_khz_1; 
    end

    always @(posedge clk_1_mhz)
    if (!sig_48_khz_1 & sig_48_khz_2)
      data_val <= audio_in;


Here we again have a multi-flop synchroniser to bring the 48KHz signal into the 1MHz domain.

This multi-flop synchroniser has an additional function: delaying the assignment of a new value until the current value has been handed over to the shift register.

So, with the above setup we are effectively removing the overlap between fetching a sample from the 1MHz domain and reading it in the 48KHz domain.
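
The delay introduced by the flop chain can be seen in a small JavaScript model. This is a sketch that only models the 1MHz side: each call to the returned function stands for one rising edge of clk_1_mhz, and the names makeSampleLatch and latchTick are hypothetical.

```javascript
// Model of the three-flop chain and the capture of audio_in into data_val.
// Each call to tick() represents one rising edge of clk_1_mhz.
function makeSampleLatch() {
  let s0 = 0, s1 = 0, s2 = 0, dataVal = 0;
  return function tick(prclkInt, audioIn) {
    // Non-blocking semantics: test the old flop values first, then shift.
    if (!s1 && s2) dataVal = audioIn;   // falling edge seen two flops deep
    s2 = s1;
    s1 = s0;
    s0 = prclkInt;
    return dataVal;
  };
}

const latchTick = makeSampleLatch();
const prclkSamples = [1, 1, 1, 0, 0, 0, 0];
const latched = prclkSamples.map((p, i) => latchTick(p, i * 100));
console.log(latched.join(", ")); // 0, 0, 0, 0, 0, 500, 500
```

In this run prclk_int is first seen low on tick 3, but data_val only picks up a new sample on tick 5, two 1MHz cycles later. That gap is what keeps the update away from the moment the shift register is reloaded.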

Implementing these changes will give us a fully functional SID implementation within our C64 module.

In Summary

In this post we finished off our SID implementation within our C64 FPGA implementation.

Special thanks to Thomas Kindler for sharing the source code on GitHub for his SID implementation.

I had originally intended this post to be the last one.

However, an extra idea popped up. The C64 FPGA implementation in its current state only fills a small area on the screen, so in the next post we will see if we can scale this image up so it can fill most of the screen.

I want to end off this post with an interesting thought. When doing FPGA programming, every now and again one is faced with cross clock domain issues. That makes one realise that although we are working with digital electronics, where we always have a discrete state of a zero or a one, the circuits in a chip still exhibit behaviour similar to that of an analogue circuit.

Till next time!

Wednesday, 30 October 2019

Implementing Sound: Part 1

Foreword

In the previous post we finished off implementing sprites into our C64 emulator.

This enabled us to fully play the game Dan Dare within our C64 emulator.

In this post we will start to implement a nice-to-have: Sound emulation.

To implement SID emulation from scratch can be quite a daunting task. So, I did some searching on the Internet to see if I could find an existing SID-implementation, written in Verilog.

As part of this post, I will also show how to create a test bench for evaluating such an implementation.

The chosen SID implementation

After some searching on the Internet, I came across a nice SID implementation on GitHub coded in SystemVerilog, by Thomas Kindler. Here is the link to the project:

https://github.com/thomask77/verilog-sid-mos6581

I don't have hands-on experience with SystemVerilog, but according to many resources, Vivado does in fact support SystemVerilog. So my SystemVerilog disadvantage shouldn't be much of a setback :-)

There was one thing I did run into when using this code with Vivado: output ports that are driven by sequential elements need to be declared with the 'reg' keyword.

Let us do an overview of how Thomas Kindler's SID module works. Thomas Kindler based much of the inner workings on the interview with Bob Yannes, the designer of the original SID chip. A copy of this interview can be found in a couple of places on the Internet, including here.

At the heart of each voice on the SID is a 24-bit phase accumulator, clocked at 1MHz.

The phase accumulator is the workhorse for generating one complete cycle of the desired waveform at the desired frequency.

Each voice on the SID has its own phase accumulator, which gets incremented at each clock cycle by the value stored in that voice's 16-bit frequency register. These are registers 54272+54273 for Voice 1, 54279+54280 for Voice 2, and 54286+54287 for Voice 3.

If a voice has a frequency value of 1, the 24-bit accumulator takes about 16.8 seconds to wrap around at 1MHz, so we will be producing a waveform with a frequency well below 1Hz. On the other hand, a frequency value of 65535 will yield a waveform with a frequency of about 4KHz.
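
These figures follow directly from the width of the accumulator: the output frequency is the register value times the clock rate divided by 2^24. Here is a small JavaScript sketch of that arithmetic, assuming a clock of exactly 1MHz (the function name sidFrequencyHz is made up for this example):

```javascript
// Output frequency of a voice: the 24-bit accumulator wraps once per waveform cycle.
function sidFrequencyHz(freqReg, clockHz) {
  return freqReg * clockHz / Math.pow(2, 24);
}

console.log(sidFrequencyHz(1, 1000000).toFixed(4));       // 0.0596
console.log((1 / sidFrequencyHz(1, 1000000)).toFixed(2)); // 16.78
console.log(sidFrequencyHz(65535, 1000000).toFixed(0));   // 3906
```

So the lowest setting gives a period of about 16.78 seconds, and the highest gives a tone of roughly 3.9KHz.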

Let us have a look at how the different waveforms get created.

The triangle waveform is generated as follows:

...
   out_triangle = acc[22:12] << 1;
...
    if (acc[23])
        out_triangle ^= '1;
...

We are using the lower 23 bits of the phase accumulator. Our waveform starts off increasing until bit 23 of the phase accumulator gets set, after which we XOR the resulting value with all ones, giving us a mirror image of the previous half period.
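
The folding trick is easier to see with a few concrete accumulator values. Below is a JavaScript sketch of the same expression; it assumes out_triangle is 12 bits wide (the declaration is not shown in the snippet above), and the function name triangle is my own.

```javascript
// JavaScript model of the triangle expression, assuming a 12-bit output.
function triangle(acc) {
  let out = ((acc >> 12) & 0x7ff) << 1;  // acc[22:12] << 1
  if (acc & 0x800000) out ^= 0xfff;      // acc[23] set: invert all 12 bits
  return out;
}

console.log(triangle(0x000000)); // 0     start of the rising half
console.log(triangle(0x7ff000)); // 4094  top of the rising half
console.log(triangle(0x800000)); // 4095  start of the falling half
console.log(triangle(0xfff000)); // 1     end of the falling half
```

The output climbs from 0 to 4094 while bit 23 is clear, and walks back down from 4095 to 1 once bit 23 is set.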

Generating sawtooth is much simpler:

    out_saw      = acc[23:12];

And also pulse:

   out_pulse    = acc[23:12] < pw ? '1 : '0;
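
Both expressions translate almost one-to-one into JavaScript. As before this is only a sketch: it assumes 12-bit outputs, so the SystemVerilog fill literal '1 becomes 0xfff, and the function names are invented for the example.

```javascript
// Sawtooth: simply the top 12 bits of the 24-bit accumulator.
function sawtooth(acc) {
  return (acc >> 12) & 0xfff;            // acc[23:12]
}

// Pulse: all ones while the top 12 bits are below the pulse width, else zero.
function pulse(acc, pw) {
  return sawtooth(acc) < pw ? 0xfff : 0;
}

console.log(sawtooth(0xabc123).toString(16)); // abc
console.log(pulse(0x000000, 0x800));          // 4095
console.log(pulse(0x800000, 0x800));          // 0
```

With a pulse width of 0x800 the output is high for exactly the first half of each accumulator cycle, which gives a square wave.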

Noise gets generated with a linear feedback shift register (LFSR) in the Fibonacci configuration, which I will not be going into in detail here.

I will also not be going into the specifics of envelope generation. I will only mention here that envelope generation applies ADSR (Attack, Decay, Sustain, Release) to the resulting waveform, and is dealt with in the file sid_env.sv.

Creating the Testbed

The simple part of creating a Testbed for the SID module is wiring up all the ports, supplying a clock signal, and applying some reset logic.

The harder part is getting hold of a sequence of SID register writes that will generate a sound that we can evaluate by ear.

The most obvious way to get hold of such sequences would be to intercept writes to SID registers when executing the applicable program within a C64 emulator.

Some time ago I wrote an emulator in JavaScript, here, which I am going to use for this purpose. The full source of this emulator can be found here. We should now briefly put our JavaScript thinking caps on :-)

I will try, though, to keep the discussion on this JavaScript emulator short, conveying just the basic idea. Should any of you like a more detailed blog post on this, please drop me a comment.

The place where we will be doing the interception of SID writes, will be within the file memory.js, within the following method:

  function IOWrite(address, value) {
    if ((address >= 0xdc00) && (address <= 0xdcff)) {
      return ciaWrite(address, value);
    } else if ((address >= 0xd000) && (address <= 0xd02e)) {
      return myVideo.writeReg(address - 0xd000, value);
    } else if ((address >= 0xd800) && (address <= 0xdbe8)) {
      return myVideo.writeColorRAM(address - 0xd800, value);
    } else {
      IOUnclaimed[address - 0xd000] = value;
      // Log writes to the SID page (0xd400 = 54272), passing the 5-bit register number
      if ((address & 0xff00) == 0xd400) {
        mysid.log(address & 31, value);
      }
      return;
    }
  }

mysid is the instance of a class that we still need to define, so let us start with the outline of the class:

function sid() {
  var mycpu;
  var lasttime = 0;

  this.setCpu = function(cpu) {
    mycpu = cpu;
  }

  this.log = function(addr, val) {
    // CPU cycles elapsed since the previous SID write
    var diff = mycpu.getCycleCount() - lasttime;
    lasttime = mycpu.getCycleCount();

  }

}

One important thing when recording the writes is to also record the exact time instant at which each write happened. For this purpose we need a handle to the CPU instance to get the current cycle count.

With these pieces of information at hand, we can create a series of Verilog statements for the register writes that we can place in an initial-begin..end block. We will write these statements to a text area defined within our HTML page. With these changes our log function looks as follows:

...
  this.log = function(addr, val) {
    // CPU cycles elapsed since the previous SID write
    var diff = mycpu.getCycleCount() - lasttime;
    lasttime = mycpu.getCycleCount();
    var ins = document.getElementById("diss");
    // Delay by 10 time units per cycle, then pulse a write on the SID bus
    var temp = ins.value + "\n#" + diff*10 + ";\n@(negedge clk)\n" +
      "rw = 0; n_cs = 0; addr = 5'd" + addr +
      "; data = " + val + ";\n@(negedge clk)\n" + "rw=1; n_cs=1;";
    ins.value = temp;
  }
...


A typical sequence of this generated Verilog code looks as follows:

#580;
@(negedge clk)
rw = 0; n_cs = 0; addr = 5'd11; data = 32;
@(negedge clk)
rw=1; n_cs=1;
#712860;
@(negedge clk)
rw = 0; n_cs = 0; addr = 5'd0; data = 162;
@(negedge clk)
rw=1; n_cs=1;
#80;
@(negedge clk)
rw = 0; n_cs = 0; addr = 5'd1; data = 37;
@(negedge clk)
rw=1; n_cs=1;
#200;
@(negedge clk)
rw = 0; n_cs = 0; addr = 5'd4; data = 32;
@(negedge clk)
rw=1; n_cs=1;


So, we delay each set of assignments by a certain period of time as captured by the log function.
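
If you would like to try the string generation outside the browser, the formatting part can be pulled out into a function that does not touch the page. This is a sketch with a hypothetical name, formatSidWrite; it produces the same text as the log function above for a single write, minus the leading newline.

```javascript
// Build the Verilog stimulus for one SID register write.
// cycleDiff is the number of CPU cycles since the previous write.
function formatSidWrite(cycleDiff, addr, val) {
  return "#" + cycleDiff * 10 + ";\n@(negedge clk)\n" +
    "rw = 0; n_cs = 0; addr = 5'd" + addr + "; data = " + val +
    ";\n@(negedge clk)\nrw=1; n_cs=1;";
}

console.log(formatSidWrite(58, 11, 32));
```

Calling it with a gap of 58 cycles, register 11 and value 32 reproduces the first five lines of the generated sequence shown above, starting with #580;.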

Interesting, JavaScript generating Verilog code!

Next, let us write some code for capturing the produced sound samples to a file so that we can listen to the produced sound later on:

...
integer f = 0;
integer i = 0;
...
initial begin
  f = $fopen("sound.raw","wb");
  #100;
  for (i = 0; i < 90000000; i = i + 1) begin
    @(negedge clk)
    if ((i% 20) == 1)
    begin     
      $fwrite(f,"%c",audio[7:0]);
      $fwrite(f,"%c",audio[15:8]);
    end
  end
$fclose(f);

end


The sound gets produced at a rate of 1MHz. I am reducing the sample rate to roughly 48KHz by keeping only every 20th sample (strictly speaking, 1MHz divided by 20 is 50KHz). In this way most sound players should be able to keep up.

Audacity is an open-source program that allows you to import and play these raw samples.
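
The capture loop boils down to two things: keeping one sample out of every 20, and writing each kept sample low byte first. Here is a JavaScript sketch of the same idea, working on an array instead of a file; the function name decimateToBytes is made up for illustration.

```javascript
// Keep the samples where (i % step) == 1 and emit them as little-endian byte pairs.
function decimateToBytes(samples, step) {
  const bytes = [];
  for (let i = 0; i < samples.length; i++) {
    if (i % step === 1) {
      bytes.push(samples[i] & 0xff);        // audio[7:0] first
      bytes.push((samples[i] >> 8) & 0xff); // then audio[15:8]
    }
  }
  return bytes;
}

const demoSamples = new Array(40).fill(0);
demoSamples[1] = 0x1234;
demoSamples[21] = 0xbeef;
console.log(decimateToBytes(demoSamples, 20).map(b => b.toString(16)).join(" "));
// prints "34 12 ef be"
```

The byte order matters when importing the file afterwards: the samples are 16-bit, little-endian, and at a step of 20 the effective rate is 1000000 / 20 = 50000 samples per second.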

Test Results

Let us listen to the resulting sounds.

Our first attempt is kind of successful, but there is a bit of distortion:

The distortion is more visible within the wave editor of Audacity:


One can clearly see the waveform goes off the screen in a couple of places.

Looking at the source file sid_filter.sv of the SID implementation, one gets a feeling for where things go wrong:

out_next = (out_next * reg_vol) >> 2;

Here we multiply the final sample with the master volume and divide the result by 4. When inspecting the waveform during a Vivado simulation, multiplying by a master volume of 15 sometimes yields a number that is way past the range of a 16-bit number, and dividing by 4 simply isn't enough. I fixed this by dividing by 8 instead of 4:

out_next = (out_next * reg_vol) >> 3;
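
A quick calculation shows why the extra shift helps. In the JavaScript sketch below, the input value of 12000 is an arbitrary number I picked for illustration, not something measured from the simulation, and the function names are hypothetical.

```javascript
// Apply the master volume and the scaling shift, as in the expression above.
function applyVolume(sample, vol, shift) {
  return (sample * vol) >> shift;
}

// Does the value fit in a signed 16-bit sample?
function fitsInt16(v) {
  return v >= -32768 && v <= 32767;
}

const loudSample = 12000; // hypothetical pre-volume sample value
console.log(applyVolume(loudSample, 15, 2), fitsInt16(applyVolume(loudSample, 15, 2))); // 45000 false
console.log(applyVolume(loudSample, 15, 3), fitsInt16(applyVolume(loudSample, 15, 3))); // 22500 true
```

At full volume, shifting by 2 scales the sample by 3.75, while shifting by 3 scales it by 1.875. That halves the headroom needed, at the cost of a quieter output overall.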

The result is much better, although not taking advantage of the full volume range:


In Summary

In this post we started implementing sound within our emulator.

We evaluated Thomas Kindler's SID implementation and found it to work very well.

Many thanks to Thomas Kindler for making the source of this implementation available on GitHub.

In the next post we will continue by integrating this SID core into our C64 core.

Till next time!



Monday, 14 October 2019

Implementing Sprites: Part 3

Foreword

In the previous post we have implemented the capability for our sprite to expand in both the X and Y directions. We also have implemented Sprite multicolor mode.

Up to now our VIC-II only supported a single sprite, Sprite 0. So, in this post we will be connecting the remaining seven sprites.

With our VIC-II module able to display all eight sprites, we would be able to fully play the game Dan Dare with our emulator.

This would indeed be a very nostalgic moment for me, but it also raised a bit of a concern. If one is going to play for extended periods on the Zybo with our emulator, wouldn't the Zynq SoC eventually get very hot?

My concern was driven by the fact that these days you find quite a number of videos on the Internet concerning cooling solutions for single board computers. With this in mind, when you come to the Zybo board, you cannot really find any information regarding what kind of temperatures to expect during general use of the board.

So, I will end off this post by sharing what I have found by experimentation regarding the temperature of the Zynq when running our emulator for half an hour or so.

Hooking up the remaining sprites

Currently we only have a single instance of sprite_generator for sprite 0. Let us start by adding instances for the remaining sprites. For simplicity, I am only showing the declarations for the first three:

sprite_generator sprite_0(
  .clk_in(clk_in),
  .raster_y_pos(y_pos - 5),
  .raster_x_pos(x_pos - 16),
  .sprite_x_pos({sprite_msb_x[0],sprite_0_xpos}),
  .sprite_y_pos(sprite_0_ypos),
  .store_byte(store_sprite_pixel_byte && sprite_data_region_offset[6:4] == 0),

  .x_expand(x_expand[0]),
  .y_expand(y_expand[0]),
  .multi_color_mode(multi_color_mode[0]),
  .sprite_multi_0(sprite_multi_color_0),
  .sprite_multi_1(sprite_multi_color_1),
  .primary_color(sprite_primary_color_0),

  .data(data_in[7:0]),
  .sprite_enabled(sprite_enabled[0]),
  .show_pixel(show_pixel_sprite_0),
  .output_pixel(out_pixel_sprite_0),
  .request_data(),
  .request_line_offset(sprite_0_offset)
    );

sprite_generator sprite_1(
  .clk_in(clk_in),
  .raster_y_pos(y_pos - 5),
  .raster_x_pos(x_pos - 16),
  .sprite_x_pos({sprite_msb_x[1],sprite_1_xpos}),
  .sprite_y_pos(sprite_1_ypos),
  .store_byte(store_sprite_pixel_byte && sprite_data_region_offset[6:4] == 1),

  .x_expand(x_expand[1]),
  .y_expand(y_expand[1]),
  .multi_color_mode(multi_color_mode[1]),
  .sprite_multi_0(sprite_multi_color_0),
  .sprite_multi_1(sprite_multi_color_1),
  .primary_color(sprite_primary_color_1),

  .data(data_in[7:0]),
  .sprite_enabled(sprite_enabled[1]),
  .show_pixel(show_pixel_sprite_1),
  .output_pixel(out_pixel_sprite_1),
  .request_data(),
  .request_line_offset(sprite_1_offset)
    );

sprite_generator sprite_2(
  .clk_in(clk_in),
  .raster_y_pos(y_pos - 5),
  .raster_x_pos(x_pos - 16),
  .sprite_x_pos({sprite_msb_x[2],sprite_2_xpos}),
  .sprite_y_pos(sprite_2_ypos),
  .store_byte(store_sprite_pixel_byte && sprite_data_region_offset[6:4] == 2),

  .x_expand(x_expand[2]),
  .y_expand(y_expand[2]),
  .multi_color_mode(multi_color_mode[2]),
  .sprite_multi_0(sprite_multi_color_0),
  .sprite_multi_1(sprite_multi_color_1),
  .primary_color(sprite_primary_color_2),

  .data(data_in[7:0]),
  .sprite_enabled(sprite_enabled[2]),
  .show_pixel(show_pixel_sprite_2),
  .output_pixel(out_pixel_sprite_2),
  .request_data(),
  .request_line_offset(sprite_2_offset)
    );


This is a typical copy and paste exercise. However, some of the ports are specific to the sprite itself. Here is a list of these ports:

  • sprite_x_pos/ sprite_y_pos
  • store_byte
  • x_expand/y_expand
  • multi_color_mode
  • primary_color
  • sprite_enabled
  • show_pixel/output_pixel
  • request_line_offset
Obviously some ports will get their value from a particular bit position in a register, whereas the other ports in this list have their own dedicated registers.


Let us have a look at the output ports. The first port is request_line_offset, which we connect to sprite_0_offset through sprite_7_offset. We use these signals as follows:

   always @*
     case (sprite_data_region_offset[6:4])
       3'd0: sprite_offset = sprite_0_offset;
       3'd1: sprite_offset = sprite_1_offset;
       3'd2: sprite_offset = sprite_2_offset;
       3'd3: sprite_offset = sprite_3_offset;
       3'd4: sprite_offset = sprite_4_offset;
       3'd5: sprite_offset = sprite_5_offset;
       3'd6: sprite_offset = sprite_6_offset;
       3'd7: sprite_offset = sprite_7_offset;
    endcase

     always @*
       if (!sprite_data_region && (clk_counter == 6 | clk_counter == 7))
         addr = bit_data_pointer;       
       else if (sprite_data_region && (sprite_data_region_offset[3:0] < 3))
         addr = {mem_pointers[7:4], 7'h7f, sprite_data_region_offset[6:4]};
       else if (sprite_data_region)
         addr = {sprite_data_location, (sprite_offset + sprite_byte_num)}; 
       else
         addr =  {mem_pointers[7:4], screen_mem_pos};

So, we use the applicable sprite_offset when it is the data cycle for a particular sprite.

We now have a show_pixel/output_pixel pair for each sprite. We combine these as follows:

always @*
   if (show_pixel_sprite_0)
     color_for_bit_with_sprite = out_pixel_sprite_0;
   else if (show_pixel_sprite_1)
     color_for_bit_with_sprite = out_pixel_sprite_1;
   else if (show_pixel_sprite_2)
     color_for_bit_with_sprite = out_pixel_sprite_2;
   else if (show_pixel_sprite_3)
     color_for_bit_with_sprite = out_pixel_sprite_3;
   else if (show_pixel_sprite_4)
     color_for_bit_with_sprite = out_pixel_sprite_4;
   else if (show_pixel_sprite_5)
     color_for_bit_with_sprite = out_pixel_sprite_5;
   else if (show_pixel_sprite_6)
     color_for_bit_with_sprite = out_pixel_sprite_6;
   else if (show_pixel_sprite_7)
     color_for_bit_with_sprite = out_pixel_sprite_7;
   else
     color_for_bit_with_sprite = color_for_bit;

   assign color_for_bit = multicolor_data ? multi_color :    
            (pixel_shift_reg[7] == 1 ? char_buffer_out_delayed[11:8] : background_color);
   assign final_color = (visible_vert & visible_horiz & screen_enabled) ? color_for_bit_with_sprite : border_color;


For now we simply assume that all sprites are in front of the main graphics, and implement the hardcoded priority between sprites: the lower the sprite number, the higher the priority.

When we run our implementation on the Zybo board with these changes, it looks very promising: our characters have finally appeared!

One small thing doesn't look right though. Our characters are always in front of everything! They appear in front of rocks. Also, when we walk underwater, using a reed as a snorkel, only the reed should be visible. This is not the case with our emulator in its current state:


We see Dan Dare, his pet, and the Snorkel!

OK, I agree, this shouldn't come as a surprise, since we implemented sprites to be always visible in front of the background graphics.

Fine tuning Sprite display priority

There are a couple of pieces of Sprite priority functionality that should be implemented before our game screen can render correctly.

The first is priority according to the Sprite priority register at address D01B. Firstly we need to implement this register in our VIC-II so that it can be written to or read by the 6502. This is similar to the other registers we have implemented.

We use this register as follows:

always @*
   if (show_pixel_sprite_0 && !sprite_priority[0])
     color_for_bit_with_sprite = out_pixel_sprite_0;
   else if (show_pixel_sprite_1 && !sprite_priority[1])
     color_for_bit_with_sprite = out_pixel_sprite_1;
   else if (show_pixel_sprite_2 && !sprite_priority[2])
     color_for_bit_with_sprite = out_pixel_sprite_2;
   else if (show_pixel_sprite_3 && !sprite_priority[3])
     color_for_bit_with_sprite = out_pixel_sprite_3;
   else if (show_pixel_sprite_4 && !sprite_priority[4])
     color_for_bit_with_sprite = out_pixel_sprite_4;
   else if (show_pixel_sprite_5 && !sprite_priority[5])
     color_for_bit_with_sprite = out_pixel_sprite_5;
   else if (show_pixel_sprite_6 && !sprite_priority[6])
     color_for_bit_with_sprite = out_pixel_sprite_6;
   else if (show_pixel_sprite_7 && !sprite_priority[7])
     color_for_bit_with_sprite = out_pixel_sprite_7;
   else if (pixel_shift_reg[7])
     color_for_bit_with_sprite = color_for_bit;
   else if (show_pixel_sprite_0)
     color_for_bit_with_sprite = out_pixel_sprite_0;
   else if (show_pixel_sprite_1)
     color_for_bit_with_sprite = out_pixel_sprite_1;
   else if (show_pixel_sprite_2)
     color_for_bit_with_sprite = out_pixel_sprite_2;
   else if (show_pixel_sprite_3)
     color_for_bit_with_sprite = out_pixel_sprite_3;
   else if (show_pixel_sprite_4)
     color_for_bit_with_sprite = out_pixel_sprite_4;
   else if (show_pixel_sprite_5)
     color_for_bit_with_sprite = out_pixel_sprite_5;
   else if (show_pixel_sprite_6)
     color_for_bit_with_sprite = out_pixel_sprite_6;
   else if (show_pixel_sprite_7)
     color_for_bit_with_sprite = out_pixel_sprite_7;
   else
     color_for_bit_with_sprite = color_for_bit;

Within this snippet of code, you will spot another implied priority by means of the check for pixel_shift_reg[7].

So, if this background pixel has a bit value of zero, it is actually transparent, allowing the sprites with background priorities to show through.

Obviously, if there is no visible sprite pixel with either back or front priority, we will show the applicable background color.

There is a very interesting scenario when our main graphics is in multicolor mode. In Multicolor mode the high order bit indicates whether it is a background pixel or not. This means that we can have two possible background colors in multicolor mode, pixel value 00 and 01.

Having two background colors enables us to have a sprite that is sometimes hidden behind some objects and in front of others.
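
The long if/else chain reduces to three passes, which is easier to see when written as a loop. The JavaScript below is a sketch of the same decision order, not a replacement for the Verilog; the function name resolvePixel and its array-based arguments are my own choices for the example.

```javascript
// showPixel/outPixel: arrays of 8 entries, one per sprite.
// spritePriority: the 8-bit value of register D01B (bit set = sprite behind graphics).
// foregroundBit: pixel_shift_reg[7]; colorForBit: colour of the main graphics pixel.
function resolvePixel(showPixel, outPixel, spritePriority, foregroundBit, colorForBit) {
  // Pass 1: sprites with their priority bit clear sit in front of everything.
  for (let n = 0; n < 8; n++) {
    if (showPixel[n] && !((spritePriority >> n) & 1)) return outPixel[n];
  }
  // Pass 2: a foreground graphics pixel hides any remaining sprites.
  if (foregroundBit) return colorForBit;
  // Pass 3: the graphics pixel is transparent, so any visible sprite shows through.
  for (let n = 0; n < 8; n++) {
    if (showPixel[n]) return outPixel[n];
  }
  return colorForBit;
}

const show = [true, false, false, false, false, false, false, false];
const colors = [7, 0, 0, 0, 0, 0, 0, 0];
console.log(resolvePixel(show, colors, 0x00, 1, 3)); // 7: sprite 0 in front
console.log(resolvePixel(show, colors, 0x01, 1, 3)); // 3: sprite 0 hidden behind foreground
console.log(resolvePixel(show, colors, 0x01, 0, 3)); // 7: sprite 0 shows through background
```

The second and third calls are the snorkel case: the same sprite with the same priority bit is hidden or shown depending only on whether the graphics pixel in front of it counts as foreground.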

The following video shows how the game screen now looks with the recent round of changes:


This time around our emulator renders the scene more realistically. We go behind the rocks and are not visible when we go underwater.

This is actually the great nostalgic moment that this whole series of blog posts was about!

Temperatures on the Zynq SoC

I mentioned at the beginning of this post that I have a bit of a concern about the temperature of the Zynq SoC when you are using it for extended periods of time.

One of the key reasons for this concern is our USB stack program that runs on one of the ARM cores, which catches keystrokes from the USB keyboard and sends them to our emulator hosted within the FPGA. To get an overall context, here is the main method of our USB stack program:

int main()
{
    Xil_DCacheDisable();
    init_platform();
    initint();
    initUsb();
    status = 0;
    state_machine();
    usleep(100000000);
    cleanup_platform();
    return 0;
}

We do some initialisation, and then we sleep for a long period of time (which in this case is 100 seconds). This sleep is necessary so our program is not terminated as a whole.

The code that does the actual work is the method state_machine, which is invoked every 10 milliseconds by a timer interrupt.

It should be noted that this program is running in standalone mode, and the usleep library call is implemented using busy waiting.

With busy waiting your CPU runs at full speed checking in a loop for something to happen, which in this case is for 100 seconds to pass.

As we know with busy waiting, your CPU is effectively running at 100% utilisation all the time, which uses more energy and produces more heat.

So, how much heat will be produced by the above program when we run it for about half an hour?

Vivado provides some tools for us to answer this question. On the Hardware dashboard, temperature is one of the probes you can add.

When I started the emulator on the Zybo board, the temperature was around 51°C. Within minutes the temperature had risen to about 54°C.

After about half an hour, the temperature settled at about 58°C.

This wasn't as bad as I had expected. For interest's sake, I wondered whether you could do some overclocking on the Zybo.

After some fiddling with the settings in the Vivado Block design, it doesn't really look like there are any real overclocking options. The only option that I could see was to set the frequency of an ARM core between 50MHz and 667MHz.

So, in short, it doesn't look like using the Zybo board for long periods would cause any kinds of overheating.

Also, busy waiting didn't appear to be a big issue after all. However, I was still wondering what kind of temperature difference it would make if we could avoid busy waiting.

On ARM processors, the instruction WFI (wait for interrupt) is provided for this purpose. As per the documentation on ARM's web site:

WFI (Wait For Interrupt) makes the processor suspend execution (Clock is stopped) until one of the following events take place:


  • An IRQ interrupt
  • An FIQ interrupt
  • A Debug Entry request made to the processor.

So, in our case when we call WFI, our CPU would freeze until our timer interrupt fires externally:

int main()
{
    Xil_DCacheDisable();
    init_platform();
    initint();
    initUsb();
    status = 0;
    state_machine();
    /* Sleep until an interrupt fires, then go straight back to sleep */
    for (;;)
        asm volatile("wfi");
    cleanup_platform();
    return 0;
}

Here we have added some inline assembly for invoking wfi. It should be remembered that once an external interrupt has occurred and has been served, code execution will continue just after the wfi instruction.

It is therefore important that we loop back to the wfi instruction. If we don't, our main method will run to completion.

When we monitor the temperature while using the WFI method, the Zynq definitely runs cooler. During this run I saw a temperature between 52°C and 53°C. About a 5 degree difference!

In Summary

In this post we implemented all eight sprites within our VIC-II module. We also implemented the different priorities between Sprites and the Background.

This indeed brought us to the point where we could fully play the game Dan Dare within our emulator.

With this we are nearing the end of this Blog Series. There is, however, one more thing I would like to do, and this is to see if it is possible to add sound to the emulator.

So, in the next post we will start to implement sound.

Till next time!

Thursday, 10 October 2019

Implementing Sprites: Part 2

Foreword

In the previous post we added some very basic sprite functionality to our C64 emulator that enabled us to show a moving sprite.

In this post we will continue to add some more sprite functionality, which will involve the capability to expand a sprite and multicolor mode.

Sprite Expansion

Sprites on the VIC-II have the capability to be expanded in both the X direction and the Y direction.

Register D017 has a bit for each sprite indicating whether it should be expanded in the Y direction.

Similarly, register D01D has a bit for each sprite indicating whether a sprite should be expanded in the X Direction.

So, let us start off by redirecting the bits of registers D017 and D01D to our sprite_generator module as input ports:

module sprite_generator(
...
  input x_expand,
  input y_expand,
...
    );

The first thing that is affected if a sprite is expanded is its display region. So let us modify how we determine this region:

...
  wire [5:0] sprite_width;
  wire [5:0] sprite_height;
...
  assign sprite_height = y_expand ? 42 : 21;
  assign sprite_width = x_expand ? 48 : 24;
...
  assign sprite_display_region = (raster_y_pos >= sprite_y_pos && raster_y_pos < (sprite_y_pos + sprite_height)) &&
                                 (raster_x_pos >= sprite_x_pos && raster_x_pos < (sprite_x_pos + sprite_width));
...

Next, let us consider what should happen when we expand the sprite in the Y direction. In such a case our 21 line sprite should cover 42 lines of the sprite area on the screen. So, each line should be repeated twice.

We do this by dividing the current line within the active sprite area by two:

...
  wire [5:0] request_line_pre;
...
  assign request_line_pre = next_raster - sprite_y_pos;
  assign request_line = y_expand ? (request_line_pre>>1) : request_line_pre;
...

Similarly, when we expand in the X direction we need to repeat each pixel on a line twice. We do this by shifting a pixel out only every second clock cycle:

...
 reg [1:0] toggle; 
...
  always @(posedge clk_in)
  if (!sprite_display_region)
    toggle <= 0;
  else
    toggle <= toggle + 1;
...
 assign toggle_single_color_bit = x_expand ? toggle[0] : 1;
...
  always @(posedge clk_in)
    if (store_byte)
      sprite_data <= {sprite_data[15:0], data[7:0]};
    else if (sprite_display_region && toggle_single_color_bit)
      sprite_data <= {sprite_data[22:0], 1'b0};
...

We achieve this slow down with the toggle counter. This counter only needs to be one bit wide for now. The reason I made it two bits wide, is for multicolor mode later on.

Multicolor Mode

Let us tackle multicolor mode next.

As we know, in multicolor mode each pixel is two pixels wide. This means that we need to shift out two bits at a time when in multicolor mode. This also implies that we can only do this shift every second clock cycle.

This sounds very similar to the previous section, where we needed to expand our sprite in the X direction. We will therefore make use of the toggle counter again.

When we are expanding a multicolor sprite in the X direction, we need to slow down the clocking out of the pixels even more: four clock cycles per pixel.

With all this in mind, we need to add the following code:

...
assign toggle_multi_color_bit = x_expand ? (toggle[1:0] == 2'b11) : toggle[0];
...
 always @(posedge clk_in)
    if (store_byte)
      sprite_data <= {sprite_data[15:0], data[7:0]};
    else if (sprite_display_region && toggle_single_color_bit && !multi_color_mode)
      sprite_data <= {sprite_data[22:0], 1'b0};
    else if (sprite_display_region && toggle_multi_color_bit && multi_color_mode)
      sprite_data <= {sprite_data[21:0], 2'b0};
...
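
As a rough check of the shift cadence described above, the following self-contained testbench counts how many shifts happen in 8 pixel clocks for each combination of x_expand and multi_color_mode. The enable logic is copied from the snippets above; the testbench name, the run_case task and the shifts counter are hypothetical scaffolding and not part of the actual sprite_generator.

```verilog
// Sketch only: counts shift events over 8 pixel clocks per mode.
module shift_cadence_tb;
  reg clk_in = 0;
  reg sprite_display_region = 0;
  reg x_expand = 0;
  reg multi_color_mode = 0;
  reg [1:0] toggle = 0;
  integer shifts = 0;

  // Same enable logic as in the post
  wire toggle_single_color_bit = x_expand ? toggle[0] : 1'b1;
  wire toggle_multi_color_bit  = x_expand ? (toggle[1:0] == 2'b11) : toggle[0];
  wire shift_now = sprite_display_region &&
                   (multi_color_mode ? toggle_multi_color_bit
                                     : toggle_single_color_bit);

  always #5 clk_in = ~clk_in;

  always @(posedge clk_in) begin
    if (shift_now)
      shifts = shifts + 1;          // testbench-only counter
    if (!sprite_display_region)
      toggle <= 0;
    else
      toggle <= toggle + 1;
  end

  task run_case(input expand, input multi);
    begin
      @(negedge clk_in);
      sprite_display_region = 0;    // leave the region so toggle resets
      x_expand = expand;
      multi_color_mode = multi;
      @(negedge clk_in);
      shifts = 0;
      sprite_display_region = 1;    // enter the region for 8 pixel clocks
      repeat (8) @(negedge clk_in);
      $display("x_expand=%b multi=%b shifts_in_8_clocks=%0d",
               expand, multi, shifts);
    end
  endtask

  initial begin
    run_case(0, 0);
    run_case(1, 0);
    run_case(0, 1);
    run_case(1, 1);
    $finish;
  end
endmodule
```

In a simulator this should report 8, 4, 4 and 2 shifts respectively. Since a normal shift moves one bit and a multicolor shift moves two, a 24-bit sprite row takes 24 clocks unexpanded and 48 clocks X-expanded in both modes, which matches the sprite_width values of 24 and 48.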

Now, at any point in time, bits [23:22] will be the value for our current pixel. Next, let us have a look at the meanings of the different bit values, as quoted from https://www.c64-wiki.com/wiki/Sprite:


  • Pixels with a bit pair of "00" appear transparent, like "0" bits do in high resolution mode.
  • Pixels with a bit pair of "01" will have the color specified in address 53285/$D025.
  • Pixels with a bit pair of "11" will have the color specified in address 53286/$D026.
  • Pixels with a bit pair of "10" will have the color specified assigned to the sprite in question in the range 53287–53294/$D027–D02E.

For the above colors, we need to define extra registers within our VIC-II module, and connect them via input ports on our sprite_generator:

module sprite_generator(
...
  input [3:0] sprite_multi_0,
  input [3:0] sprite_multi_1,
  input [3:0] primary_color,
...
    );

Let us now create a case statement for the different colors:

...
 reg [3:0] output_pixel_multi;
...
  always @*
    case (sprite_data[23:22])
      2'b01: output_pixel_multi = sprite_multi_0;
      2'b10: output_pixel_multi = primary_color;
      2'b11: output_pixel_multi = sprite_multi_1;
      default:  output_pixel_multi = 0;
    endcase
...

Let us wire up the final signals:

...
assign output_pixel = multi_color_mode ? output_pixel_multi : output_pixel_single;
...
assign show_pixel = sprite_enabled && (multi_color_mode ? !(sprite_data[23:22] == 2'b0) : sprite_data[23]) && sprite_display_region;
...

Test Program and Test Results

To test the code we have developed, we need a simple test program that displays multicolor sprites that are X- and Y-expanded.

There is a nice example program for multicolor expanded sprites in the book Introduction to Basic: Part 2. The book was part of a two-book series published in 1983, titled: An Introduction to Basic - The Comprehensive Teach yourself programming series.

Both these books are available for download on archive.org. The program appears on pages 300 and 301 and is titled Glasgow Bus. In this program we have a multicolored sprite expanded in both directions, which moves from left to right.

The following video shows execution of this program within our FPGA C64 implementation:


In Summary

In this post we implemented Sprite expansion capability and multicolor mode within our C64 emulator.

In the next post we will connect up eight instances of our sprite_generator within our C64 emulator, and see if we can make the characters appear when we play the game Dan Dare.

Till next time!

Saturday, 5 October 2019

Implementing Sprites: Part 1

Foreword

In the previous post we have implemented Raster Interrupts as well as Multicolor Text mode.

This enabled us to completely render the status bar and the Background of the game. We were even able to move between screens of the environments.

There was, however, a crucial piece of the game play experience missing: The characters were invisible!

The reason our characters were hidden was that we haven't implemented Sprites in our VIC-II module yet. So, our next focus in this Blog series will be to implement Sprites in our C64 FPGA implementation.

To implement sprites in an upcoming C64 emulator can be quite a daunting task. The following tasks come to mind, just to name a few:
  • Coordinate memory access between fetching Sprite data, fetching screen memory content and fetching character image data.
  • Mixing the sprite images with Text /bitmap mode graphics to get the final picture
  • Adding functionality to either show sprites in front of text or behind it.
  • Dealing with transparency
  • Implementing multicolor mode
  • Adding functionality to stretch a sprite in either the Y- or X-direction
I have therefore decided to split the implementation of sprites into a number of separate posts.

In this post we will focus on showing a single sprite in front of text.

To test the resulting Sprite implementation, we will be using a simple Basic program for moving a sprite across the screen.

Retrieving Sprite Data from Memory

Let us start our journey of implementing sprite rendering by thinking how we will be fetching sprite data from memory.

A good start will be to review how our VIC-II currently interfaces with memory to get image data. Here is a quick outline:
  • An output port named addr, for sending the address of the data we want.
  • The requested data is sent to the input port data_in. This port is 12 bits wide: eight bits of data and four bits from Color RAM. In this way, for each screen location the character code and associated color arrive simultaneously, thus eliminating the need for an extra memory cycle to get the color code.
  • Memory requests are clocked at 2MHz. This translates to two memory accesses during an eight pixel period.
If you went through the particulars of the VIC-II, you will see that it clocks memory accesses at 1MHz. So, why am I clocking memory at 2MHz in my VIC-II implementation?

The key to this answer lies in the fact that within a C64, memory access happens on both the rising edge and the falling edge of a 1MHz clock cycle. The VIC-II accesses memory on the rising edge and the 6510 CPU on the falling edge of a 1MHz clock pulse.

There are, however, cases where the VIC-II will access memory on both the rising and falling edge. This happens at the beginning of each character line, where the VIC-II needs to retrieve the character code as well as the relevant pixel data to display. The VIC-II needs that extra time to retrieve the code for the character to be displayed, so the CPU cannot do any memory accesses during these times.

As we can see, memory access times are very tight for the VIC-II, so one might wonder how the VIC-II manages to get some memory cycles for retrieving sprite data. This is where Christian Bauer's write-up on the VIC-II comes to the rescue, as explained here. The section of interest is 3.6.3, Timing of a raster line.

In this section a couple of VIC-II memory access diagrams are shown for a couple of scenarios. The scenario where sprites 2-7 are active on a raster line gives us a very good idea of where the VIC-II retrieves sprite data from memory (I have added the legend for convenience):

Cycl-# 6                   1 1 1 1 1 1 1 1 1 1 |5 5 5 5 5 5 5 6 6 6 6 6 6
       5 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 |3 4 5 6 7 8 9 0 1 2 3 4 5 1
        _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _| _ _ _ _ _ _ _ _ _ _ _ _ _ _
    ΓΈ0 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |_ _ _ _ _ _ _ _ _ _ _ _ _ _
       __                                      |
   IRQ   ______________________________________|____________________________
                             __________________|________
    BA ______________________                  |        ____________________
                              _ _ _ _ _ _ _ _ _| _ _ _ _ _ _ _
   AEC _______________________ _ _ _ _ _ _ _ _ |_ _ _ _ _ _ _ ______________
                                               |
   VIC ss3sss4sss5sss6sss7sssr r r r r g g g g |g g g i i i i 0sss1sss2sss3s
  6510                        x x x x x x x x x| x x x x X X X
                                               |
Graph.                      |===========0102030|7383940============
                                               |
X coo. \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\|\\\\\\\\\\\\\\\\\\\\\\\\\\\\
       1111111111111111111111111110000000000000|1111111111111111111111111111
       999aaaabbbbccccddddeeeeffff0000111122223|344445555666677778888889999a
       48c048c048c048c048c048c048c048c048c048c0|c048c048c048c048c04cccc04c80

  c  Access to video matrix and Color RAM (c-access)
  g  Access to character generator or bitmap (g-access)
 0-7 Reading the sprite data pointer for sprite 0-7 (p-access)
  s  Reading the sprite data (s-access)
  r  DRAM refresh
  i  Idle access

  x  Read or write access of the processor
  X  Processor may do write accesses, stops on first read (BA is low and so
     is RDY)


Let us unpack the above diagram a bit. Right on top we have cycle number, which is a count that gets incremented at a frequency of 1MHz.

The diagram starts with the last cycle of the previous line, which in this case is 65, which is obviously an NTSC VIC-II variant.

The diagram then continues from cycle#1 to cycle#19. For clarity the cycles from 20-52 are omitted in the diagram, and we continue again from cycle 53 to 65 and the first cycle of the next line.

You will also notice that after each cycle number there is a space. This is just for the potential time period where the VIC-II might make use of both the rising and the falling edge of the 1MHz clock cycle for retrieving data.

The Graph row shows us when the electron beam is busy drawing on the screen. Making that statement felt a bit weird to me, since we don't really use CRTs anymore :-)

On the Graph row an equals sign (=) marks the period when the Border is drawn, and a digit marks the period when we are actually drawing character data.

With the Graph row as reference, let us have a look at the VIC row to see when sprite data is read from memory.

We can actually see that just before we are finished drawing a screen line, which is obviously a border operation, we start reading sprite info. It starts with: 0sss. We read sprite data pointer 0 from the end of screen memory, followed by reading three bytes of sprite data.

We do the same for sprites 1 to 7 and we stop just before we start drawing the border of the next raster line.

Also, please note that on the diagram, there is no space between the sprite pointer and sprite data symbols. So, when we get the relevant data for a sprite, we are using all available memory cycles.

In short, the sprite data is retrieved during the non-visible parts of a screen line, or to use PAL and NTSC terminology, during the front- and back-porch of a scan line.

Implementing sprite data Memory accesses

The previous section gives us a good indication of where we should implement the reading of sprite data during the drawing of a line.

Admittedly, in our VIC-II we don't start with a blank period on a rasterline, but rather start immediately to draw the border at the beginning of each raster line.

We will, however, use the blank period at the end of the rasterline to read sprite data.

Let us start with some calculations. There are 4 memory accesses per sprite (i.e. the sprite pointer and three data bytes). For sprites there are two memory accesses per cycle. So, a sprite needs two cycles to get all its data.

We will be using our x_pos counter to determine when to read sprite data. For this purpose, we write the following code:

...
wire sprite_data_region;
...
assign sprite_data_region = (x_pos > 368 && x_pos < 496);
...

Our rasterline is 504 pixels, so we start reading sprite data at the very end of the line. Obviously the sprite pixels will be shown in the next line.

It is also more convenient to work with an offset of 368, rather than an absolute x_pos value:

...
wire [9:0] sprite_data_region_offset;
...
assign sprite_data_region_offset = {1'b0, x_pos} - 368;
...

Within this sprite data region, each sprite gets its data in a time period of 16 pixels. We can therefore extract the following information from sprite_data_region_offset:

  • bits 6 - 4: sprite number
  • bits 3 - 0: Time cycle within the data cycle of current sprite. This is useful for orchestrating the various reads that should happen for the sprite. 
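
The bit split above can be sketched as a small stand-alone module with a one-line testbench. The module name sprite_slot_decode and the port names sprite_number and slot_phase are hypothetical, and x_pos is assumed to be 9 bits wide (as implied by the {1'b0, x_pos} concatenation into a 10-bit wire).

```verilog
// Hypothetical helper: splits the fetch-window offset into a sprite
// number and a phase within that sprite's 16-pixel slot.
module sprite_slot_decode(
  input  [8:0] x_pos,
  output       sprite_data_region,
  output [2:0] sprite_number,   // which sprite's slot we are in (0-7)
  output [3:0] slot_phase       // pixel position within the 16-pixel slot
);
  wire [9:0] sprite_data_region_offset = {1'b0, x_pos} - 10'd368;

  assign sprite_data_region = (x_pos > 368 && x_pos < 496);
  assign sprite_number      = sprite_data_region_offset[6:4];
  assign slot_phase         = sprite_data_region_offset[3:0];
endmodule

module sprite_slot_decode_tb;
  reg  [8:0] x_pos;
  wire       region;
  wire [2:0] num;
  wire [3:0] phase;

  sprite_slot_decode dut(
    .x_pos(x_pos), .sprite_data_region(region),
    .sprite_number(num), .slot_phase(phase));

  initial begin
    x_pos = 9'd423;   // 423 - 368 = 55 = 7'b011_0111
    #1 $display("x_pos=%0d region=%b sprite=%0d phase=%0d",
                x_pos, region, num, phase);
    $finish;
  end
endmodule
```

For x_pos = 423 the offset is 55, so bits 6 - 4 give sprite 3 and bits 3 - 0 give phase 7. Eight slots of 16 pixels add up to 128 pixels, which is why the region runs from 368 up to 496.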

Next, let us focus on address generation. Here we should keep in mind that our VIC-II module reads data from a memory port that is clocked at 2MHz. In relation to a group of 8 pixels, this clock pulses at pixel number 3 and pixel number 7.

Let us change our address generation functionality as follows:

...
reg [7:0] sprite_data_location; 
wire [1:0] sprite_byte_num;
...
assign sprite_byte_num = sprite_data_region_offset[3:2] + 2'b11;
...
always @(posedge clk_in)
if (sprite_data_region && sprite_data_region_offset[3:0] == 4)
  sprite_data_location <= data_in[7:0];
...
     always @*
       if (!sprite_data_region && (clk_counter == 6 | clk_counter == 7))
         addr = bit_data_pointer;       
       else if (sprite_data_region && (sprite_data_region_offset[3:0] < 3))
         addr = {mem_pointers[7:4], 7'h7f, sprite_data_region_offset[6:4]};
       else if (sprite_data_region)
         addr = {sprite_data_location, (sprite_0_offset + sprite_byte_num)}; 
       else
         addr =  {mem_pointers[7:4], screen_mem_pos};
...

So, in each sprite section, we need to ensure we assert the address for the applicable sprite pointer before the first 2MHz clock pulse triggers. When this clock pulse triggers, the Block RAM will return the value of the sprite pointer in question.

We store this sprite pointer in a register called sprite_data_location at a pixel period after the Block RAM returned this pointer.
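
To see what the two sprite-related address forms in the code above work out to, here is a small testbench sketch. The sprite pointers live in the last eight bytes of the 1KB screen memory block, which is where the 7'h7f comes from, and each pointer value selects a 64-byte block. The testbench name and the signals sprite_number and byte_offset are hypothetical, and the input values are just example numbers.

```verilog
// Sketch only: evaluates the pointer address and data address for one example.
module sprite_addr_tb;
  reg [7:0] mem_pointers;
  reg [2:0] sprite_number;
  reg [7:0] sprite_data_location;
  reg [5:0] byte_offset;

  // Pointer address: screen base (bits 13:10) + $3F8 + sprite number
  wire [13:0] pointer_addr = {mem_pointers[7:4], 7'h7f, sprite_number};
  // Data address: pointer value * 64 + offset within the sprite block
  wire [13:0] data_addr    = {sprite_data_location, byte_offset};

  initial begin
    mem_pointers         = 8'h15;  // example: bits 7:4 = 1, screen memory at $0400
    sprite_number        = 3'd0;
    sprite_data_location = 8'd13;  // example pointer value
    byte_offset          = 6'd0;
    #1;
    $display("pointer_addr=%0d data_addr=%0d", pointer_addr, data_addr);
    if (pointer_addr !== 14'd1024 + 14'h3f8)
      $display("pointer address mismatch");
    if (data_addr !== 14'd13 * 14'd64 + byte_offset)
      $display("data address mismatch");
    $finish;
  end
endmodule
```

With these example values the pointer for sprite 0 is read from address 2040 ($07F8), and a pointer value of 13 places the sprite data at address 832.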

Sprite_data_location will be used to generate addresses for the actual sprite data. We need the following pieces of information in addition to generate the addresses for the sprite data:
  • sprite offset: Line number of the sprite we need data for, as a linear address. This will be a multiple of three. For instance, should we need the sprite data for line 2, we will specify 6 as the offset.
  • sprite_byte_num: Selects byte 0, 1 or 2 of the requested line.
We will be using bits 3 and 2 of sprite_data_region_offset for the sprite_byte_num. One should remember, however, that we only start reading sprite data from combination 01 and not from combination 00.

During combination 00 we are still reading sprite_data_location. We therefore need to subtract one from this bit combination to get the actual bit combination.

As a quick hack, I achieved this subtraction by one by just adding 2'b11 with the help of Two's complement.
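
The subtract-by-adding trick can be checked with a tiny testbench. In two bits, adding 2'b11 (which is -1 in Two's complement) wraps around modulo 4, so it gives the same result as subtracting one. The names minus_one_tb and slot are hypothetical.

```verilog
// Sketch only: shows that adding 2'b11 to a 2-bit value subtracts one (mod 4).
module minus_one_tb;
  reg  [1:0] slot;                        // stands in for offset bits [3:2]
  wire [1:0] sprite_byte_num = slot + 2'b11;
  integer i;

  initial begin
    for (i = 0; i < 4; i = i + 1) begin
      slot = i;
      #1;
      $display("slot=%b sprite_byte_num=%b", slot, sprite_byte_num);
    end
    $finish;
  end
endmodule
```

This should print 11, 00, 01 and 10 for slots 00 to 11. The 11 result for slot 00 falls in the period where we are still reading the sprite pointer, so it is never used as a byte number.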

With this piece of code in place we will be receiving the sprite data for all the sprites during the applicable time frames. It is up to us to actually catch this data at the right time and manipulate it up to the point that the sprite gets rendered on the screen.

For all this functionality it makes sense to encapsulate it in a sprite_generator module, which we will cover in the next section.

The Sprite Generator module

Let us start our Sprite Generator module by providing input ports indicating the current Raster Position and Sprite position:

module sprite_generator(
  input clk_in,
  input [8:0] raster_y_pos,
  input [8:0] raster_x_pos,
  input [8:0] sprite_x_pos,
  input [7:0] sprite_y_pos,
...
    );


First thing we need to calculate is the linear address for the sprite line we want data for:

...
  wire [8:0] next_raster;
  wire [5:0] request_line;
...
  assign next_raster = raster_y_pos + 1;
  assign request_line = next_raster - sprite_y_pos;
  assign request_line_offset = (request_line << 1) + request_line;
...


The calculation of the linear address involves multiplying the line number by three, which we achieve by shifting the line number left by one bit and adding the line number to the result.
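
As a quick sanity check of the shift-and-add multiplication, the following testbench sketch compares it against a plain multiplication for all 21 sprite lines. The 6-bit width for request_line_offset is an assumption (the declaration is not shown above); it is enough because the largest offset is 20 * 3 = 60. The testbench name and the errors counter are hypothetical.

```verilog
// Sketch only: verifies (line << 1) + line == line * 3 for lines 0..20.
module times_three_tb;
  reg  [5:0] request_line;
  // Assumed width: 6 bits covers the maximum offset of 60
  wire [5:0] request_line_offset = (request_line << 1) + request_line;
  integer line;
  integer errors = 0;

  initial begin
    for (line = 0; line < 21; line = line + 1) begin
      request_line = line;
      #1;
      if (request_line_offset !== line * 3)
        errors = errors + 1;
    end
    $display("last line=%0d offset=%0d errors=%0d",
             request_line, request_line_offset, errors);
    $finish;
  end
endmodule
```

Note that if request_line_offset were left undeclared in a real design, Verilog would create an implicit one-bit net for it, so an explicit declaration of sufficient width matters here.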

Next, we should add a 3 byte shift register to our sprite generator that can shift the data bytes in when they arrive, as well as shift the bits out when we are in the display region of the sprite:

module sprite_generator(
...
  input store_byte,
  input [7:0] data,
...
    );
...
  wire sprite_display_region;
  reg [23:0] sprite_data;
...
  assign sprite_display_region = (raster_y_pos >= sprite_y_pos && raster_y_pos < (sprite_y_pos + 21)) &&
                                 (raster_x_pos >= sprite_x_pos && raster_x_pos < (sprite_x_pos + 24));
...
  always @(posedge clk_in)
    if (store_byte)
      sprite_data <= {sprite_data[15:0], data[7:0]};
    else if (sprite_display_region)
      sprite_data <= {sprite_data[22:0], 1'b0};
...

So, when we are within the visible region of the sprite we shift out the contents of sprite_data one pixel at a time. What we need to do next is to output a color for each bit we shift out.

For now we will only output the color white if the bit is a 1:

module sprite_generator(
...
  output [3:0] output_pixel,
...
    );

...
assign output_pixel = sprite_data[23] ? 4'b1 : 0;
...

One thing we should keep in mind, is that bits with the value zero are in fact transparent. So, we need to have an additional output port indicating whether the current pixel should be transparent or not.

If a sprite pixel is transparent, it is just one condition indicating that the VIC-II shouldn't show the current sprite pixel. There are additional conditions in which our VIC-II shouldn't show the pixel of a sprite:
  • The sprite is disabled
  • We are not currently within the area of the sprite on the screen
Let us wrap all these conditions together and output to a single port:

module sprite_generator(
...
  input sprite_enabled,
  output show_pixel,
...
    );
...
  assign show_pixel = sprite_enabled && sprite_data[23] && sprite_display_region;
...

Wiring everything up

With our sprite generator created, let us hook it up to our VIC-II module.

We need to end up with 8 sprite generators. That is a sprite_generator for each sprite. However, for now, to keep things simple, we will just be using a single one.

To start with, we are going to implement some more of the VIC-II registers in our VIC-II module:

reg [7:0] sprite_0_xpos;
reg [7:0] sprite_0_ypos;
reg [7:0] sprite_1_xpos;
reg [7:0] sprite_1_ypos;
reg [7:0] sprite_2_xpos;
reg [7:0] sprite_2_ypos;
reg [7:0] sprite_3_xpos;
reg [7:0] sprite_3_ypos;
reg [7:0] sprite_4_xpos;
reg [7:0] sprite_4_ypos;
reg [7:0] sprite_5_xpos;
reg [7:0] sprite_5_ypos;
reg [7:0] sprite_6_xpos;
reg [7:0] sprite_6_ypos;
reg [7:0] sprite_7_xpos;
reg [7:0] sprite_7_ypos;
reg [7:0] sprite_msb_x = 0;
reg [7:0] sprite_enabled;

always @(posedge clk_1_mhz)
     case (addr_in)
       6'h00: data_out_reg <= sprite_0_xpos;
       6'h01: data_out_reg <= sprite_0_ypos;
       6'h02: data_out_reg <= sprite_1_xpos;
       6'h03: data_out_reg <= sprite_1_ypos;
       6'h04: data_out_reg <= sprite_2_xpos;
       6'h05: data_out_reg <= sprite_2_ypos;
       6'h06: data_out_reg <= sprite_3_xpos;
       6'h07: data_out_reg <= sprite_3_ypos;
       6'h08: data_out_reg <= sprite_4_xpos;
       6'h09: data_out_reg <= sprite_4_ypos;
       6'h0a: data_out_reg <= sprite_5_xpos;
       6'h0b: data_out_reg <= sprite_5_ypos;
       6'h0c: data_out_reg <= sprite_6_xpos;
       6'h0d: data_out_reg <= sprite_6_ypos;
       6'h0e: data_out_reg <= sprite_7_xpos;
       6'h0f: data_out_reg <= sprite_7_ypos;
       6'h10: data_out_reg <= sprite_msb_x;
       6'h15: data_out_reg <= sprite_enabled;
       
       6'h20: data_out_reg <= {4'b0,border_color};
       6'h21: data_out_reg <= {4'b0,background_color};
       6'h22: data_out_reg <= {4'b0,extra_background_color_1};
       6'h23: data_out_reg <= {4'b0,extra_background_color_2};
       6'h11: data_out_reg <= {y_pos_real[8],screen_control_1[6:0]};
       6'h12: data_out_reg <= {y_pos_real[7:0]};
       6'h16: data_out_reg <= screen_control_2;
       6'h18: data_out_reg <= mem_pointers;
       6'h19: data_out_reg <= {7'h0,raster_int};
       6'h1a: data_out_reg <= int_enabled;
     endcase

always @(posedge clk_1_mhz)
begin
  if (we & addr_in == 6'h00)
    sprite_0_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h01)
    sprite_0_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h02)
     sprite_1_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h03)
     sprite_1_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h04)
     sprite_2_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h05)
     sprite_2_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h06)
     sprite_3_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h07)
     sprite_3_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h08)
     sprite_4_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h09)
     sprite_4_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0a)
     sprite_5_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0b)
     sprite_5_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0c)
     sprite_6_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0d)
     sprite_6_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0e)
     sprite_7_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0f)
     sprite_7_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h10)
     sprite_msb_x <= reg_data_in[7:0];
  else if (we & addr_in == 6'h15)
     sprite_enabled <= reg_data_in[7:0];

...
end




We now have enough information to link up most of the input ports of our sprite_generator:

sprite_generator sprite_0(
  .raster_y_pos(y_pos),
  .raster_x_pos(x_pos),
  .sprite_x_pos({sprite_msb_x[0],sprite_0_xpos}),
  .sprite_y_pos(sprite_0_ypos),
  .data(data_in[7:0]),
  .sprite_enabled(sprite_enabled[0]),
...
    );

We still need to connect the input port store_byte. The following snippet of code will basically generate this signal for us:

...
reg store_sprite_pixel_byte;
...
always @*
  case (sprite_data_region_offset[3:0])
    7, 11, 15:  store_sprite_pixel_byte = sprite_data_region && 1;
    default:  store_sprite_pixel_byte = 0;
  endcase
...

So, for any given sprite data phase, pixel periods 7, 11 and 15 are the periods just after a sprite data byte was retrieved from block RAM. At these times we would like to persist the data byte to our sprite_generator.

However, this signal fires for the data bytes of all the sprites on a rasterline, so we cannot use it as is for the store_byte input port. So, when we assign to the store_byte input port, we add an extra check to make sure the data byte is meant for our sprite_generator:

sprite_generator sprite_0(
...
  .store_byte(store_sprite_pixel_byte && sprite_data_region_offset[6:4] == 0),
...
    );


All our input ports are now connected.

Let us now see how we can use the output ports of our sprite_generator to render sprites with our VIC-II module:

wire show_pixel_sprite_0;
wire [3:0] out_pixel_sprite_0;


sprite_generator sprite_0(
...
  .show_pixel(show_pixel_sprite_0),
  .output_pixel(out_pixel_sprite_0),
...
    );
...
   assign color_for_bit = multicolor_data ? multi_color :    
            (pixel_shift_reg[7] == 1 ? char_buffer_out_delayed[11:8] : background_color);
   assign color_for_bit_with_sprite = show_pixel_sprite_0 ? out_pixel_sprite_0 : color_for_bit;

   assign final_color = (visible_vert & visible_horiz & screen_enabled) ? color_for_bit_with_sprite : border_color;
...

The actual mixing of the sprite images and the main graphics happens in the wire color_for_bit_with_sprite. If our sprite_generator asserts the show_pixel output port, the sprite pixel will be shown. Otherwise we show a pixel of the main graphics.

This concludes the sprite implementation for this post.

In the following sections we will test our implementation.

Creating a Testbed for simulation

As you can see from this post, there is quite a bit of code that needs to be written just for a single sprite to be displayed.

When undertaking a task like the one in this post, it is always handy to have a simulation Testbed at hand, like the one we created in a previous post where we implemented Multicolor Bitmap mode.

The RAM image of such a Testbed contains data for a test image that enables us to test snippets of new code within seconds.

We start by getting hold of any simple program in C64 BASIC that will display a sprite for us. For this purpose I will be using a program in the C64 User's Guide, which is discussed on pages 68 - 71.

This program will show the following image as a sprite:

The program listing is as follows:


In this program we need to make two small modifications, which involves removing the clear screen statement, and using Sprite zero instead of sprite 2.

We remove the Clear Screen statement because we actually want to see that the sprite renders correctly against a background.

We will use the VICE C64 emulator to run this code and to create a RAM image for our testbed. For this I will be using the same process as we used in the previous post where we developed multicolor bitmap mode. So I will not cover the process here.

Here is a quick screenshot of the VICE C64 emulator executing the test BASIC program:


We have the balloon with the BASIC program as a background. We need to create a RAM image from a Vice Snapshot for our testbed.

Our testbed should then render more or less the same image.

Test Results

Here is an image from our Testbed:


The colors are a bit different from our standard C64 startup colors. This is because I reused the testbed from a previous post where we developed the multicolor bitmap mode.

Other than that, this image looks more or less the same as the Vice screenshot in the previous section.

However, you will realise the sprite has a bit of an offset compared to the VICE emulator screenshot. Taking the fourth data element on line 220 (i.e. the value 3) as reference, you will see on the VICE screenshot the balloon is situated southeast of the '3', whereas in our testbed image the balloon appears west of the '3'.

These kinds of offsets are probably to be expected, since our C64 FPGA implementation is by no means 100% cycle accurate compared to a real C64.

For now we will just do some offset hacks to get the sprite displayed in the correct position:

sprite_generator sprite_0(
...
  .raster_y_pos(y_pos - 5),
  .raster_x_pos(x_pos - 16),
...
    );


With these changes, let us see how this program runs on the physical FPGA:


This correlates more or less to the VICE rendering of this BASIC program.

In Summary

In this post we have started to implement sprite functionality within our C64 FPGA.

In the next couple of posts we will continue to implement sprite functionality.

Till next time!

Tuesday, 24 September 2019

Raster Interrupts and Multicolor Textmode

Foreword

In the previous post we fixed the blank screen by applying the necessary clock constraints.

In this post we will focus on implementing Raster Interrupts and Multicolor Textmode.

Implementing Raster Interrupts

To implement Raster Interrupts, let us start by looking at all the registers that are involved with Raster Interrupts:


  • Bit 7 of D011: This is bit 8 of the Raster counter
  • D012: This is bit 7 - 0 of the Raster counter
  • Bit 0 of D019: Interrupt status bit of the Raster Interrupt
  • Bit 0 of D01A: Whether interrupts for Raster are enabled
When we read the bits from the Raster counter, we are basically returning the values of the y_pos registers. So, we can implement the read operation as follows:

always @(posedge clk_1_mhz)
     case (addr_in)
...
       6'h11: data_out_reg <= {y_pos[8],screen_control_1[6:0]};
       6'h12: data_out_reg <= {y_pos[7:0]};
...
     endcase


When we write to the Raster counter bits, we are writing to a compare value register, that the VIC-II uses to compare the Raster count to see when it is time to trigger a Raster Interrupt.

We implement the writing to this register as follows:

...
reg [7:0] rasterline_ref = 0;
...
always @(posedge clk_1_mhz)
begin
...
  else if (we & addr_in == 6'h12)
    rasterline_ref <= reg_data_in[7:0];
...
end
...


For bit 8 of the raster compare register we just use bit 7 of the screen_control_1 register, and we simply combine the two when we want to use the value.

The first part of implementing the Raster Interrupt would be to implement the compare operation:

assign is_equal_raster = {screen_control_1[7],rasterline_ref} == y_pos_real[8:0];

Next, let us implement the Status register for the Raster interrupt:

...
reg raster_int = 0;
...
always @(posedge clk_1_mhz)
if (we & (addr_in == 6'h19))
  raster_int <= raster_int & ~reg_data_in[0];
else
  raster_int <= raster_int | (is_equal_raster);
...
always @(posedge clk_1_mhz)
     case (addr_in)
...
       6'h19: data_out_reg <= {7'h0,raster_int};
...
     endcase
...

As you can see in the code above, to clear a raster interrupt that occurred, you would write a one to bit 0 of address D019.

There is a small anomaly with the code we have just written. is_equal_raster will be one for the full duration of the line. This means that as soon as we clear the interrupt in raster_int, it will just set the interrupt again at the next clock cycle.

To fix this, we need to set the interrupt only on the edge transition of is_equal_raster:

reg is_equal_raster_delayed = 0;
...
always @(posedge clk_1_mhz)
is_equal_raster_delayed <= is_equal_raster;

always @(posedge clk_1_mhz)
if (we & (addr_in == 6'h19))
  raster_int <= raster_int & ~reg_data_in[0];
else
  raster_int <= raster_int | (is_equal_raster & !is_equal_raster_delayed);


Next, we should output this interrupt from our VIC-II module, gated via an interrupt enable bit:

module vic_test_3
(
...
  output irq, 
...
    );
...
reg [7:0] int_enabled = 0;
...
assign irq = raster_int & int_enabled[0];
...
always @(posedge clk_1_mhz)
     case (addr_in)
...
       6'h1a: data_out_reg <= int_enabled;
     endcase
...
always @(posedge clk_1_mhz)
begin
...
  else if (we & addr_in == 6'h1a)
      int_enabled <= reg_data_in[7:0];
end
...

This interrupt we will hook up to the interrupt port of our CPU module. Since we already have the IRQ of CIA#1 connected to this port, we combine both together with an or operation:

cpu mycpu ( clk_1_mhz, c64_reset, addr, combined_d_out, ram_in, 
                                       we_raw, (irq || irq_vic), 1'b0, 1'b1 );

Multicolor TextMode

In a previous post we have implemented Multicolor Bitmap Mode, so for a start let us summarise the different colors for a pixel pair for both Multicolor Bitmap mode as well as for Multicolor Text Mode.

For Multicolor Bitmap mode the colors are as follows:

  • 00: Color stored at location D021
  • 01: Bits 4-7 of the associated code in Screen memory
  • 10: Bits 0-3 of the associated code in Screen memory
  • 11: Associated value in Color RAM

For Multicolor Text Mode the colors are as follows:

  • 00: Color stored in location D021
  • 01: Color stored in location D022
  • 10: Color stored in location D023
  • 11: Associated value in Color RAM

There is one more thing to be aware of for bit value 11 in Multicolor Text Mode. In this scenario we can only use bits 0 - 2 of the value from Color RAM as a color code, limiting us to color codes 0 - 7.

Why can't we use bit 3 from Color RAM in Multicolor Text Mode? In this mode bit 3 has a very interesting function: setting it to one means that we really want to use Multicolor Text Mode for this character cell. With bit 3 cleared, the cell is drawn as normal text.

I am sure many of you have burst out laughing when reading the previous paragraph :-) Why would you need to double confirm when you want to draw a character cell in Multicolor Text mode? This is actually a nice feature if you want to draw some characters in normal text mode, giving your picture a bit more detail where required.

For now we are just going to keep this fact about bit 3 of Color RAM in the back of our heads till a bit later.

For the two Multicolor Modes we can create two combinational logic blocks for outputting the relevant color based on the pixel pair value:

always @*
  case (pixel_shift_reg[7:6])
    2'b00: multi_color_bitmap_mode = background_color;
    2'b01: multi_color_bitmap_mode = char_buffer_out_delayed[7:4];
    2'b10: multi_color_bitmap_mode = char_buffer_out_delayed[3:0];
    2'b11: multi_color_bitmap_mode = char_buffer_out_delayed[11:8];
  endcase

always @*
  case (pixel_shift_reg[7:6])
    2'b00: multi_color_text_mode = background_color;
    2'b01: multi_color_text_mode = extra_background_color_1;
    2'b10: multi_color_text_mode = extra_background_color_2;
    2'b11: multi_color_text_mode = {1'b0, char_buffer_out_delayed[10:8]};
  endcase


Please note that, as explained earlier, we only output bits 2 - 0 of the Color RAM value for bit value 11 when in Multicolor Text Mode.

We can now combine the two color values, depending on whether we are in text mode or bitmap mode:

assign multi_color = screen_control_1[5] ? multi_color_bitmap_mode : 
                       multi_color_text_mode;

We are just about done. However, we still need to consider bit 3 of Color RAM when we are in Multicolor Text Mode. This is just an extension of the check of whether we are in Multicolor mode for the current character cell:

assign multicolor_data = screen_control_2[4] && 
                !(!char_buffer_out_delayed[11] && !screen_control_1[5]);

We use this value as follows:

...
   always @(posedge clk_in)
   if (clk_counter == 7)
     pixel_shift_reg <= data_in;
   else begin
     if (multicolor_data & (clk_counter[0]))
       pixel_shift_reg <= {pixel_shift_reg[5:0],2'b0};
     else if (!multicolor_data) 
       pixel_shift_reg <= {pixel_shift_reg[6:0],1'b0};
   end
...
   assign color_for_bit = multicolor_data ? multi_color :    
            (pixel_shift_reg[7] == 1 ? char_buffer_out_delayed[11:8] : background_color);
...

This concludes the code we need to write for implementing Multicolor Text Mode.
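The per-cell decision and the color lookup for Multicolor Text Mode can also be sketched in C, which makes the role of bit 3 of Color RAM easy to experiment with. This is only a model with hypothetical function names. I am assuming here that mcm corresponds to screen_control_2[4] and bmm to screen_control_1[5], and that color_ram holds the four bits we read from char_buffer_out_delayed[11:8].

```c
#include <stdbool.h>
#include <stdint.h>

/* Does this character cell consume pixel pairs? Mirrors multicolor_data.
   mcm: multicolor enable, bmm: bitmap mode, color_ram: 4-bit Color RAM value. */
bool cell_is_multicolor(bool mcm, bool bmm, uint8_t color_ram)
{
    bool bit3 = (color_ram & 0x8) != 0;
    /* In bitmap mode bit 3 is ignored; in text mode it opts the cell in. */
    return mcm && (bmm || bit3);
}

/* Color of one pixel pair in Multicolor Text Mode.
   d021, d022 and d023 are the 4-bit values of the background registers. */
uint8_t multicolor_text_color(uint8_t pair, uint8_t d021, uint8_t d022,
                              uint8_t d023, uint8_t color_ram)
{
    switch (pair & 0x3) {
    case 0:  return d021 & 0xF;
    case 1:  return d022 & 0xF;
    case 2:  return d023 & 0xF;
    default: return color_ram & 0x7; /* bit 3 is the mode flag, not a color bit */
    }
}
```

For example, a Color RAM value of 0xD in text mode selects a multicolor cell, and pixel pair 11 is then drawn in color 5.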

Test Results

Here is a screenshot of the start of the game with all the changes described so far in this post applied:


We are just about there. The only strange thing is a band of random characters above the Dan Dare title bar.

These random characters are caused by changing character maps a couple of raster lines too early. So, the raster counts of our VIC-II module are a bit different from those of a real VIC-II.

It can become quite an exercise to troubleshoot the difference. For now I am only going to fiddle with the Raster count offset till the image looks ok.

The following snippet of code will subtract a given offset from the Raster count:

wire [9:0] y_pos_minus_offset;
wire [9:0] y_pos_real;

assign y_pos_minus_offset = {1'b0,y_pos} - 5;
assign y_pos_real = (y_pos_minus_offset > 400) ? (10'd312 + y_pos_minus_offset) : 
          y_pos_minus_offset;


In this snippet of code we have chosen an offset value of 5 (which was actually my final attempt, giving the correct result). The code also caters for the wrap-around scenario, in which subtracting 5 yields a two's complement negative number.

To deal with this two's complement manipulation I have also added an extra bit to both y_pos_minus_offset and y_pos_real to avoid overflow conditions.
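The wrap-around is easier to follow with concrete numbers, so here is a small C sketch of the same arithmetic. The function name y_pos_real_model and the two constants are my own for illustration; the 10-bit masking stands in for the width of the Verilog wires.

```c
#include <stdint.h>

#define LINES_PER_FRAME 312u /* raster lines per frame, as in the Verilog */
#define RASTER_OFFSET   5u   /* the offset chosen above */

/* Model of y_pos_real: subtract the offset and wrap within one frame. */
uint16_t y_pos_real_model(uint16_t y_pos)
{
    /* 10-bit subtraction: for y_pos < 5 the result wraps to just under 1024. */
    uint16_t minus = (uint16_t)((y_pos - RASTER_OFFSET) & 0x3FFu);

    /* Anything above 400 can only be a wrapped (negative) value. */
    if (minus > 400u)
        minus = (uint16_t)((minus + LINES_PER_FRAME) & 0x3FFu);

    return minus;
}
```

With an offset of 5, raster line 0 maps to 307, line 4 maps to 311 and line 5 maps to 0, so the adjusted count still runs through all 312 lines.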

You then use y_pos_real in all places where raster counts are compared or where the 6502 reads the raster count:

...
assign is_equal_raster = {screen_control_1[7],rasterline_ref} == y_pos_real[8:0];

always @(posedge clk_1_mhz)
     case (addr_in)
...
       6'h11: data_out_reg <= {y_pos_real[8],screen_control_1[6:0]};
       6'h12: data_out_reg <= {y_pos_real[7:0]};
...
     endcase

With this quick fix, our game screen renders correctly.

The following video shows a quick tour in which we walk through three screens and fight a Treen:


Our characters are still invisible because we haven't implemented sprites yet, but at least we see the messages popping up when we encounter the enemy!

In Summary

In this post we have implemented Raster interrupts and Multicolor text mode.

This enabled us to render the background screen of the game properly, as well as to move around between screens.

Our characters are still invisible because we haven't implemented sprites yet.

In the next post we will implement sprites so that our characters can appear!

Till next time!

Saturday, 21 September 2019

Fixing a Blank Screen

Foreword

In the previous post we added joystick control to our C64 FPGA module, which enabled us to start the game from the Intro screen.

The game, however, started frozen and the screen looked garbled.

To fix this frozen, garbled screen, my next goal was to implement Raster interrupts, as well as Multicolor Text Mode.

While trying to implement Raster interrupts and Multicolor Text Mode, I was presented with a nasty surprise when the Zybo board started up with our C64 design: a blank screen!

This Blank Screen proved indeed to be a challenge and a half to debug.

The take-home from this exercise proved to be quite an important one for general FPGA design, so I decided to dedicate this post to how this issue was resolved.

Background

You might recall that once (quite a number of posts back) our 6502 core just started crashing without any apparent reason.

After some careful investigation, I found that this crash was caused by a newly implemented clock divider that generated a 1MHz signal from an 8MHz signal by means of a binary counter.

The quirk of the matter is that as soon as you use a binary counter as a clock source, you should ensure that you implement the correct constraints in your design. If you don't, you might end up with a noisy, spiky clock signal causing all kinds of unexpected behaviour.

Since I fixed the issue with the necessary constraints, I never really had the same issue again while adding more functionality to the design. That is, up until now.

I was making good progress with Raster interrupts until I was greeted by a blank screen while testing some of my changes on the Zybo board.

From similar issues in the past with our C64 FPGA design, a blank screen tells me that our 6502 module crashed earlier in the process, which in this case was caused again by a clocking constraint violation.

The Solution


I am going to try and explain how I managed to isolate this issue. Before we do, let us just recap how I solved the clock divider issue originally, so that the surrounding technical details make a bit more sense.

Here is the snippet of code for generating a 1MHz clock and a 2MHz clock in our design:

...
reg [2:0] clk_div_counter = 0;
...

    always @(posedge clk)
      clk_div_counter <= clk_div_counter + 1; 

    always @(negedge clk)
      clk_1_enable <= (clk_div_counter == 7);
...
    always @(negedge clk)
      clk_2_enable <= (clk_div_counter == 2) | (clk_div_counter == 6) ;

       BUFGCE BUFGCE_1_mhz (
       .O(clk_1_mhz),   // 1-bit output: Clock output
       .CE(clk_1_enable), // 1-bit input: Clock enable input for I0
       .I(clk)    // 1-bit input: Primary clock
    );

       BUFGCE BUFGCE_2_mhz (
       .O(clk_2_mhz),   // 1-bit output: Clock output
       .CE(clk_2_enable), // 1-bit input: Clock enable input for I0
       .I(clk)    // 1-bit input: Primary clock
    );
...

From the clk_div_counter, we create clock enable signals that drive the BUFGCE blocks. A BUFGCE is essentially a clock buffer with an enable input.

A key feature of these two buffers is that they are physically close to the clocking circuits on the FPGA die. Also, the clock output from these buffers can drive quite a number of flip-flops while maintaining a clean waveform.

Apart from this code, we still need to define some constraints:

create_generated_clock -name clkdiv1 -source [get_pins design_1_i/block_test_0/inst/BUFGCE_1_mhz/I0] \
                       -edges {1 2 17} [get_pins design_1_i/block_test_0/inst/BUFGCE_1_mhz/O]
create_generated_clock -name clkdiv2 -source [get_pins design_1_i/block_test_0/inst/BUFGCE_2_mhz/I0] \
                       -edges {7 8 15} [get_pins design_1_i/block_test_0/inst/BUFGCE_2_mhz/O]
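The three numbers passed to -edges list the source clock edges at which the generated clock rises, falls and rises again, counting every transition of the source clock (edge 1 is the first rising edge, edge 2 the first falling edge, and so on). As a quick sanity check, the arithmetic can be sketched in C. The helper names below are made up for illustration, and the 8MHz source frequency is taken from the clock divider description earlier in this post.

```c
/* Period of a generated clock, in source-clock cycles, from an -edges triple.
   Two source edges make up one source cycle. */
unsigned edges_period_cycles(const unsigned edges[3])
{
    return (edges[2] - edges[0]) / 2u;
}

/* Resulting frequency in kHz for a given source frequency in kHz. */
unsigned generated_khz(unsigned source_khz, const unsigned edges[3])
{
    return source_khz / edges_period_cycles(edges);
}
```

For {1 2 17} the period is eight source cycles, which gives 1MHz from an 8MHz source. For {7 8 15} the period is four source cycles, which gives 2MHz. In both cases the pulse is high for only half a source cycle, matching the gated output of the BUFGCE.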



So, where did things go wrong in our existing design? It all becomes clear when you have a look at the schematic view of the synthesised design within Vivado. Have a look at the following extract of the schematic:

This corresponds more or less to the Verilog code shown earlier. Needless to say, both the registers clk_1_enable and clk_2_enable ended up as flip-flops, which can be quickly identified by the triangle next to the C input.

Both data inputs are fed from bits of clk_div_counter. This is where things start to get interesting. If we follow the schematic, we see that there is not really a direct path from clk_div_counter to the enable registers.

Instead, clk_div_counter enters the VIC-II module first and goes past quite a number of flip-flop and LUT elements. The signal then eventually exits the VIC-II module, and it is only at this point that it enters the enable registers.

In short, it is quite a long path between clk_div_counter and the enable registers. This is not ideal, considering that we want to generate clock signals.

The question is: How do we shorten the length between clk_div_counter and the enable registers? The short answer is that we should create a duplicate register of clk_div_counter that is serving just the enable registers.

During optimisation, however, Vivado might remove our duplicate register and we will be back at square one. So, we need a way to tell Vivado to keep our duplicate register.

There is indeed an attribute, called equivalent_register_removal, that we can use to keep our duplicate register declaration. We would use this attribute as follows:

...
(* equivalent_register_removal = "no" *)
    reg [2:0] clk_div_counter = 0;
(* equivalent_register_removal = "no" *)
    reg [2:0] clk_div_counter_cycle = 0;
...
    always @(posedge clk)
      clk_div_counter <= clk_div_counter + 1; 

    always @(posedge clk)
      clk_div_counter_cycle <= clk_div_counter_cycle + 1; 
...

So, the one register we would use for our clock divider and the other one for our VIC-II module.

This would solve our blank screen problem.

In Summary

Originally I intended for this post to implement Raster Interrupts as well as Multicolor text mode.

During the implementation of this functionality, I was faced with a blank screen when the Zybo started up with our C64 design.

Troubleshooting this blank screen proved to be quite a challenge, so I decided to rather dedicate this post to the underlying problem.

All in all, this blank screen was caused by a timing constraint violation in our clock divider.

In the next post we will eventually get to the implementation of Raster interrupts and Multicolor text mode.

Till next time!

Tuesday, 10 September 2019

Adding joystick control

Foreword

In the previous post we managed to display the Intro screen for our game Dan Dare within our C64 FPGA implementation.

As with many other C64 games, to actually start the game you need to press fire on a joystick. Since our emulator doesn't feature any joystick at the moment, the purpose of this post will be to add functionality to emulate a joystick.

For the joystick we will just use the Numeric Keypad on the USB keyboard attached to the Zybo Board.

We will also just be focusing on implementing joystick port #2 of the C64, since this is the port the game Dan Dare uses.

How Joystick port#2 is wired to the C64

A good start for this post would be to see how Joystick port#2 is connected on a real C64.

The following snippet of a schematic comes from http://www.zimmers.net:


As you can see, joystick port#2 shares wires on Port A of CIA#1 with the keyboard.

This setup of shared wires between joystick and keyboard immediately reminds us of an anomaly of joystick port#1, where moving the joystick also types characters on the screen.

One might wonder why joystick port#2 doesn't have the same effect. The answer is that we read the keyboard from port B on the CIA, which is connected to the row pins of the keyboard connector.

With no key being pressed on the keyboard, all pins would just remain high on port B of CIA#1. Joystick #1 bypasses this, since it is also connected to port B of CIA#1 and can pull selected lines down to zero, which the C64 will read as key presses.

Pulling down selected lines via joystick port#2 will not have the same effect. With no keys being pressed on the keyboard, these pulled-down lines simply do not propagate to port B of CIA#1.
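The difference between the two joystick ports can be illustrated with a small C model of the keyboard matrix. This is a deliberately simplified sketch: the function read_port_b and its parameters are hypothetical, every signal is active low, and effects such as several keys shorting columns together are ignored.

```c
#include <stdint.h>

/* keys[c] has bit r set when the key at column c, row r is held down.
   Port and joystick values are active low (a 0 bit means pulled to ground). */
uint8_t read_port_b(uint8_t port_a_out, const uint8_t keys[8],
                    uint8_t joy1, uint8_t joy2)
{
    /* Joystick #2 shares the column lines with port A. */
    uint8_t columns = port_a_out & joy2;
    uint8_t rows = 0xFF;

    for (int c = 0; c < 8; c++) {
        /* A low column pulls down the row of every pressed key in it. */
        if (!(columns & (1u << c)))
            rows &= (uint8_t)~keys[c];
    }

    /* Joystick #1 sits directly on the row lines. */
    return rows & joy1;
}
```

With no keys held down, pulling a line low through joy2 leaves the result at 0xFF, whereas the same line pulled low through joy1 shows up directly in the value read from port B.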

Implementing Joystick port#2 in our C64 module

In the previous section we mentioned the concept of pulling a line low on either port A or port B of CIA#1. Thus, the keyboard and joysticks on the C64 follow the philosophy of active when low.

Another feature of port A and port B of CIA#1 is that each pin of those ports is bidirectional.

This leaves us with the question: How do you implement a bidirectional pin in an FPGA?

One might think: Sure, instead of declaring a port pin on a module as either input or output, you can just declare the bidirectional port as inout.

You can indeed create a Verilog module with inout ports. However, as soon as you try to connect these ports to other Verilog modules in your design, you might end up running in circles.

This is because inout pins are really only meant for pins going to the outside world, for instance if you want to implement an I2C port on your FPGA.

The FPGA synthesis tools don't like it at all if you try to utilise inout ports for internal use.

So in our CIA module we would need to split our bidirectional ports into two separate ports each:

module cia(
  output [7:0] port_a_out,
  input [7:0] port_a_in,
  output [7:0] port_b_out,
  input [7:0] port_b_in,
...
    );
...

Next, we need to make a small adjustment when we read from Port A or B:

...
  always @(posedge clk)
  if(!we)
  case (addr)
    0: data_out <=  ~((~slave_reg_0 & slave_reg_2) | 
                     ~port_a_in);
                   
    1: data_out <= ~((~slave_reg_1 & slave_reg_3) | 
                     ~port_b_in);
...

Let us try and understand what is going on here.

When we read from either port A or B, a low value can either be caused by the input port, or via the corresponding output port (e.g. slave_reg_0 or slave_reg_1).

Also, the corresponding output port is enabled by either slave_reg_2 or slave_reg_3.

The combined effect of an input and an output port resembles an OR operation on inverted inputs: a line reads low if either side pulls it low. This is why we do all the negations.
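The same read-back logic can be written as a few lines of C, which makes it easy to try out different combinations of register and pin values. This is just a sketch: cia_port_read is a hypothetical name, and I am assuming that slave_reg_2 and slave_reg_3 act as direction registers where a one marks a pin as an output.

```c
#include <stdint.h>

/* Value the CPU reads back from a CIA port.
   out_reg: data register (slave_reg_0 or slave_reg_1)
   ddr:     enable register (slave_reg_2 or slave_reg_3), assumed 1 = output
   pins_in: external inputs, idle high */
uint8_t cia_port_read(uint8_t out_reg, uint8_t ddr, uint8_t pins_in)
{
    /* Lines the CIA itself drives low: output enabled and data bit is 0. */
    uint8_t low_by_cia = (uint8_t)(~out_reg & ddr);

    /* Lines pulled low from outside, e.g. by a key or the joystick. */
    uint8_t low_by_pins = (uint8_t)~pins_in;

    /* A line reads as 0 if either side pulls it low. */
    return (uint8_t)~(low_by_cia | low_by_pins);
}
```

For instance, with all pins configured as outputs, writing 0xFE while the joystick pulls bit 4 low (pins_in of 0xEF) reads back as 0xEE.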

Next, let us hook up port A and port B of our CIA instances:

...
    cia cia_1(
          .port_a_out(keyboard_control),
          .port_a_in({3'b111, joybits}),
          .port_b_in(keyboard_result),
...
            );
...
    cia cia_2(
          .port_a_out(cia_2_port_a),
          .port_a_in(8'b11111111),
...
            );
...

First, we hook up the five bits of our joystick to port A of CIA#1.

We also connect the input of port A of CIA#2 to eight ones. We use the lower two bits of this port for the VIC-II banking bits. It is therefore crucial that we keep the relevant input bits high, so that the contents of the VIC-II bits don't get lost during bitwise operations.

Serving the joystick bits from AXI slave


Currently our AXI Slave block has two slave registers indicating which keys were pressed. Each bit position in these two registers represents the C64 key scan code of a pressed key.

In a similar fashion we can add a third register where each bit position represents the current position of the joystick, as well as whether the fire button is pressed.

Currently slave register 2 (i.e. address 0x43c0_0008) only has about three bits utilised for tape operation. So, we can just use some unused bits in this register for our joystick bits.

We will use bits 4 to 8 of this register for the joystick bits. This falls on a nybble boundary, making it convenient to see the joystick bits when you are debugging and you see the register contents in hexadecimal format.

To wire up the joystick bits from the AXI slave to our C64 module, we would follow the same approach as we previously performed to enable keyboard access for our C64 module. I will therefore not be going into detail on this.

Redirecting Numeric Pad as Joystick bits


As mentioned earlier, we will be using the numeric pad of the USB keyboard as a joystick.

You might remember from a previous post that, in order to interface a USB keyboard to our C64 module, we basically catch the USB scan codes, convert them to C64 key scan codes, and set the relevant bit (or bits, if more than one key is pressed simultaneously) at either address 0x43c0_0000 or 0x43c0_0004.

The C64 keyboard can produce key scan codes in the range 0 to 63. We can reuse our USB scan code -> C64 scan code routine by basically using scan codes 64 upwards for our joystick bits:

u32 mapUsbToC64(int usbCode) {
 if (usbCode == 0x4) { //A
  return 0xa;
 } else if (usbCode == 0x5) { //B
  return 0x1c;
 } else if (usbCode == 0x6) { //C
  return 0x14;
 } 

...
 } else if (usbCode == 0x28) { //enter
  return 0x1;
 } else if (usbCode == 0x2c) { //space
  return 0x3c;
 } else if (usbCode == 0x36) { //comma
  return 0x2f;
 } else if (usbCode == 53) { //play key `~
  return 100;
 } else if(usbCode == 96) { //up joystick
  return 64;
 } else if(usbCode == 90) { //down joystick
  return 65;
 } else if(usbCode == 92) { //left joystick
  return 66;
 } else if(usbCode == 94) { //right joystick
  return 67;
 } else if(usbCode == 98) { //fire joystick
  return 68;
 }
}


We invoke this method as follows:

void getC64Words(u32 usbWord0, u32 usbWord1, u32 *c64Word0, u32 *c64Word1, u32 *c64Word2) {
  *c64Word0 = 0;
  *c64Word1 = 0;
  *c64Word2 = 0;

  if (usbWord0 & 2) {
    *c64Word0 = 0x8000;
  }

  // The upper two bytes of the first word hold the first two key codes
  usbWord0 = usbWord0 >> 16;

  for (int i = 0; i < 2; i++) {
    int current = usbWord0 & 0xff;
    if (current != 0) {
      int scanCode = mapUsbToC64(current);
      if (scanCode == 100) {            // play key
        Xil_Out32(0x43C00008, 0);
      } else if (scanCode < 32) {       // C64 keys 0..31
        *c64Word0 = *c64Word0 | (1u << scanCode);
      } else if (scanCode < 64) {       // C64 keys 32..63
        *c64Word1 = *c64Word1 | (1u << (scanCode - 32));
      } else {                          // joystick bits, codes 64 upwards
        *c64Word2 = *c64Word2 | (1u << (scanCode - 64));
      }
    }

    usbWord0 = usbWord0 >> 8;
  }

  // The second word holds four more key codes
  for (int i = 0; i < 4; i++) {
    int current = usbWord1 & 0xff;
    if (current != 0) {
      int scanCode = mapUsbToC64(current);
      if (scanCode == 100) {            // play key
        Xil_Out32(0x43C00008, 0);
      } else if (scanCode < 32) {       // C64 keys 0..31
        *c64Word0 = *c64Word0 | (1u << scanCode);
      } else if (scanCode < 64) {       // C64 keys 32..63
        *c64Word1 = *c64Word1 | (1u << (scanCode - 32));
      } else {                          // joystick bits, codes 64 upwards
        *c64Word2 = *c64Word2 | (1u << (scanCode - 64));
      }
    }

    usbWord1 = usbWord1 >> 8;
  }
}


We have introduced a third word c64Word2. This will be the word we will use to populate the joystick bits at address 0x43c0_0008.
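The routing of a scan code to one of the three words boils down to a range check followed by a shift. The following standalone C sketch shows just that step. The names c64_words_t and set_scan_bit are hypothetical, and the special play key code 100 is deliberately left out here, so any code from 96 upwards is simply ignored.

```c
#include <stdint.h>

typedef struct {
    uint32_t word0; /* C64 scan codes 0..31 */
    uint32_t word1; /* C64 scan codes 32..63 */
    uint32_t word2; /* joystick codes 64 upwards */
} c64_words_t;

/* Set the bit that corresponds to one scan code. */
void set_scan_bit(c64_words_t *w, unsigned scan_code)
{
    if (scan_code < 32u)
        w->word0 |= 1u << scan_code;
    else if (scan_code < 64u)
        w->word1 |= 1u << (scan_code - 32u);
    else if (scan_code < 96u)
        w->word2 |= 1u << (scan_code - 64u);
    /* Anything else does not fit in a 32-bit word and is dropped. */
}
```

Scan code 68 (fire) therefore ends up as bit 4 of the third word, right next to the four direction bits in bits 0 to 3.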

Next, we need to update our old state_machine() method (our mini USB stack method) as shown by the following snippet:

void state_machine() {
...
  u32 toggle = Xil_In32(qTDAddressCheck+8) & 0x80000000;
  if (!(Xil_In32(qTDAddressCheck + 8) & 0x80)) {
   u32 word0 = Xil_In32(0x305000);
   u32 word1 = Xil_In32(0x305004);
   if (word0 == 0) {
    Xil_Out32(0x43c00000, 0);
    Xil_Out32(0x43c00004, 0);
    u32 joy = Xil_In32(0x43c00008) | 0x1f0;
    Xil_Out32(0x43c00008, joy);
   } else {
    //u32 bit = mapUsbToC64((word0 >> 16) & 0xff);
    //bit = 1 << bit;
    u32 c64Word0 = 0;
    u32 c64Word1 = 0;
    u32 c64Word2 = 0;
    getC64Words(word0, word1, &c64Word0, &c64Word1, &c64Word2);
    c64Word2 = ~c64Word2 & 0x1f;
    c64Word2 = c64Word2 << 4;
    /*if (bit < 32) {
     c64Word0 = 1 << bit;
    } else {
     c64Word1 = 1 << (bit - 32);
    }*/

    Xil_Out32(0x43c00000, c64Word0);
    Xil_Out32(0x43c00004, c64Word1);
    u32 tempJoy = (Xil_In32(0x43c00008) & 0xf) | c64Word2;
    Xil_Out32(0x43c00008, tempJoy);
    //Xil_In32(0x305004);
   }
... 


}


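Basically we start with word0 and word1, which hold the USB scan codes of the keys that are currently being pressed.

If no key is pressed (i.e. word0 == 0), we just set bits 4 to 8 of address 0x43c0_0008 to ones. Otherwise, we invert the joystick bits (the joystick is active low), shift them into bits 4 to 8, and merge them with the lower four bits of the register.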
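The bit manipulation on the joystick register is compact enough to get wrong, so here it is once more as a standalone C function that can be tested without the hardware. The name pack_joystick is hypothetical, and instead of calling Xil_In32 and Xil_Out32 the current register value is passed in and the new value returned.

```c
#include <stdint.h>

#define JOY_SHIFT 4u /* the joystick bits live in bits 4 to 8 */

/* pressed: bit 0 = up, 1 = down, 2 = left, 3 = right, 4 = fire (1 = pressed).
   reg:     current contents of the register at 0x43c0_0008. */
uint32_t pack_joystick(uint32_t reg, uint32_t pressed)
{
    /* The joystick lines are active low, so invert before shifting. */
    uint32_t active_low = (~pressed & 0x1Fu) << JOY_SHIFT;

    /* Keep only the lower four bits of the register, as the code above does. */
    return (reg & 0xFu) | active_low;
}
```

With nothing pressed, the joystick field reads 0x1F0. Pressing fire clears bit 8, giving 0x0F0 in the joystick field.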

The End Results

The following video shows what happens when we press the fire button when we are at the intro screen of the game Dan Dare:


It faintly resembles the game as I remember, though garbled and frozen!

What we are missing here is implementing Raster interrupts for everything to render correctly, which we will cover in the next post.

In Summary

In this post we managed to implement a joystick in our C64 module by utilising the numpad on the USB keyboard.

With our joystick we managed to transition from the Intro screen to the actual game, although our emulator froze at this point.

In the next post we will be implementing Raster interrupts so that the game screen can render properly.

Till next time!