Saturday 5 October 2019

Implementing Sprites: Part 1

Foreword

In the previous post we implemented Raster Interrupts as well as Multicolor Text mode.

This enabled us to completely render the status bar and the Background of the game. We were even able to move between screens of the environments.

There was, however, a crucial piece of the game play experience missing: The characters were invisible!

The reason our characters were hidden is that we haven't implemented Sprites in our VIC-II module yet. So, our next focus in this Blog series will be to implement Sprites in our C64 FPGA implementation.

To implement sprites in an upcoming C64 emulator can be quite a daunting task. The following tasks come to mind, just to name a few:
  • Coordinate memory access between fetching Sprite data, fetching screen memory content and fetching character image data.
  • Mixing the sprite images with text/bitmap mode graphics to get the final picture
  • Adding functionality to either show sprites in front of text or behind it.
  • Dealing with transparency
  • Implementing multicolor mode
  • Adding functionality to stretch a sprite in either the Y- or X-direction
I have therefore decided to split the implementation of sprites into a number of separate posts.

In this post we will focus on showing a single sprite in front of text.

To test the resulting Sprite implementation, we will be using a simple Basic program for moving a sprite across the screen.

Retrieving Sprite Data from Memory

Let us start our journey of implementing sprite rendering by thinking how we will be fetching sprite data from memory.

A good start will be to review how our VIC-II currently interfaces with memory to get image data. Here is a quick outline:
  • An output port named addr sends the address of the data we want.
  • The requested data is sent to the input port data_in. This port is 12 bits wide: eight bits of data and 4 bits from Color RAM. In this way, for each screen location the character code and associated color arrive simultaneously, thus eliminating the need for an extra memory cycle to get the color code.
  • Memory requests are clocked at 2MHz. This translates to 2 memory accesses during an eight pixel period.
If you went through the particulars of the VIC-II, you will see that it clocks memory accesses at 1MHz. So, why am I clocking memory at 2MHz in my VIC-II implementation?

The key to this answer lies in the fact that within a C64, memory accesses happen on both the rising edge and the falling edge of a 1MHz clock cycle. The VIC-II accesses memory on the rising edge and the 6510 CPU on the falling edge of a 1MHz clock pulse.

There are, however, cases where the VIC-II will access memory on both the rising and the falling edge. This happens at the beginning of each character line, where the VIC-II needs to retrieve the character code as well as the relevant pixel data to display. The VIC-II needs that extra time to retrieve the code for the character to be displayed, so the CPU cannot do any memory accesses during these times.

As we can see, memory access timing is very tight for the VIC-II, so one might wonder how the VIC-II manages to get some memory cycles for retrieving sprite data. This is where Christian Bauer's write-up on the VIC-II comes to the rescue, as explained here. The section of interest is 3.6.3, Timing of a raster line.

In this section a couple of VIC-II memory access diagrams are shown for a couple of scenarios. The scenario where sprites 2-7 are active on a raster line gives us a very good idea of where the VIC-II retrieves sprite data from memory (I have added the legend for convenience):

Cycl-# 6                   1 1 1 1 1 1 1 1 1 1 |5 5 5 5 5 5 5 6 6 6 6 6 6
       5 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 |3 4 5 6 7 8 9 0 1 2 3 4 5 1
        _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _| _ _ _ _ _ _ _ _ _ _ _ _ _ _
    ΓΈ0 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |_ _ _ _ _ _ _ _ _ _ _ _ _ _
       __                                      |
   IRQ   ______________________________________|____________________________
                             __________________|________
    BA ______________________                  |        ____________________
                              _ _ _ _ _ _ _ _ _| _ _ _ _ _ _ _
   AEC _______________________ _ _ _ _ _ _ _ _ |_ _ _ _ _ _ _ ______________
                                               |
   VIC ss3sss4sss5sss6sss7sssr r r r r g g g g |g g g i i i i 0sss1sss2sss3s
  6510                        x x x x x x x x x| x x x x X X X
                                               |
Graph.                      |===========0102030|7383940============
                                               |
X coo. \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\|\\\\\\\\\\\\\\\\\\\\\\\\\\\\
       1111111111111111111111111110000000000000|1111111111111111111111111111
       999aaaabbbbccccddddeeeeffff0000111122223|344445555666677778888889999a
       48c048c048c048c048c048c048c048c048c048c0|c048c048c048c048c04cccc04c80

  c  Access to video matrix and Color RAM (c-access)
  g  Access to character generator or bitmap (g-access)
 0-7 Reading the sprite data pointer for sprite 0-7 (p-access)
  s  Reading the sprite data (s-access)
  r  DRAM refresh
  i  Idle access

  x  Read or write access of the processor
  X  Processor may do write accesses, stops on first read (BA is low and so
     is RDY)


Let us unpack the above diagram a bit. Right on top we have the cycle number, which is a count that gets incremented at a frequency of 1MHz.

The diagram starts with the last cycle of the previous line, which in this case is 65. A line of 65 cycles tells us that this is an NTSC VIC-II variant.

The diagram then continues from cycle 1 to cycle 19. For clarity, cycles 20-52 are omitted from the diagram, and we continue again from cycle 53 to 65 and the first cycle of the next line.

You will also notice that after each cycle number there is a space. This space is the time period in which the VIC-II might make use of both the rising and the falling edge of the 1MHz clock cycle for retrieving data.

The Graph row shows us when the electron beam is busy drawing on the screen. Making that statement felt a bit weird to me, since we don't really use CRTs anymore :-)

On the Graph row, an equals sign (=) marks the period when the border is drawn, and digits mark the period when we are actually drawing character data.

With the Graph row as reference, let us have a look at the VIC row to see when sprite data is read from memory.

We can see that just before we finish drawing a screen line, which at that point is a border operation, we start reading sprite info. It starts with: 0sss. We read sprite data pointer 0 from the end of screen memory, followed by three bytes of sprite data.

We do the same for sprites 1 to 7 and we stop just before we start drawing the border of the next raster line.

Also, please note that on the diagram there is no space between the sprite pointer and sprite data symbols. So, when we get the relevant data for a sprite, we are using all available memory cycles.

In short, the sprite data is retrieved during the non-visible parts of a screen line, or to use PAL and NTSC terminology, during the front and back porch of a scan line.

Implementing sprite data Memory accesses

The previous section gives us a good indication of where we should implement the reading of sprite data during the drawing of a line.

Admittedly, in our VIC-II we don't start with a blank period on a rasterline, but rather start drawing the border immediately at the beginning of each rasterline.

We will, however, use the blank period at the end of the rasterline to read sprite data.

Let us start with some calculations. There are 4 memory accesses per sprite (i.e. the sprite pointer and three data bytes). For sprites there are two memory accesses per cycle. So, a sprite needs two cycles, or 16 pixels, to get all its data.

We will be using our x_pos counter to determine when to read sprite data. For this purpose, we write the following code:

...
wire sprite_data_region;
...
assign sprite_data_region = (x_pos > 368 && x_pos < 496);
...

Our rasterline is 504 pixels, so we start reading sprite data at the very end of the line. Obviously the sprite pixels will be shown in the next line.
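
To double check the numbers above, here is a small back-of-the-envelope model in Python. It is only a sketch of the arithmetic, not part of the FPGA design, and all the constant names are my own. It assumes the region starts at x_pos 368, as in the snippet above.

```python
# Rough model of the sprite fetch window. The names are illustrative only.
PIXELS_PER_CYCLE = 8       # one 1MHz cycle spans eight pixels
ACCESSES_PER_CYCLE = 2     # our memory port is clocked at 2MHz
ACCESSES_PER_SPRITE = 4    # one pointer byte plus three data bytes
NUM_SPRITES = 8
REGION_START = 368         # assumed first x_pos of the sprite data region
LINE_LENGTH = 504          # pixels per rasterline in this design

cycles_per_sprite = ACCESSES_PER_SPRITE // ACCESSES_PER_CYCLE
pixels_per_sprite = cycles_per_sprite * PIXELS_PER_CYCLE
region_length = NUM_SPRITES * pixels_per_sprite
region_end = REGION_START + region_length

print(cycles_per_sprite, pixels_per_sprite, region_length, region_end)
```

So eight sprites need 128 pixels, which takes us from 368 up to 496 and still fits within the 504 pixels of a rasterline.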

It is also more convenient to work with an offset of 368, rather than an absolute x_pos value:

...
wire [9:0] sprite_data_region_offset;
...
assign sprite_data_region_offset = {1'b0, x_pos} - 368;
...

Within this sprite data region, each sprite gets its data in a time period of 16 pixels. We can therefore extract the following information from sprite_data_region_offset:

  • bits 6 - 4: sprite number
  • bits 3 - 0: Time cycle within the data cycle of current sprite. This is useful for orchestrating the various reads that should happen for the sprite. 
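
The bit split described above can be sketched in Python as follows. This is just a software model to make the decoding concrete. The function name decode_sprite_slot is hypothetical, and it assumes an x_pos that lies inside the sprite data region.

```python
REGION_START = 368  # same offset as used for sprite_data_region_offset


def decode_sprite_slot(x_pos):
    """Return (sprite_number, phase) for an x_pos inside the sprite data region."""
    offset = x_pos - REGION_START
    sprite_number = (offset >> 4) & 0b111  # bits 6 - 4
    phase = offset & 0b1111                # bits 3 - 0
    return sprite_number, phase


print(decode_sprite_slot(368))  # first pixel of the slot for sprite 0
print(decode_sprite_slot(423))  # somewhere inside the slot for sprite 3
```

For instance, x_pos 423 gives an offset of 55, which is binary 011 0111: sprite number 3, phase 7.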

Next, let us focus on address generation. Here we should keep in mind that our VIC-II module reads data from a memory port that is clocked at 2MHz. In relation to a group of 8 pixels, this clock pulses at pixel number 3 and pixel number 7.

Let us change our address generation functionality as follows:

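...
reg [7:0] sprite_data_location;   // sprite pointer fetched for the current sprite
wire [1:0] sprite_byte_num;       // byte 0, 1 or 2 of the requested sprite line
...
// Bits 3:2 of the offset minus one (adding 2'b11 is -1 modulo 4)
assign sprite_byte_num = sprite_data_region_offset[3:2] + 2'b11;
...
// Latch the sprite pointer one pixel after the Block RAM returned it
always @(posedge clk_in)
  if (sprite_data_region && sprite_data_region_offset[3:0] == 4)
    sprite_data_location <= data_in[7:0];
...
always @*
  if (!sprite_data_region && (clk_counter == 6 || clk_counter == 7))
    addr = bit_data_pointer;
  else if (sprite_data_region && (sprite_data_region_offset[3:0] < 3))
    // sprite pointer: last eight bytes of the 1KB screen memory block
    addr = {mem_pointers[7:4], 7'h7f, sprite_data_region_offset[6:4]};
  else if (sprite_data_region)
    // sprite data: pointer selects the block, low bits select the byte
    addr = {sprite_data_location, (sprite_0_offset + sprite_byte_num)};
  else
    addr = {mem_pointers[7:4], screen_mem_pos};
...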

So, in each sprite section, we need to ensure that we assert the address of the applicable sprite pointer before the first 2MHz clock pulse triggers. When this clock pulse triggers, the Block RAM will return the value of the sprite pointer in question.

We store this sprite pointer in a register called sprite_data_location, one pixel period after the Block RAM returned this pointer.
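
To make the two address concatenations a bit more tangible, here is a small Python model of them. It is a sketch under assumptions: the function names are made up, and I assume the lower part of the sprite data address is 6 bits wide, so that the pointer effectively selects a 64 byte block.

```python
def sprite_pointer_address(mem_pointers, sprite_number):
    """Model of {mem_pointers[7:4], 7'h7f, sprite_number} as a 14-bit address."""
    screen_base_bits = (mem_pointers >> 4) & 0xF
    return (screen_base_bits << 10) | (0x7F << 3) | (sprite_number & 0x7)


def sprite_data_address(sprite_pointer, line_offset, byte_num):
    """Model of {sprite_data_location, offset + byte_num}, assuming 6 low bits."""
    return ((sprite_pointer & 0xFF) << 6) | ((line_offset + byte_num) & 0x3F)


# An upper nibble of 1 in mem_pointers puts screen memory at 0x0400.
print(sprite_pointer_address(0x14, 0))   # 2040
print(sprite_data_address(13, 6, 1))     # 839
```

With screen memory at 0x0400, the pointer for sprite 0 lands on address 2040, which is the location BASIC programs usually POKE to select sprite data. A pointer value of 13 then selects the block starting at 13 * 64 = 832.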

The sprite_data_location register will be used to generate addresses for the actual sprite data. In addition, we need the following pieces of information to generate these addresses:
  • sprite offset: The line number of the sprite we need data for, expressed as a linear address. This will be a multiple of three. For instance, should we need the sprite data for line 2, we will specify 6 as the offset.
  • sprite_byte_num: Which byte (0, 1 or 2) of the requested line to return.
We will be using bits 3 and 2 of sprite_data_region_offset for the sprite_byte_num. One should remember, however, that we only start reading sprite data from combination 01 and not from combination 00.

During combination 00 we are still reading sprite_data_location. We therefore need to subtract one from this bit combination to get the actual bit combination.

As a quick hack, I achieved this subtraction of one by just adding 2'b11, with the help of two's complement: in two-bit arithmetic, adding 3 gives the same result as subtracting 1.
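
If the two's complement trick feels a bit like magic, the following Python sketch models the two-bit addition. The function name simply mirrors the wire in the Verilog and the masking with 0b11 stands in for the two-bit width of that wire.

```python
def sprite_byte_num(offset_bits_3_2):
    """Two-bit model of sprite_data_region_offset[3:2] + 2'b11."""
    return (offset_bits_3_2 + 0b11) & 0b11


for bits in range(4):
    print(format(bits, "02b"), "->", sprite_byte_num(bits))
```

Combination 00 maps to 3, which we never use because we are still busy with the sprite pointer at that stage. Combinations 01, 10 and 11 map to byte numbers 0, 1 and 2.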

With this piece of code in place we will be receiving the sprite data for all the sprites during the applicable time frames. It is up to us to actually catch this data at the right time and manipulate it up to the point that the sprite gets rendered on the screen.

For all this functionality it makes sense to encapsulate it in a sprite_generator module, which we will cover in the next section.

The Sprite Generator module

Let us start our Sprite Generator module by providing input ports indicating the current Raster Position and Sprite position:

module sprite_generator(
  input clk_in,
  input [8:0] raster_y_pos,
  input [8:0] raster_x_pos,
  input [8:0] sprite_x_pos,
  input [7:0] sprite_y_pos
    );


First thing we need to calculate is the linear address for the sprite line we want data for:

...
  wire [8:0] next_raster;
  wire [5:0] request_line;
...
  assign next_raster = raster_y_pos + 1;
  assign request_line = next_raster - sprite_y_pos;
  assign request_line_offset = (request_line << 1) + request_line;
...


The calculation of the linear address involves multiplying the line number by three, which we achieve by shifting the line number left by one bit (a multiplication by two) and adding the line number to the result.
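
Here is the same calculation as a Python sketch. The masks model the 9-bit and 6-bit wire widths from the snippet above. The width of the final offset is not shown in the snippet, so the model leaves it unmasked, which is an assumption on my side.

```python
def request_line_offset(raster_y_pos, sprite_y_pos):
    """Model of the linear offset for the sprite line shown on the next rasterline."""
    next_raster = (raster_y_pos + 1) & 0x1FF              # 9-bit wire
    request_line = (next_raster - sprite_y_pos) & 0x3F    # 6-bit wire
    return (request_line << 1) + request_line             # times two, plus one more


print(request_line_offset(101, 100))  # sprite line 2 -> offset 6
print(request_line_offset(119, 100))  # sprite line 20 -> offset 60
```

A sprite has 21 lines, so the largest offset we should ever see is 60 for line 20.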

Next, we should add a 3 byte shift register to our sprite generator that can shift the data bytes in when they arrive, as well as shift the bits out when we are in the display region of the sprite:

module sprite_generator(
...
  input store_byte,
  input [7:0] data,
...
    );
...
  wire sprite_display_region;
  reg [23:0] sprite_data;
...
  assign sprite_display_region = (raster_y_pos >= sprite_y_pos && raster_y_pos < (sprite_y_pos + 21)) &&
                                 (raster_x_pos >= sprite_x_pos && raster_x_pos < (sprite_x_pos + 24));
...
  always @(posedge clk_in)
    if (store_byte)
      sprite_data <= {sprite_data[15:0], data[7:0]};
    else if (sprite_display_region)
      sprite_data <= {sprite_data[22:0], 1'b0};
...

So, when we are within the visible region of the sprite, we shift out the contents of sprite_data one pixel at a time. What we need to do next is to output a color for each bit we shift out.
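
The behaviour of this shift register can be modelled in a few lines of Python. This is only a software sketch with a made-up class name. The example bytes 0x00, 0x7F, 0x00 are arbitrary test values.

```python
class SpriteShiftRegister:
    """Software model of the 24-bit sprite_data register."""

    def __init__(self):
        self.sprite_data = 0

    def store_byte(self, data):
        # Same as {sprite_data[15:0], data[7:0]}
        self.sprite_data = ((self.sprite_data << 8) | (data & 0xFF)) & 0xFFFFFF

    def shift_out(self):
        # Bit 23 is the pixel on display, then everything moves up by one
        msb = (self.sprite_data >> 23) & 1
        self.sprite_data = (self.sprite_data << 1) & 0xFFFFFF
        return msb


reg = SpriteShiftRegister()
for byte in (0x00, 0x7F, 0x00):
    reg.store_byte(byte)

pixels = "".join(str(reg.shift_out()) for _ in range(24))
print(pixels)
```

After the three bytes are stored, the first byte that arrived sits in the top eight bits, so the pixels come out in the order in which the bytes were fetched.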

For now we will only output the color white if the bit is a 1:

module sprite_generator(
...
  output [3:0] output_pixel,
...
    );

...
assign output_pixel = sprite_data[23] ? 4'b1 : 0;
...

One thing we should keep in mind, is that bits with the value zero are in fact transparent. So, we need to have an additional output port indicating whether the current pixel should be transparent or not.

A transparent sprite pixel is just one condition under which the VIC-II shouldn't show the current sprite pixel. There are additional conditions in which our VIC-II shouldn't show the pixel of a sprite:
  • The sprite is disabled
  • We are not currently within the area of the sprite on the screen
Let us wrap all these conditions together and output the result to a single port:

module sprite_generator(
...
  input sprite_enabled,
  output show_pixel,
...
    );
...
  assign show_pixel = sprite_enabled && sprite_data[23] && sprite_display_region;
...
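
As a sanity check of the conditions that feed show_pixel, here is a Python sketch of the same logic. The function names follow the Verilog wires, but the model itself is mine and ignores the clocking.

```python
SPRITE_WIDTH = 24    # pixels per sprite line
SPRITE_HEIGHT = 21   # lines per sprite


def sprite_display_region(raster_x_pos, raster_y_pos, sprite_x_pos, sprite_y_pos):
    """True while the raster position is inside the sprite rectangle."""
    in_rows = sprite_y_pos <= raster_y_pos < sprite_y_pos + SPRITE_HEIGHT
    in_cols = sprite_x_pos <= raster_x_pos < sprite_x_pos + SPRITE_WIDTH
    return in_rows and in_cols


def show_pixel(sprite_enabled, msb, raster_x_pos, raster_y_pos, sprite_x_pos, sprite_y_pos):
    """Model of sprite_enabled && sprite_data[23] && sprite_display_region."""
    inside = sprite_display_region(raster_x_pos, raster_y_pos, sprite_x_pos, sprite_y_pos)
    return bool(sprite_enabled and msb and inside)


print(show_pixel(1, 1, 110, 105, 100, 100))  # enabled, bit set, inside
print(show_pixel(1, 0, 110, 105, 100, 100))  # transparent bit
print(show_pixel(1, 1, 124, 105, 100, 100))  # one pixel past the right edge
```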

Wiring everything up

With our sprite generator created, let us hook it up to our VIC-II module.

We need to end up with 8 sprite generators. That is a sprite_generator for each sprite. However, for now, to keep things simple, we will just be using a single one.

To start with, we are going to implement some more of the VIC-II registers in our VIC-II module:

reg [7:0] sprite_0_xpos;
reg [7:0] sprite_0_ypos;
reg [7:0] sprite_1_xpos;
reg [7:0] sprite_1_ypos;
reg [7:0] sprite_2_xpos;
reg [7:0] sprite_2_ypos;
reg [7:0] sprite_3_xpos;
reg [7:0] sprite_3_ypos;
reg [7:0] sprite_4_xpos;
reg [7:0] sprite_4_ypos;
reg [7:0] sprite_5_xpos;
reg [7:0] sprite_5_ypos;
reg [7:0] sprite_6_xpos;
reg [7:0] sprite_6_ypos;
reg [7:0] sprite_7_xpos;
reg [7:0] sprite_7_ypos;
reg [7:0] sprite_msb_x = 0;
reg [7:0] sprite_enabled;

always @(posedge clk_1_mhz)
     case (addr_in)
       6'h00: data_out_reg <= sprite_0_xpos;
       6'h01: data_out_reg <= sprite_0_ypos;
       6'h02: data_out_reg <= sprite_1_xpos;
       6'h03: data_out_reg <= sprite_1_ypos;
       6'h04: data_out_reg <= sprite_2_xpos;
       6'h05: data_out_reg <= sprite_2_ypos;
       6'h06: data_out_reg <= sprite_3_xpos;
       6'h07: data_out_reg <= sprite_3_ypos;
       6'h08: data_out_reg <= sprite_4_xpos;
       6'h09: data_out_reg <= sprite_4_ypos;
       6'h0a: data_out_reg <= sprite_5_xpos;
       6'h0b: data_out_reg <= sprite_5_ypos;
       6'h0c: data_out_reg <= sprite_6_xpos;
       6'h0d: data_out_reg <= sprite_6_ypos;
       6'h0e: data_out_reg <= sprite_7_xpos;
       6'h0f: data_out_reg <= sprite_7_ypos;
       6'h10: data_out_reg <= sprite_msb_x;
       6'h15: data_out_reg <= sprite_enabled;
       
       6'h20: data_out_reg <= {4'b0,border_color};
       6'h21: data_out_reg <= {4'b0,background_color};
       6'h22: data_out_reg <= {4'b0,extra_background_color_1};
       6'h23: data_out_reg <= {4'b0,extra_background_color_2};
       6'h11: data_out_reg <= {y_pos_real[8],screen_control_1[6:0]};
       6'h12: data_out_reg <= {y_pos_real[7:0]};
       6'h16: data_out_reg <= screen_control_2;
       6'h18: data_out_reg <= mem_pointers;
       6'h19: data_out_reg <= {7'h0,raster_int};
       6'h1a: data_out_reg <= int_enabled;
     endcase

always @(posedge clk_1_mhz)
begin
  if (we & addr_in == 6'h00)
    sprite_0_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h01)
    sprite_0_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h02)
     sprite_1_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h03)
     sprite_1_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h04)
     sprite_2_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h05)
     sprite_2_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h06)
     sprite_3_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h07)
     sprite_3_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h08)
     sprite_4_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h09)
     sprite_4_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0a)
     sprite_5_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0b)
     sprite_5_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0c)
     sprite_6_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0d)
     sprite_6_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0e)
     sprite_7_xpos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h0f)
     sprite_7_ypos <= reg_data_in[7:0];
  else if (we & addr_in == 6'h10)
     sprite_msb_x <= reg_data_in[7:0];
  else if (we & addr_in == 6'h15)
     sprite_enabled <= reg_data_in[7:0];

...
end




We now have enough information to link up most of the input ports of our sprite_generator:

sprite_generator sprite_0(
  .raster_y_pos(y_pos),
  .raster_x_pos(x_pos),
  .sprite_x_pos({sprite_msb_x[0],sprite_0_xpos}),
  .sprite_y_pos(sprite_0_ypos),
  .data(data_in[7:0]),
  .sprite_enabled(sprite_enabled[0])
    );
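
The concatenation used for sprite_x_pos deserves a quick illustration. The X position of a sprite is 9 bits wide: the lower eight bits come from the sprite's own X register and the ninth bit comes from the shared MSB register. A Python sketch, with a hypothetical function name, looks like this:

```python
def sprite_x_position(sprite_msb_x, sprite_xpos, sprite_number):
    """Model of {sprite_msb_x[n], sprite_n_xpos} as a 9-bit value."""
    msb = (sprite_msb_x >> sprite_number) & 1
    return (msb << 8) | (sprite_xpos & 0xFF)


# X position 300 does not fit in eight bits: 300 = 256 + 44
print(sprite_x_position(0b00000001, 44, 0))  # 300
print(sprite_x_position(0b00000001, 44, 1))  # 44, the MSB of sprite 1 is clear
```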

We still need to connect the input port store_byte. The following snippet of code will basically generate this signal for us:

...
reg store_sprite_pixel_byte;
...
always @*
  case (sprite_data_region_offset[3:0])
    7, 11, 15:  store_sprite_pixel_byte = sprite_data_region;
    default:  store_sprite_pixel_byte = 0;
  endcase
...

So, for any given sprite data phase, pixel periods 7, 11 and 15 are the periods just after the sprite data bytes were retrieved from Block RAM. At these times we would like to persist the data byte to our sprite_generator.

However, this code will flag the data bytes for all the sprites on a rasterline, so we cannot use it as is for the store_byte input port. So, when we assign to the store_byte input port, we add an extra check to make sure the data byte is meant for our sprite_generator:

sprite_generator sprite_0(
...
  .store_byte(store_sprite_pixel_byte && sprite_data_region_offset[6:4] == 0),
...
    );
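
To see which pixel periods end up asserting store_byte for a given sprite, we can model the gating in Python. This is a sketch with a made-up function name. The sprite_index parameter is there to show how the same check would look for the other sprites, which we haven't wired up yet.

```python
def store_byte_for(sprite_index, offset, in_region=True):
    """Model of store_sprite_pixel_byte gated by the sprite number in bits 6 - 4."""
    phase = offset & 0xF
    slot = (offset >> 4) & 0x7
    store_sprite_pixel_byte = in_region and phase in (7, 11, 15)
    return bool(store_sprite_pixel_byte and slot == sprite_index)


sprite_0_stores = [off for off in range(128) if store_byte_for(0, off)]
sprite_2_stores = [off for off in range(128) if store_byte_for(2, off)]
print(sprite_0_stores)
print(sprite_2_stores)
```

Each sprite gets exactly three store pulses per rasterline, one for each data byte.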


All our input ports are now connected.

Let us now see how we can use the output ports of our sprite_generator to render sprites with our VIC-II module:

wire show_pixel_sprite_0;
wire [3:0] out_pixel_sprite_0;


sprite_generator sprite_0(
...
  .show_pixel(show_pixel_sprite_0),
  .output_pixel(out_pixel_sprite_0),
...
    );
...
   assign color_for_bit = multicolor_data ? multi_color :    
            (pixel_shift_reg[7] == 1 ? char_buffer_out_delayed[11:8] : background_color);
   assign color_for_bit_with_sprite = show_pixel_sprite_0 ? out_pixel_sprite_0 : color_for_bit;

   assign final_color = (visible_vert & visible_horiz & screen_enabled) ? color_for_bit_with_sprite : border_color;
...

The actual mixing of the sprite images and the main graphics happens in the wire color_for_bit_with_sprite. If our sprite_generator asserts the show_pixel output port, the sprite pixel will be shown. Otherwise we show a pixel of the main graphics.
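
The priority between sprite, main graphics and border can be summarised with a small Python model. The function and parameter names are my own, and the visible flag stands in for visible_vert & visible_horiz & screen_enabled.

```python
def final_color(show_pixel_sprite_0, out_pixel_sprite_0, color_for_bit, visible, border_color):
    """Model of the color_for_bit_with_sprite and final_color assignments."""
    color_for_bit_with_sprite = out_pixel_sprite_0 if show_pixel_sprite_0 else color_for_bit
    return color_for_bit_with_sprite if visible else border_color


WHITE, BLUE, LIGHT_BLUE = 1, 6, 14  # C64 color codes

print(final_color(True, WHITE, BLUE, True, LIGHT_BLUE))    # sprite pixel wins
print(final_color(False, WHITE, BLUE, True, LIGHT_BLUE))   # main graphics shows
print(final_color(True, WHITE, BLUE, False, LIGHT_BLUE))   # border hides the sprite
```

Note that with this ordering the border always wins, so a sprite that moves into the border area gets clipped.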

This concludes the sprite implementation for this post.

In the following sections we will test our implementation.

Creating a Testbed for simulation

As you can see from this post, there is quite a bit of code that needs to be written just for a single sprite to be displayed.

When undertaking a task like the one in this post, it is always handy to have a simulation testbed at hand, like the one we created in a previous post where we implemented Multicolor Bitmap mode.

The RAM image of such a Testbed contains data for a test image that enables us to test snippets of new code within seconds.

We start by getting hold of any simple program in C64 BASIC that will display a sprite for us. For this purpose I will be using a program from the C64 User's Guide, which is discussed on pages 68 - 71.

This program will show the following image as a sprite:

The program listing is as follows:


In this program we need to make two small modifications, which involve removing the clear screen statement and using sprite 0 instead of sprite 2.

We remove the clear screen statement because we actually want to see that the sprite renders correctly against a background.

We will use the VICE C64 emulator to run this code and to create a RAM image for our testbed. For this I will be using the same process as in the previous post where we developed multicolor bitmap mode, so I will not cover the process here.

Here is a quick screenshot of the VICE C64 emulator executing the test BASIC program:


We have the balloon with the BASIC program as a background. We need to create a RAM image from a Vice Snapshot for our testbed.

Our testbed should then render more or less the same image.

Test Results

Here is an image from our testbed:


The colors are a bit different from the standard C64 startup colors. This is because I reused the testbed from a previous post where we developed the multicolor bitmap mode.

Other than that, this image looks more or less the same as the VICE screenshot in the previous section.

However, you will notice that the sprite has a bit of an offset compared to the VICE emulator screenshot. Taking the fourth data element on line 220 (i.e. the value 3) as reference, you will see that on the VICE screenshot the balloon is situated southeast of the '3', whereas in our testbed image the balloon appears west of the '3'.

This kind of offset is probably to be expected, since our C64 FPGA implementation is by no means 100% cycle accurate compared to a real C64.

For now we will just do some offset hacks to get the sprite displayed in the correct position:

sprite_generator sprite_0(
...
  .raster_y_pos(y_pos - 5),
  .raster_x_pos(x_pos - 16),
...
    );
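
One thing to keep in mind with this hack is that the subtraction wraps around, since the ports on the sprite generator are 9 bits wide. The Python sketch below models that wrap-around. It assumes that only the lower 9 bits of the result reach the port, and the function name is hypothetical.

```python
def apply_offset(position, offset, width=9):
    """Model of (position - offset) truncated to a port of the given width."""
    return (position - offset) & ((1 << width) - 1)


print(apply_offset(100, 5))   # 95, the usual case
print(apply_offset(0, 5))     # wraps around to 507
print(apply_offset(3, 16))    # wraps around to 499
```

For the first few lines and pixels the sprite generator therefore sees very large raster positions. For the sprite positions used in this test that should be harmless, but it is something to revisit when the offsets get fixed properly.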


With these changes, let us see how this program runs on the physical FPGA:


This correlates more or less to the VICE rendering of this BASIC program.

In Summary

In this post we have started to implement sprite functionality within our C64 FPGA.

In the next couple of posts we will continue to implement sprite functionality.

Till next time!
