C64 on an FPGA: Crossing from the AXI to the VGA Pixel Clock Domain

Foreword

In the previous post we played around with VGA output from the Zybo board.

In the end we managed to get a screen filled with A's.

This brings us one step closer to getting the pixel output from our VIC-II module displayed on a VGA-enabled screen.

In an earlier post we managed to write the pixel output from our VIC-II module to SDRAM. So, the next logical step would be to continuously read the pixel data back from SDRAM and display it on the VGA-enabled screen.

In reading back the pixel data from SDRAM and displaying it on a VGA screen, we are again faced with a cross clock domain problem: the AXI port used for retrieving data from SDRAM operates at 100MHz, whereas our VGA pixel clock runs at around 85MHz.

In the post where we managed to write pixel data from the VIC-II to SDRAM, we were also faced with a cross clock domain issue. That problem, however, was easier to solve since we could fit multiple 100MHz clock cycles into a single VIC-II clock pulse. These multiple clock cycles made it easy for us to ensure that we were more or less in the centre of a VIC-II clock pulse when sampling data for the AXI domain.

The case is not so simple in our SDRAM->VGA scenario, where we have 100MHz versus 85MHz. In this scenario we can fit only one "and a bit" AXI clock cycles into one VGA pixel clock period. There is thus no easy way for us to tell when we are close to an edge of the VGA pixel clock.
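
To put rough numbers on the "one and a bit", here is the arithmetic, taking the VGA pixel clock as exactly 85MHz purely for the sake of the calculation:

```latex
T_{\mathrm{AXI}} = \frac{1}{100\,\mathrm{MHz}} = 10\,\mathrm{ns}, \qquad
T_{\mathrm{VGA}} = \frac{1}{85\,\mathrm{MHz}} \approx 11.76\,\mathrm{ns}, \qquad
\frac{T_{\mathrm{VGA}}}{T_{\mathrm{AXI}}} \approx 1.18
```

So one pixel clock period spans only about 1.18 AXI clock periods, and the position of the AXI clock edge relative to the pixel clock edge shifts by roughly 1.76ns with every pixel.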

The target of this post is therefore to see if we can find a way to solve this particular cross clock domain problem.

We will also test the solution on the physical FPGA to see if we really get meaningful data back after it has crossed from the AXI clock domain to the VGA clock domain.

Some Research

I did some searching on the Internet to see how people have solved cross clock domain issues similar to the one we currently have with our SDRAM->VGA path.

Most resources on the web suggest that you should make use of an asynchronous FIFO buffer. With an asynchronous FIFO buffer you write data at one clock frequency and read it back at a different clock frequency.

This sounds like exactly what we need! But how to implement an asynchronous FIFO is another story.

It all boils down to the fact that for any FIFO, whether asynchronous or not, we need two pointers: one for keeping track of the current top (i.e. the next place in memory where we will write data), and another for keeping track of the current bottom (i.e. the next place in memory from which we will read data).
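
To make the two pointers a bit more concrete, here is a minimal sketch of an ordinary single-clock FIFO. It is only an illustration and not the module we end up using; the module name, the port names and the count register are all my own assumptions.

```verilog
// Minimal single-clock FIFO sketch (illustration only).
// All names here (sync_fifo_sketch, top, bottom, count, ...) are hypothetical.
module sync_fifo_sketch
  #(parameter DATA_WIDTH = 8,
    parameter ADDRESS_WIDTH = 4)
   (input  wire                  clk,
    input  wire                  write_en,
    input  wire [DATA_WIDTH-1:0] data_in,
    input  wire                  read_en,
    output reg  [DATA_WIDTH-1:0] data_out,
    output wire                  full,
    output wire                  empty);

  localparam DEPTH = 1 << ADDRESS_WIDTH;

  reg [DATA_WIDTH-1:0]    mem [0:DEPTH-1];
  reg [ADDRESS_WIDTH-1:0] top = 0;     // next location to write
  reg [ADDRESS_WIDTH-1:0] bottom = 0;  // next location to read
  reg [ADDRESS_WIDTH:0]   count = 0;   // number of elements currently stored

  assign full  = (count == DEPTH);
  assign empty = (count == 0);

  // Ignore writes to a full FIFO and reads from an empty one.
  wire do_write = write_en && !full;
  wire do_read  = read_en && !empty;

  always @(posedge clk)
  begin
    if (do_write)
    begin
      mem[top] <= data_in;
      top <= top + 1'b1;        // wraps around to 0 after the last location
    end
    if (do_read)
    begin
      data_out <= mem[bottom];
      bottom <= bottom + 1'b1;  // wraps around to 0 after the last location
    end
    // A simultaneous read and write leaves the fill level unchanged.
    case ({do_write, do_read})
      2'b10:   count <= count + 1'b1;
      2'b01:   count <= count - 1'b1;
      default: count <= count;
    endcase
  end
endmodule
```

Because everything here runs on one clock, a simple element counter is enough to derive the full and empty flags.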

The trick comes in with the fact that the top pointer and the bottom pointer get updated in different clock domains, but occasionally both clock domains need access to both pointers.

In our case, for instance, the AXI clock domain will update the top pointer every time it writes an element to the FIFO. Similarly, the VGA pixel clock domain will update the bottom pointer every time data is read from the FIFO.

Also, both the AXI and VGA pixel clock domains need access to both pointers. The AXI clock domain needs to read the bottom pointer so that it doesn't write past this position, overwriting data that has not been read yet. Similarly, the VGA clock domain needs to be able to read the top pointer to avoid reading past valid data.

So, we basically still have to solve a cross clock domain issue for the top and bottom pointers.

The first possible solution for the above issue that comes to mind is to use a two flip-flop synchronizer as described in the following web page: https://www.edn.com/electronics-blogs/day-in-the-life-of-a-chip-designer/4435339/Synchronizer-techniques-for-multi-clock-domain-SoCs
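
In Verilog, a two flip-flop synchronizer for a single-bit signal could look roughly as follows. This is a minimal sketch; the module name sync_2ff and its port names are my own and are not taken from the linked article.

```verilog
// Two flip-flop synchronizer sketch for a single-bit signal.
// Module and signal names are hypothetical.
module sync_2ff
   (input  wire clk_dst,   // clock of the receiving domain
    input  wire async_in,  // signal generated in another clock domain
    output wire sync_out); // synchronized copy for use in the clk_dst domain

  reg stage_1 = 1'b0; // may go metastable if async_in changes near a clock edge
  reg stage_2 = 1'b0; // samples stage_1 after it had a full clock period to settle

  always @(posedge clk_dst)
  begin
    stage_1 <= async_in;
    stage_2 <= stage_1;
  end

  assign sync_out = stage_2;
endmodule
```

Only stage_2 should be used by the rest of the logic in the receiving domain. The price we pay is that the signal arrives a couple of clock cycles late.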

This solution, however, only works well for single-bit signals. For multi-bit signals, such as the top and bottom pointers, you run the risk that some of the bits settle before the others, so that you end up with half-baked values.

The solution many web resources propose for passing multi-bit counter values across clock domains is to make them count in Gray code. When counting in Gray code, only a single bit changes at a time. A four-bit Gray counter, for example, steps through the following sequence: 0000, 0001, 0011, 0010, 0110, 0111, 0101, 0100, 1100, 1101, 1111, 1110, 1010, 1011, 1001, 1000.
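
Converting an ordinary binary count to Gray code takes just a shift and an XOR. The following is a minimal sketch of a Gray counter built that way. It is a generic textbook approach and not necessarily how any particular FIFO implementation does it; the module and signal names are hypothetical.

```verilog
// Gray code counter sketch. Names are hypothetical.
module gray_counter_sketch
  #(parameter WIDTH = 4)
   (input  wire             clk,
    input  wire             enable,
    output wire [WIDTH-1:0] gray);

  reg  [WIDTH-1:0] bin = 0;       // ordinary binary count
  reg  [WIDTH-1:0] gray_reg = 0;  // Gray coded version of bin
  wire [WIDTH-1:0] bin_next = bin + 1'b1;

  always @(posedge clk)
    if (enable)
    begin
      bin      <= bin_next;
      // Binary to Gray conversion: XOR the value with itself shifted right by one.
      gray_reg <= bin_next ^ (bin_next >> 1);
    end

  // The Gray value comes straight out of a register, so it cannot glitch
  // between two counts.
  assign gray = gray_reg;
endmodule
```

It is the registered Gray value that would be passed through synchronizers to the other clock domain. Since at most one bit changes per count, the receiving side sees either the old or the new value, never a mixture of the two.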

The concept sounds simple, but it is still quite a mission to implement an asynchronous FIFO buffer from scratch. So, looking around on the Internet, I found an existing implementation of an Asynchronous FIFO:

http://www.asic-world.com/examples/verilog/asyn_fifo.html

This implementation was written by Alex Claros F, who based it on the article Asynchronous FIFO in Virtex-II FPGAs, written by Peter Alfke.

I was about to post a copy of the above-mentioned implementation on my blog, but then it occurred to me that the author of this module didn't give explicit permission, within the comments of the module, for reproducing his work on another website.

However, this shouldn't stop me from thanking Alex Claros F and Peter Alfke for publicly sharing their work.

The Approach

Having found an implementation of an Asynchronous FIFO buffer, I am curious to know whether this buffer will really work in our context of sending data from the AXI domain to our VGA pixel clock domain.

To test a cross clock domain implementation, we will basically take the VGA module developed in the previous post and split it into two clock domains.

Into the 100MHz clock domain we will move all the functionality responsible for pixel data generation. This pixel data we will write to the Asynchronous FIFO buffer at a rate of 100MHz.

We will link up the receiving end of the Asynchronous FIFO buffer to the VGA pixel clock domain, reading one pixel element at a time and outputting it to the VGA connector.

One final piece of information worth mentioning is that on the AXI clock domain side we will try and keep the FIFO buffer full at all times.

Overview of the Asynchronous FIFO module

Let us cover some finer details of Alex Claros's Asynchronous FIFO module.

When instantiating this module, there are two crucial parameters: DATA_WIDTH and ADDRESS_WIDTH.

The default value for DATA_WIDTH is 8 bits. In our case we will need to bump this value to 16 bits because of our pixel bit size.

The default value for ADDRESS_WIDTH is 4 bits. This means a FIFO buffer size of 16 elements which will be sufficient for our case.

Let us now have a look at the ports of this module.

Firstly we have a port called Data_out for reading data and Data_in for writing data.

The reading and writing is clocked by two separate clocks RClk and WClk.

Also, we have two ports, ReadEn_in and WriteEn_in, specifying whether we want to read or have something to write on a given clock cycle.

The Clear_in port resets the FIFO to an empty state. We will typically use this functionality when we have just finished drawing a frame on the screen, to ensure we stay in sync.

The Full_out port indicates that the FIFO buffer is full and we should abstain from writing any more data while this pin is high. We will use this port to ensure the buffer is kept full at all times.

Finally, the Empty_out port indicates that the buffer is empty and that reads should not be done. In our implementation we will not be using this port, since we always try to keep the buffer full.

Implementing a State Machine

Our whole buffering mechanism will be driven by the Vertical Sync signal. When we reach a Vertical Sync pulse, we will clear the FIFO with the Clear_in port and start populating the FIFO again with data from the beginning of the frame.

While drawing the next frame we will try to keep the buffer full, until we encounter another Vertical Sync pulse.

To aid in this process flow we will need to implement a state machine.

We implement this state machine within our existing VGA module as follows:

...
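parameter WAIT_START_VSYNC = 2'd0; // idle until a Vertical Sync pulse starts
parameter RESET_CYCLE = 2'd1;      // clear the FIFO and the position counters
parameter GET_SET = 2'd2;          // hold back the first write for one cycle
parameter WAIT_END_VSYNC = 2'd3;   // wait for the Vertical Sync pulse to end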
...
reg [1:0] state = 2'b0;
...
always @(posedge clk_axi)
  case (state)
    WAIT_START_VSYNC: state <= vert_sync ? RESET_CYCLE : WAIT_START_VSYNC;  
    RESET_CYCLE: state <= GET_SET;
    GET_SET: state <= WAIT_END_VSYNC; 
    WAIT_END_VSYNC: state <= vert_sync ? WAIT_END_VSYNC : WAIT_START_VSYNC;    
  endcase
...

The majority of the time the state machine waits for the Vertical Sync signal, after which it resets all state for the beginning of a new frame.

Take note of the vert_sync signal in the state machine above: it comes from the VGA pixel clock domain. Yes, another cross clock domain issue we should take care of!

Luckily this is a single-bit signal, for which we can use the double flip-flop synchronizer we mentioned earlier.

When you do some reading on double flip-flop synchronizers, you will quite often see it mentioned that they resolve the vast majority of metastability events caused by setup and hold violations, but not all of them. To make the remaining failures even less likely, you can add one or more additional flip-flops to the chain. I have gone a bit overboard and used a chain of five flip-flops:

...
reg vert_sync_delayed_1;
reg vert_sync_delayed_2;
reg vert_sync_delayed_3;
reg vert_sync_delayed_4;
reg vert_sync_delayed_5;
...
// Pass vert_sync through a chain of five flip-flops clocked in the AXI domain.
always @(posedge clk_axi)
begin
  vert_sync_delayed_1 <= vert_sync;
  vert_sync_delayed_2 <= vert_sync_delayed_1;
  vert_sync_delayed_3 <= vert_sync_delayed_2;
  vert_sync_delayed_4 <= vert_sync_delayed_3;
  vert_sync_delayed_5 <= vert_sync_delayed_4;
end
...
always @(posedge clk_axi)
  case (state)
    WAIT_START_VSYNC: state <= vert_sync_delayed_5 ? RESET_CYCLE : WAIT_START_VSYNC;  
    RESET_CYCLE: state <= GET_SET;
    GET_SET: state <= WAIT_END_VSYNC; 
    WAIT_END_VSYNC: state <= vert_sync_delayed_5 ? WAIT_END_VSYNC : WAIT_START_VSYNC;    
  endcase
...
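
If you would like to watch the state machine step through its states in a simulator, the snippets above can be wrapped into a small self-contained module with a testbench. This is only a sketch: the module names, the shift register form of the synchronizer chain and the length of the test pulse are my own assumptions, and vert_sync is taken to be active high, as the state machine implies.

```verilog
`timescale 1ns / 1ps
// Simulation-only sketch; module and testbench names are hypothetical.
module vsync_fsm_sketch
   (input  wire       clk_axi,
    input  wire       vert_sync,   // generated in the VGA pixel clock domain
    output reg  [1:0] state);

  parameter WAIT_START_VSYNC = 2'd0;
  parameter RESET_CYCLE      = 2'd1;
  parameter GET_SET          = 2'd2;
  parameter WAIT_END_VSYNC   = 2'd3;

  // Five-stage synchronizer chain, written as a shift register.
  reg [4:0] vert_sync_delayed = 5'b0;
  wire      vert_sync_safe = vert_sync_delayed[4];

  initial state = WAIT_START_VSYNC;

  always @(posedge clk_axi)
    vert_sync_delayed <= {vert_sync_delayed[3:0], vert_sync};

  always @(posedge clk_axi)
    case (state)
      WAIT_START_VSYNC: state <= vert_sync_safe ? RESET_CYCLE : WAIT_START_VSYNC;
      RESET_CYCLE:      state <= GET_SET;
      GET_SET:          state <= WAIT_END_VSYNC;
      WAIT_END_VSYNC:   state <= vert_sync_safe ? WAIT_END_VSYNC : WAIT_START_VSYNC;
    endcase
endmodule

module vsync_fsm_sketch_tb;
  reg clk_axi = 1'b0;
  reg clk_vga = 1'b0;
  reg vert_sync = 1'b0;
  wire [1:0] state;

  vsync_fsm_sketch dut (.clk_axi(clk_axi), .vert_sync(vert_sync), .state(state));

  always #5     clk_axi = ~clk_axi;   // 100MHz
  always #5.882 clk_vga = ~clk_vga;   // roughly 85MHz

  initial
  begin
    $monitor("t=%0t vert_sync=%b state=%0d", $time, vert_sync, state);
    repeat (4)  @(posedge clk_vga);
    vert_sync <= 1'b1;                // start of a (shortened) Vertical Sync pulse
    repeat (12) @(posedge clk_vga);
    vert_sync <= 1'b0;                // end of the pulse
    repeat (12) @(posedge clk_vga);
    $finish;
  end
endmodule
```

In the printed trace you should see the state change only a few AXI clock cycles after vert_sync itself changes, which is the delay introduced by the synchronizer chain.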

Generating pixels in the AXI domain

As mentioned earlier we will move the generation of pixel data from the VGA pixel Clock domain to the AXI clock domain.

To implement this change we need horizontal and vertical position counters that are also clocked in the AXI domain:

...
reg [10:0] horiz_pos_buffer = 0;
reg [10:0] vert_pos_buffer = 0;
...
always @(posedge clk_axi)
  if (state == RESET_CYCLE)
  begin
    // Start of a new frame: go back to the top-left corner.
    horiz_pos_buffer <= 0;
    vert_pos_buffer <= 0;
  end
  else if (buffer_full)
  begin
    // FIFO is full: hold the position until there is room again.
  end
  else if (horiz_pos_buffer < 1359)
    horiz_pos_buffer <= horiz_pos_buffer + 1;
  else
  begin
    // End of a line of 1360 pixels: wrap around and move to the next of the 768 lines.
    horiz_pos_buffer <= 0;
    if (vert_pos_buffer < 767)
      vert_pos_buffer <= vert_pos_buffer + 1;
    else
      vert_pos_buffer <= 0;
  end
...

You will notice that we don't increment the counters when the buffer is full.

Now, let us turn our attention to the generation of pixel data:

...
assign pixel_in_char = horiz_pos_buffer[2:0]; 
...
always @(posedge clk_axi)
  if (buffer_full)
  begin
    // FIFO is full: keep the shift register as it is.
  end
  else if (pixel_in_char == 0)
  begin
    // First pixel of a character: load the next row of eight pixels.
    // Bit 3 of the horizontal position alternates between the A and the B.
    case ({horiz_pos_buffer[3],vert_pos_buffer[2:0]})
      4'h0 : pixel_shift_reg <= 8'h18; // rows of the A
      4'h1 : pixel_shift_reg <= 8'h3C;
      4'h2 : pixel_shift_reg <= 8'h66;
      4'h3 : pixel_shift_reg <= 8'h7E;
      4'h4 : pixel_shift_reg <= 8'h66;
      4'h5 : pixel_shift_reg <= 8'h66;
      4'h6 : pixel_shift_reg <= 8'h66;
      4'h7 : pixel_shift_reg <= 8'h00;
      4'h8 : pixel_shift_reg <= 8'h7c; // rows of the B
      4'h9 : pixel_shift_reg <= 8'h66;
      4'ha : pixel_shift_reg <= 8'h66;
      4'hb : pixel_shift_reg <= 8'h7c;
      4'hc : pixel_shift_reg <= 8'h66;
      4'hd : pixel_shift_reg <= 8'h66;
      4'he : pixel_shift_reg <= 8'h7c;
      4'hf : pixel_shift_reg <= 8'h00;
    endcase
  end
  else
    // Remaining pixels of the character: shift out the next bit.
    pixel_shift_reg <= {pixel_shift_reg[6:0], 1'b0};
...

We have used the existing functionality of pixel_shift_reg and extended it a bit. Obviously we are now using the position counters within the AXI domain (i.e. the counter variables with the suffix _buffer).

You will also notice that we don't perform any operation on pixel_shift_reg if the buffer is full.

To make things a bit more interesting, I will not be filling the screen with only A's this time, but with AB's.
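
The snippets in this post don't show how the 16-bit out_pixel value that gets written to the FIFO is derived from pixel_shift_reg. One possible way of doing it is sketched below, purely as an assumption: the module name and the foreground and background colours are made up for illustration, with the colours laid out as 5 bits red, 6 bits green and 5 bits blue.

```verilog
// Hypothetical sketch: expanding the 1-bit character pixel to a 16-bit colour.
// The module name and colour values are assumptions, not taken from the design.
module pixel_expand_sketch
   (input  wire [7:0]  pixel_shift_reg,
    output wire [15:0] out_pixel);

  localparam FOREGROUND = 16'hFFFF; // white: red = 31, green = 63, blue = 31
  localparam BACKGROUND = 16'h0000; // black

  // The register shifts to the left, so its top bit is the pixel currently due.
  assign out_pixel = pixel_shift_reg[7] ? FOREGROUND : BACKGROUND;
endmodule
```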

Up to this point we haven't really had a look at the statement for instantiating the Asynchronous FIFO buffer, so let us quickly see what it looks like at the moment:

aFifo
  #(.DATA_WIDTH(16))
  my_fifo
     //Reading port
    (.Data_out(), 
     .Empty_out(),
     .ReadEn_in(),
     .RClk(clk),        
     //Writing port.  
     .Data_in(out_pixel),  
     .Full_out(buffer_full),
     .WriteEn_in(state != GET_SET),
     .WClk(clk_axi),
  
     .Clear_in(state == RESET_CYCLE));

As you can see, we are naming the instance my_fifo and overriding DATA_WIDTH to 16 bits, as explained previously.

Also, we are clearing the buffer when we are in the state RESET_CYCLE.

You will also notice that we have enabled writing to the buffer in all cases except when we are in the state GET_SET. The reason for this is that in the clock cycle directly after RESET_CYCLE the shift register isn't initialised yet. If we had wired WriteEn_in to a constant '1', we would have written data to the buffer during this clock cycle, which would have been an extra, erroneous pixel.

It is for this reason that I have introduced an extra state after RESET_CYCLE, holding back the first write so that pixel_shift_reg can initialise properly. I have called this state GET_SET after the analogy of an athletics event, where the athletes go through the following stages: On your marks, GET SET, GO.

Reading out pixels for display

We are now ready to implement the functionality that falls within the VGA Pixel Clock domain.

We start off by making the necessary changes to our FIFO instance:

aFifo
  #(.DATA_WIDTH(16))
  my_fifo
     //Reading port
    (.Data_out(out_pixel_buffer), 
     .Empty_out(),
     .ReadEn_in((vert_pos < VERT_RES) & (horiz_pos < HORIZ_RES)),
     .RClk(clk),        
     //Writing port.  
     .Data_in(out_pixel),  
     .Full_out(buffer_full),
     .WriteEn_in(state != GET_SET),
     .WClk(clk_axi),
  
     .Clear_in(state == RESET_CYCLE));

First of all, we only enable a read when we are currently within the visible portion of the screen. out_pixel_buffer is the pixel data we need to display.

We wire this port to the rest of our VGA Pixel Clock domain as follows:

...
assign red = out_pixel_buffer_final[15:11];
assign green = out_pixel_buffer_final[10:5];
assign blue = out_pixel_buffer_final[4:0];
...
assign out_pixel_buffer_final = ((vert_pos < VERT_RES) & (horiz_pos < HORIZ_RES)) ?
                                out_pixel_buffer : 0;
...

Here again we only output pixel data from the buffer if we are within the visible portion of the screen.

The End Result

Let us have a look at the end result. I have again taken a close-up of the screen:

All the characters look normal to me, and I didn't spot any odd-one-out pixels caused by potential setup and hold violations.

This is indeed a very crude check, but at least I think the Asynchronous FIFO-buffer is doing its job.

In Summary

In this post we managed to successfully split the VGA module developed in the previous post into two clock domains with the help of an Asynchronous FIFO buffer.

In the next post we will start to implement functionality for reading from SDRAM to our FPGA.

Till next time!

Wednesday, 25 April 2018