C64 on an FPGA: January 2018

Monday, 22 January 2018

Designing the VIC-II core

Foreword

In the previous post we managed to write to SDRAM on the Zybo board from the FPGA.

In this post we will start to develop the VIC-II core and verify the design with a Verilog simulation.

While developing the VIC-II core in this post I refereed quite often to Christian Bauer's write-up on the VIC-II which can be accessed via the following link: http://www.zimmers.net/cbmpics/cbm/c64/vic-ii.txt

Christian Bauer did an excellent job in summarising various people's observations on the VIC-II into a single document. If planning to write a C64 emulator or other kind of implementation this doc is a most definite read.

VIC-II Memory access and the FPGA

Within a C64 system, the 6510 CPU and VIC-II can access the main memory at 1MHz without having to wait for the other.

One would think that for this dual access to main memory requires a 2 MHz clock to facilitate the required bandwidth, but the amazing thing is that only a 1MHz clock is required!

On the C64 this memory access is achieved by only allowing the 6510 memory access when the clock is high and only allowing the VIC-II access when the clock is low.

Of course the exception to the above is during bad lines in which the 6510 cannot access memory at all because the the VIC-II needs extra memory cycles to get the character codes from screen memory.

At this point the question arises on how to implement this kind of memory access within an FPGA.

For starters, Block RAM can provide data on a rising or a falling edge, but not both.

Fortunately most Block RAM types on FPGA's can provide dual read/write ports. This means that you can have two components that can access the same piece of Block RAM simultaneously, which is exactly what we want.

The situation gets a bit more complicated during bad lines where the VIC-II needs extra clock cycles to access the character codes from screen memory. So in effect our block RAM should be clocking at 2MHz during bad lines.

However, to simplify our design we should always clock our Block RAM at 2MHz, for both the 6510 Block RAM port and the VIC-II block RAM port.

Having both ports clocking at 2Mhz have the implications that the 6510 should also clock at 2MHz. To get the 6510 to clock at 1 MHz we can just mask out every second pulse of the clock signal we send to the 6510. The resulting clock signal will look as follow:

The top signal is the 2MHz and the bottom one is the 1MHz achieved by masking out every second pulse.

Ok, I must admit our 1MHZ doesn't have a 50% duty cycle as on a genuine C64, but in the end we will achieve more or less the same result 😊

Starting with the VIC-II design

Time for us to start designing a VIC-II core.

So, where do we start? Christian Bauer gives us a nice starting point in his document:

As you can see, the "Raster counter X/Y" plays a central role. This is no surprise as the complete screen display and all bus accesses are synchronized by it.

We can start off by implementing the X/Y Raster counter:

module vicii(
  input reset,
  input clk,
  );

  reg [8:0] x_pos;
  reg [8:0] y_pos;

  always @(posedge clk)
  if (reset)
  begin
    x_pos <= 0;
    y_pos <= 0;
  end
  else if (x_pos < 503)
    x_pos <= x_pos + 1;
  else
  begin
    x_pos <= 0;
    y_pos <= (y_pos < 311) ? y_pos + 1 : 0; 
  end

endmodule

In this code we are assuming a PAL implementation of the VIC-II chip, which have 312 lines and 503 pixel periods per line.

It should be noted that the input clock is the pixel clock, clocking at more or less 8MHz. This is the clock our C64 system should synchronise to. So from this signal we need to generate both a 1MHz signal and a 2Mhz signal.

To aid us in the signal generation we need to isolate the least significant 3 bits of our X counter:

...
  wire [2:0] bit_cycle;
...
  assign bit_cycle = x_pos[2:0];
...

This in effect gives us a counter that counts from 0 to 7. Apart from generating a 2MHz signal, this counter will have other uses as well, which we will see later on.

We can get our 2MHz signal by using bit 1 of bit_cycle:

module vicii(
  input reset,
  input clk,
  output clk_out,
  );
...
  wire clk_2_mhz;
...
  assign clk_2_mhz = bit_cycle[1];
  assign clk_out = clk_2_mhz;
...

The 1MHz signal we generate as follows:

module vicii(
  input reset,
  input clk,
  output clk_out,
  output clk_out_1_mhz
  );
...
  assign clk_out_1_mhz = bit_cycle > 4 ? clk_out : 0;
...

To understand this assignment we need to remember when our counter counts from 0 to 7, our 2MHz signal will produce two pulses. For our 1MHz signal we need to suppress the second pulse.

Reading Memory

Time to start to think how our VIC-II core will interface with memory.

As we will first only implement text mode in our core, we only need to worry about two types of memory accesses: Reading Screen memory and reading Character ROM.

In the coming subsection we will implement Screen memory read accesses and in the section thereafter we will implement accesses to Character ROM read accesses.

Screen Memory Reads

As mentioned earlier we don't read the Screen Memory on every screen line, but rather on every bad line. A bad line corresponds to every 8^th line in the visible character region.

Because we are reading the screen memory only every 8^th line, we need to have some kind of internal buffering. Here is the code for implementing the internal buffering:

module vicii(
  output reg [13:0] addr,
  input [11:0] data,
  input reset,
  input clk,
  output clk_out,
  );
...
  wire [5:0] line_cycle_num;
  wire visible_vertical;
  wire WE;
  reg [2:0] char_line_num; 
  reg [11:0] char_buffer [39:0];
  reg [11:0] char_buffer_out;
  reg [5:0] char_buf_pos;
...
  assign line_cycle_num = x_pos[8:3];
...
  assign visible_vertical = (y_pos > 55) & (y_pos < 255) ? 1 : 0;
...
  assign WE = (line_cycle_num >= 4) & (line_cycle_num <= 43) & visible_vertical & (bit_cycle == 2)
          & (char_line_num == 0);
...

  always @(posedge clk)
  if (WE)
  begin
    char_buffer[char_buf_pos] <= data;
    char_buffer_out <= data;
  end else
    char_buffer_out <= char_buffer[char_buf_pos];
...
  always @(posedge clk)
    if (!visible_vertical)
      char_line_num <= 0;
    else if (x_pos == 384)
      char_line_num <= char_line_num + 1;
...
   always @(posedge clk)
     if (!visible_vertical)
       char_buf_pos <= 0;
     else if (bit_cycle == 0 & visible_horiz)
     begin
       if (char_buf_pos < 39)
         char_buf_pos <= char_buf_pos + 1;
       else
         char_buf_pos <= 0;
     end
...

As you can see we have added two extra ports to our core, addr and data. These ports is linked to external block memory. addr is the requested address and data will be delivered to the data port.

We will get in a moment on we determine the address.

You will notice that data port is 12 bits wide as well as each element of char_buffer. The databus and internal buffer on a real VIC-II is also 12-bits as explained in Christian Bauer's document.

The reason for the 12-bit width is that when the VIC-II reads from screen memory, data is read simultaneously from color memory from the same character position. The 8-bit screen code and 4-bit color code is then send as one 12-bit word over the databus. This combination just avoids an extra memory trip.

I have introduced a couple of reg's/wire's to make life simpler. The one is line_cycle_num which is bits 3-8 of the x_pos. This gives the cycle number within a line which can be anything between 1-63. Working with the line cycle number just makes visualising the code a bit better.

Another reg of interest is char_line_num. This register counts from 0 to 7 within the visible character region and is used to determine when we are at a bad line.

To determine if we are at a line within the visible character region we make use of visible_vertical.

With all the above defined we can define write enable (the WE wire) that determine when we should write to our internal character buffer.

This basically concludes on how the internal buffer gets populated.

What still need to discuss is how the address gets calculated. Here is the code:

...
  reg [9:0] screen_mem_pos;
...
   always @(posedge clk)
     if (!visible_vertical)
       screen_mem_pos <= 0;
     else if (bit_cycle == 0 & visible_horiz & char_line_num == 0)
       screen_mem_pos <= screen_mem_pos + 1;
...
   always @*
     if (bit_cycle == 1)
       addr = {4'b1, screen_mem_pos};
...

The screen_mem_pos counts from 0 to 999. This register is 10 bits wide. If you prepend a bit value of 1 then you get an address starting at 400 Hex.

Character ROM Reads

In the previous subsection we covered the reading of character codes from screen memory.

As a matter of fact, a character code is an index to a character image in the Character ROM. Since each Character image consists out of eight bytes, the address of the char image for a given code is calculated by simply multiplying the code by eight. We incorporate this within our addr assignment as follows:

   always @*
     if (bit_cycle == 1)
       addr = {4'b1, screen_mem_pos};
     else
       addr = {3'b010,char_buffer_out[7:0],char_line_num};

The if statement assist us with sequencing so that for the first pulse we always issue a read for the character code, and for the second pulse we issue a read to Character ROM for given code.

The address for a read from character ROM is a concatenation of three values. char_line_num is three bits, with the effect that our character code is multiplied by eight (e.g. because char_buffer_out gets shifted left three times).

Appending char_line_num ensure that we get the correct line from the screen image, that is depended on the current screen line we are on.

We need to prepend the binary value 3'b010 to the resulting address because the Character ROM is mapped at address $1000-$2000 within the VIC-II address space. Christian Bauer's doc gives more detail on this.

This concludes how the VIC-II access data from Character ROM.

Rendering Pixels

We are now ready to consider how rendering of pixels will be implemented.

The basic principle is that we copy a line of the character image within a shift register. We then shift through all the bits in the shift register, outputting the text color if the bit is a one, or output the background color if the bit is a zero.

The basic code for the pixel rendering is as follows:

...
  wire [3:0] out_color;
  wire [3:0] out_pixel;
  reg [7:0] pixel_shift_reg;
  reg [3:0] color_buffered_val;
...
  assign out_color = pixel_shift_reg[7] == 1 ? color_buffered_val : 4'd6;
  assign out_pixel = visible_vertical & visible_horiz ? out_color : 4'd14;
...
  always @(posedge clk)
  if (bit_cycle == 7)
    color_buffered_val <= char_buffer_out[11:8];
...
  always @(posedge clk)
  if (bit_cycle == 7)
    pixel_shift_reg <= data[7:0];
  else
    pixel_shift_reg <= {pixel_shift_reg[6:0],1'b0};
...

As you might have known, the data for displaying a character is fetched at the previous character position.

When we are about to display the new character line, it is important that we save the character pixels and the text color, since both values will be overridden halfway through the drawing of the character line.

The text color gets stored in color_buffered_val and the pixel data gets stored in pixel_shift_reg, our shift register.

During the rendering of a character line, we keep shifting pixel_shift_reg left by one bit at each clock cycle.

The bit we are always keeping an eye on is the most significant bit of pixel_shift_reg. If the most signficant bit is a one, we output the text color, otherwise we output the background color.

Because I haven't implemented registers yet within the VIC-II core I am just outputting a hardcoded value for the background color which is 6 (blue), which is the standard color at boot up.

out_color only returns colors within the character region of the screen. I have added the out_pixel wire to return the border color if we are writing pixels within the border region.

We are just about finished with implementing the rendering of pixels. At this point, however, we are returning pallette index values as colors with values 0 - 15.

We need to map these indexes to RGB values, so it is useful for the rest of the system. This is straightforward as follows:

module vicii(
  output reg [13:0] addr,
  input [11:0] data,
  input reset,
  input clk,
  output clk_out,
  output reg [23:0] out_rgb
  );
...
   always @*
     case (out_pixel)
       4'd0: out_rgb = 24'h000000;
       4'd1: out_rgb = 24'hFFFFFF;
       4'd2: out_rgb = 24'h880000;
       4'd3: out_rgb = 24'hAAFFEE;
       4'd4: out_rgb = 24'hCC44CC;
       4'd5: out_rgb = 24'h00CC55;
       4'd6: out_rgb = 24'h0000AA;
       4'd7: out_rgb = 24'hEEEE77;
       4'd8: out_rgb = 24'hDD8855;
       4'd9: out_rgb = 24'h664400;
       4'd10: out_rgb = 24'hFF7777;
       4'd11: out_rgb = 24'h333333;
       4'd12: out_rgb = 24'h777777;
       4'd13: out_rgb = 24'hAAFF66;
       4'd14: out_rgb = 24'h0088FF;
       4'd15: out_rgb = 24'hBBBBBB;
...

Synchronisation

In the currently state of our VIC-II core we are outputting the colors for pixels, but we cannot tell which parts on the screen these pixels are mapping to.

We need some kind of synchronisation. For synchronisation we need to add the following output signals to our module:

module vicii(
  output reg [13:0] addr,
  input [11:0] data,
  input reset,
  input clk,
  output clk_out,
  output reg [23:0] out_rgb
  output wire first_pixel,
  output wire frame_sync,
  output wire blank_signal
  );

The fram_sync signal get set when the y-counter is at the Vertical blank lines towards the end of the frame. If you are populating a buffer with the pixel values, this signal will give you a chance to reset the buffer pointer to position 0 before the pixels of a new frame arrives.

The first_pixel gives the indication that pixels for the new frame has just started, and you can start populating your buffer from position zero.

The blank_signal indicates that we are on a horizontal blanking period on the line and pixel values should be ignored.

The assignment for these signals is as follows:

  assign first_pixel = (x_pos == 0) & (y_pos == 16) ? 1 : 0;  
  assign frame_sync =  y_pos > 299; 
  assign blank_signal = y_pos < 16 | y_pos > 299 | x_pos > 403 ? 1 : 0;

You will notice that only indicate the first pixel at y_pos 16. This is because the first 16 lines are also vertical blank lines.

Creating the Test Harness

Time has come to create a test harness to test our VIC-II core to see if it works as expected.

Since this Test Harness should only test the functionality of the VIC-II core, we only need to include interface to RAM and the Character ROM that the VIC-II requires. At this point we don't need to worry about implementing dual port RAMS as discussed earlier in this post.

We start by creating an instance of vicii and wiring up some of the ports:

wire [13:0] vic_addr;

vic_ii vic_inst(
  .addr(vic_addr),
  .reset(reset),
  .clk(clk),
  .clk_out(clk_out),
  );

For now I have implemented only a handful of the ports. I will connect the remaining ports during the course of this section.

Just a quick refresher on the purpose of these ports.

addr is an output port stating an address of which it needs information from.

clk is the clock input port. The frequency of this input clock is 8Mhz.

clk_out provides a scaled down clock with frequency 2Mhz for memory accesses.

Now, let us implement the different memories that the VIC requires. These are main RAM, Character ROM and Color RAM.

Here is the implementation of them:

...
wire [9:0] trunc_screen_addr;
wire [11:0] trunc_char_addr;
...
reg [7:0] char_rom [4095:0];
reg [7:0] screen_ram [1023:0];
reg [3:0] color_ram [1023:0];
reg [7:0] char_rom_out;
reg [7:0] screen_ram_out;
reg [3:0] color_ram_out;
...
assign trunc_screen_addr = vic_addr[9:0];
assign trunc_char_addr = vic_addr[11:0];
...
 always @ (posedge clk_out)
    begin
      char_rom_out <= char_rom[trunc_char_addr];
    end 
...
 always @ (posedge clk_out)
    begin
      color_ram_out <= color_ram[trunc_screen_addr];
    end 
...
 always @ (posedge clk_out)
    begin
      screen_ram_out <= screen_ram[trunc_screen_addr];
    end 
...

At this point we know that we should populate the contents of the Character ROM. However, since we are not wiring up any CPU to our Test Harness, we should also pre-populate the color RAM and screen RAM with Test data. Both of them should contain 1024 elements.

The population will again be done in a similar fashion as in previous posts:

initial begin
      $readmemh("/home/johan/Documents/roms/chargen.hex", char_rom) ;
      $readmemh("/home/johan/Documents/roms/colorram.hex", color_ram) ;
      $readmemh("/home/johan/Documents/roms/screenram.hex", screen_ram) ;
    end

Next, we should wire up these ROM and RAMS to the VIC as in the memory map for the VIC-II:

...
reg [13:0] vic_addr_delayed;
reg [7:0] combined_vic_data;
...
 always @(posedge clk_out)
   vic_addr_delayed <= vic_addr;
...
vic_ii vic_inst(
  .addr(vic_addr),
  .data({color_ram_out,combined_vic_data}),
  .reset(reset),
  .clk(clk),
  .clk_out(clk_out),
  );
...
  always @*
    casex (vic_addr_delayed) 
      14'b00_01xx_xxxx_xxxx: combined_vic_data = screen_ram_out;
      14'b01_xxxx_xxxx_xxxx: combined_vic_data = char_rom_out;
      default: combined_vic_data = 0;
    endcase
...

As also outlined in the VIC-II model described by Christiaan Bauer, we don't include Color RAM as an entry within the casex statement, but rather prepend it to combined_vic_data.

We are now ready for coding the heart of our Test Harness:

initial begin
  f = $fopen("/home/johan/out.ppm","w");
  $fwrite(f, "P3\n404 284\n255\n");
  #50 reset <=0;
  #90000;
  while (first_pixel == 0) begin
    @(negedge clk);    
  end
  while (!frame_sync)
  begin
    if (!blank_signal)
      $fwrite(f, "%d %d %d\n", red, green, blue);
    @(negedge clk);
  end
  $fclose(f);
  #3000000 $finish;
end

The basic idea is that we create a image from the pixel output. We start this off by opening up a new image file for writing. The format of this file is a portable pixmap format (ppm). With this file format the pixel values gets written as plain text.

The first thing we write to the file is the header.

P3 means each pixel will have three values (e.g. rgb).
255 means that each value can have a max value of 255.
404 284 means that the image will have a pixel width of 404 and a pixel height of 284

As part of our test of the VIC-II core we want to test that the synchronise functionality works correctly. So, we need to start the test at a random pixel within a frame. This is the purpose of the #90000 we have added.

When then wait till the first _pixel signal is asserted. Interesting use here is the use of the at(@). We usually only use this kind of statement within an always block. We could, however, use this statement within a initial block as well.

We do the check on first _pixel each time on the negative edge of the clock cycle.

When we have finally got an asserted first_pixel, we start writing the pixels to our ppm file. We also also advance to the next value on the negative edge.

Within the pixel writing loop we check for two thing. The first thing is not to write a pixel value if blank_signal is asserted. The second thing to check for is to only loop till the frame_sync signal is asserted.

With the frame_sync signal asserted we can assume the frame is finished, and we can stop our test harness.

Test Result

For my test screen data I have just repeated the screen codes 0 to 4 and ended off the test data with a couple of screen code 0's.

The result for running the Test Harness with these test data is ass follows:

Our core is behaving more or less as expected.

In Summary

In this post we have developed the VIC-II core and tested it with a Test Harness.

In the next post we will integrate this VIC-II core with the rest of our c64 core.

Till next time!

Thursday, 11 January 2018

Full Block design for accessing SDRAM

Foreword

In the previous post we started exploring how to interface our FPGA to the SDRAM on the ZYBO board.

We ended off the post with writing the User Logic for interfacing with IPIC.

In this post we will continue creating a workable block design and see if we can write to SDRAM from FPGA.

In this post we will spend a lot of time with Vivado IDE specifics, so please bear with me :-)

Creating an AXI Interface Block

Time for us to create an AXI Interface Block.

Start off by creating a new Vivado Project, also choosing the Zybo as the target board.

The AXI Interface block we will also create as an IP, so you will also start off by selecting Tools/Create and Package New IP. In the screen where you need to select the applicable task, you will need to select a different task than what we did in previous posts:

This option will create an AXI Pheripheral for us.

On the next page of the wizard you will be given the option to customise your AXI peripheral. By default a AXI slave will be created. We need to change this so that a AXI Master peripheral gets created. With that change performed the wizard page will look as follows:

On the next wizard page click Edit IP and then Finish.

Let us now have a look at the generated source code:

`timescale 1 ns / 1 ps

 module myip_burst_test_v1_0 #
 (
  // Users to add parameters here

  // User parameters ends
  // Do not modify the parameters beyond this line

  // Parameters of Axi Master Bus Interface M00_AXI
  parameter  C_M00_AXI_TARGET_SLAVE_BASE_ADDR = 32'h40000000,
  parameter integer C_M00_AXI_BURST_LEN = 16,
  parameter integer C_M00_AXI_ID_WIDTH = 1,
  parameter integer C_M00_AXI_ADDR_WIDTH = 32,
  parameter integer C_M00_AXI_DATA_WIDTH = 32,
  parameter integer C_M00_AXI_AWUSER_WIDTH = 0,
  parameter integer C_M00_AXI_ARUSER_WIDTH = 0,
  parameter integer C_M00_AXI_WUSER_WIDTH = 0,
  parameter integer C_M00_AXI_RUSER_WIDTH = 0,
  parameter integer C_M00_AXI_BUSER_WIDTH = 0
 )
 (
  // Users to add ports here
  // User ports ends
  // Do not modify the ports beyond this line

  // Ports of Axi Master Bus Interface M00_AXI
  input wire  m00_axi_init_axi_txn,
  output wire  m00_axi_txn_done,
  output wire  m00_axi_error,
  input wire  m00_axi_aclk,
  input wire  m00_axi_aresetn,
  output wire [C_M00_AXI_ID_WIDTH-1 : 0] m00_axi_awid,
  output wire [C_M00_AXI_ADDR_WIDTH-1 : 0] m00_axi_awaddr,
  output wire [7 : 0] m00_axi_awlen,
  output wire [2 : 0] m00_axi_awsize,
  output wire [1 : 0] m00_axi_awburst,
  output wire  m00_axi_awlock,
  output wire [3 : 0] m00_axi_awcache,
  output wire [2 : 0] m00_axi_awprot,
  output wire [3 : 0] m00_axi_awqos,
  output wire [C_M00_AXI_AWUSER_WIDTH-1 : 0] m00_axi_awuser,
  output wire  m00_axi_awvalid,
  input wire  m00_axi_awready,
  output wire [C_M00_AXI_DATA_WIDTH-1 : 0] m00_axi_wdata,
  output wire [C_M00_AXI_DATA_WIDTH/8-1 : 0] m00_axi_wstrb,
  output wire  m00_axi_wlast,
  output wire [C_M00_AXI_WUSER_WIDTH-1 : 0] m00_axi_wuser,
  output wire  m00_axi_wvalid,
  input wire  m00_axi_wready,
  input wire [C_M00_AXI_ID_WIDTH-1 : 0] m00_axi_bid,
  input wire [1 : 0] m00_axi_bresp,
  input wire [C_M00_AXI_BUSER_WIDTH-1 : 0] m00_axi_buser,
  input wire  m00_axi_bvalid,
  output wire  m00_axi_bready,
  output wire [C_M00_AXI_ID_WIDTH-1 : 0] m00_axi_arid,
  output wire [C_M00_AXI_ADDR_WIDTH-1 : 0] m00_axi_araddr,
  output wire [7 : 0] m00_axi_arlen,
  output wire [2 : 0] m00_axi_arsize,
  output wire [1 : 0] m00_axi_arburst,
  output wire  m00_axi_arlock,
  output wire [3 : 0] m00_axi_arcache,
  output wire [2 : 0] m00_axi_arprot,
  output wire [3 : 0] m00_axi_arqos,
  output wire [C_M00_AXI_ARUSER_WIDTH-1 : 0] m00_axi_aruser,
  output wire  m00_axi_arvalid,
  input wire  m00_axi_arready,
  input wire [C_M00_AXI_ID_WIDTH-1 : 0] m00_axi_rid,
  input wire [C_M00_AXI_DATA_WIDTH-1 : 0] m00_axi_rdata,
  input wire [1 : 0] m00_axi_rresp,
  input wire  m00_axi_rlast,
  input wire [C_M00_AXI_RUSER_WIDTH-1 : 0] m00_axi_ruser,
  input wire  m00_axi_rvalid,
  output wire  m00_axi_rready
 );

    
// Instantiation of Axi Bus Interface M00_AXI
 myip_burst_test_v1_0_M00_AXI # ( 
  .C_M_TARGET_SLAVE_BASE_ADDR(C_M00_AXI_TARGET_SLAVE_BASE_ADDR),
  .C_M_AXI_BURST_LEN(C_M00_AXI_BURST_LEN),
  .C_M_AXI_ID_WIDTH(C_M00_AXI_ID_WIDTH),
  .C_M_AXI_ADDR_WIDTH(C_M00_AXI_ADDR_WIDTH),
  .C_M_AXI_DATA_WIDTH(C_M00_AXI_DATA_WIDTH),
  .C_M_AXI_AWUSER_WIDTH(C_M00_AXI_AWUSER_WIDTH),
  .C_M_AXI_ARUSER_WIDTH(C_M00_AXI_ARUSER_WIDTH),
  .C_M_AXI_WUSER_WIDTH(C_M00_AXI_WUSER_WIDTH),
  .C_M_AXI_RUSER_WIDTH(C_M00_AXI_RUSER_WIDTH),
  .C_M_AXI_BUSER_WIDTH(C_M00_AXI_BUSER_WIDTH)
 ) myip_burst_test_v1_0_M00_AXI_inst (
  .INIT_AXI_TXN(m00_axi_init_axi_txn),
  .TXN_DONE(m00_axi_txn_done),
  .ERROR(m00_axi_error),
  .M_AXI_ACLK(m00_axi_aclk),
  .M_AXI_ARESETN(m00_axi_aresetn),
  .M_AXI_AWID(m00_axi_awid),
  .M_AXI_AWADDR(m00_axi_awaddr),
  .M_AXI_AWLEN(m00_axi_awlen),
  .M_AXI_AWSIZE(m00_axi_awsize),
  .M_AXI_AWBURST(m00_axi_awburst),
  .M_AXI_AWLOCK(m00_axi_awlock),
  .M_AXI_AWCACHE(m00_axi_awcache),
  .M_AXI_AWPROT(m00_axi_awprot),
  .M_AXI_AWQOS(m00_axi_awqos),
  .M_AXI_AWUSER(m00_axi_awuser),
  .M_AXI_AWVALID(m00_axi_awvalid),
  .M_AXI_AWREADY(m00_axi_awready),
  .M_AXI_WDATA(m00_axi_wdata),
  .M_AXI_WSTRB(m00_axi_wstrb),
  .M_AXI_WLAST(m00_axi_wlast),
  .M_AXI_WUSER(m00_axi_wuser),
  .M_AXI_WVALID(m00_axi_wvalid),
  .M_AXI_WREADY(m00_axi_wready),
  .M_AXI_BID(m00_axi_bid),
  .M_AXI_BRESP(m00_axi_bresp),
  .M_AXI_BUSER(m00_axi_buser),
  .M_AXI_BVALID(m00_axi_bvalid),
  .M_AXI_BREADY(m00_axi_bready),
  .M_AXI_ARID(m00_axi_arid),
  .M_AXI_ARADDR(m00_axi_araddr),
  .M_AXI_ARLEN(m00_axi_arlen),
  .M_AXI_ARSIZE(m00_axi_arsize),
  .M_AXI_ARBURST(m00_axi_arburst),
  .M_AXI_ARLOCK(m00_axi_arlock),
  .M_AXI_ARCACHE(m00_axi_arcache),
  .M_AXI_ARPROT(m00_axi_arprot),
  .M_AXI_ARQOS(m00_axi_arqos),
  .M_AXI_ARUSER(m00_axi_aruser),
  .M_AXI_ARVALID(m00_axi_arvalid),
  .M_AXI_ARREADY(m00_axi_arready),
  .M_AXI_RID(m00_axi_rid),
  .M_AXI_RDATA(m00_axi_rdata),
  .M_AXI_RRESP(m00_axi_rresp),
  .M_AXI_RLAST(m00_axi_rlast),
  .M_AXI_RUSER(m00_axi_ruser),
  .M_AXI_RVALID(m00_axi_rvalid),
  .M_AXI_RREADY(m00_axi_rready)
 );




 // Add user logic here

 // User logic ends

 endmodule

In the ports section you will see all the AXI Master signals defined. In the code it also provides the user a couple of places to add some custom code or ports.

Towards the end you will see an instantiation of the AXI bus interface with the instance name myip_burst_test_v1_0_M00_AXI_inst. We will need change this instance to an instance of the AXI Master Burst Core, which we covered in the previous post.

It should be noted that by default the AXI Master Burst Core is not available in Vivado to use in your design. There is a couple of hoops you need to jump through to add it to your design, so let us quickly jump through them :-)

Firstly you need to locate the source file for AXI Master Burst Core within your Vivado installation. In my installation it is located in the folder Vivado/2017.1/data/ip/xilinx. The folder 2017.1 will be different in your case. It should be the version number of Vivado you are using.

Now, within the xilinx folder you will see folders for different IP's. The folder we are interested in is axi_master_burst_v2_0. Moving into this folder you will see three folders: doc, hdl and xgui.

When you move into the hdl folder you will see a file called axi_master_burst_v2_0_vh_rfs.vhd and this is the file we are after. A vhd file is also a Hardware Description language file as Verilog is one. Vivado can work with both formats.

You now need to add the file as a design source to your current IP Project.

The vhd file you have added have some additional dependencies that you need to add, which can also be located in the xilinx folder. You need to add the vhd files for these dependencies in a similar way: lib_pkg_v1_0 and lib_srl_fifo_v1_0.

With all dependencies added let us see if axi_master_burst_v2_0_vh_rfs.vhd open up without any errors. This time around we are not lucky:

It complains about a module not found in library lib_srl_fifo, with a hint that the library might not exist. This is despite the fact that we added lib_srl_fifo_v1_0 to our project.

What is going on? The key to this question is the library names for three files. To see the library name for these files click on the Libraries tab within the Sources Panel:

As seen from the diagram all three vhd files with a library name of xil_defaultlib. We need to change the library name for lib_srl_fifo. So click on this file and click on the Ellipse of the library attribute within the Source File Panel. Change the library name ass follows:

The other vhd files should changed in a similar way. The resulting library tab will look as follows:

If you now close down all the source edit panels for these vhd files and reopen them, no errors should be reported.

We are now ready to change our IP to use an instance of AXI Master Burst. The final Verilog file will look as follows:

`timescale 1 ns / 1 ps

 module myip_burst_test_v1_0 #
 (
  // Users to add parameters here

  // User parameters ends
  // Do not modify the parameters beyond this line

  // Parameters of Axi Master Bus Interface M00_AXI
  parameter  C_M00_AXI_TARGET_SLAVE_BASE_ADDR = 32'h40000000,
  parameter integer C_M00_AXI_BURST_LEN = 16,
  parameter integer C_M00_AXI_ID_WIDTH = 1,
  parameter integer C_M00_AXI_ADDR_WIDTH = 32,
  parameter integer C_M00_AXI_DATA_WIDTH = 32,
  parameter integer C_M00_AXI_AWUSER_WIDTH = 0,
  parameter integer C_M00_AXI_ARUSER_WIDTH = 0,
  parameter integer C_M00_AXI_WUSER_WIDTH = 0,
  parameter integer C_M00_AXI_RUSER_WIDTH = 0,
  parameter integer C_M00_AXI_BUSER_WIDTH = 0
 )
 (
  // Users to add ports here
        input wire [31:0] ip2bus_mst_addr,
        input wire [11:0] ip2bus_mst_length,
        input wire [31:0] ip2bus_mstwr_d,
        input wire [4:0] ip2bus_inputs,
        output wire [5:0] ip2bus_otputs,
  // User ports ends
  // Do not modify the ports beyond this line

  // Ports of Axi Master Bus Interface M00_AXI
  input wire  m00_axi_init_axi_txn,
  output wire  m00_axi_txn_done,
  output wire  m00_axi_error,
  input wire  m00_axi_aclk,
  input wire  m00_axi_aresetn,
  output wire [C_M00_AXI_ID_WIDTH-1 : 0] m00_axi_awid,
  output wire [C_M00_AXI_ADDR_WIDTH-1 : 0] m00_axi_awaddr,
  output wire [7 : 0] m00_axi_awlen,
  output wire [2 : 0] m00_axi_awsize,
  output wire [1 : 0] m00_axi_awburst,
  output wire  m00_axi_awlock,
  output wire [3 : 0] m00_axi_awcache,
  output wire [2 : 0] m00_axi_awprot,
  output wire [3 : 0] m00_axi_awqos,
  output wire [C_M00_AXI_AWUSER_WIDTH-1 : 0] m00_axi_awuser,
  output wire  m00_axi_awvalid,
  input wire  m00_axi_awready,
  output wire [C_M00_AXI_DATA_WIDTH-1 : 0] m00_axi_wdata,
  output wire [C_M00_AXI_DATA_WIDTH/8-1 : 0] m00_axi_wstrb,
  output wire  m00_axi_wlast,
  output wire [C_M00_AXI_WUSER_WIDTH-1 : 0] m00_axi_wuser,
  output wire  m00_axi_wvalid,
  input wire  m00_axi_wready,
  input wire [C_M00_AXI_ID_WIDTH-1 : 0] m00_axi_bid,
  input wire [1 : 0] m00_axi_bresp,
  input wire [C_M00_AXI_BUSER_WIDTH-1 : 0] m00_axi_buser,
  input wire  m00_axi_bvalid,
  output wire  m00_axi_bready,
  output wire [C_M00_AXI_ID_WIDTH-1 : 0] m00_axi_arid,
  output wire [C_M00_AXI_ADDR_WIDTH-1 : 0] m00_axi_araddr,
  output wire [7 : 0] m00_axi_arlen,
  output wire [2 : 0] m00_axi_arsize,
  output wire [1 : 0] m00_axi_arburst,
  output wire  m00_axi_arlock,
  output wire [3 : 0] m00_axi_arcache,
  output wire [2 : 0] m00_axi_arprot,
  output wire [3 : 0] m00_axi_arqos,
  output wire [C_M00_AXI_ARUSER_WIDTH-1 : 0] m00_axi_aruser,
  output wire  m00_axi_arvalid,
  input wire  m00_axi_arready,
  input wire [C_M00_AXI_ID_WIDTH-1 : 0] m00_axi_rid,
  input wire [C_M00_AXI_DATA_WIDTH-1 : 0] m00_axi_rdata,
  input wire [1 : 0] m00_axi_rresp,
  input wire  m00_axi_rlast,
  input wire [C_M00_AXI_RUSER_WIDTH-1 : 0] m00_axi_ruser,
  input wire  m00_axi_rvalid,
  output wire  m00_axi_rready
 );

    //inputs
    wire ip2bus_mstwr_req;
    wire ip2bus_mst_type;
    
    wire ip2bus_mstwr_sof_n;
    wire ip2bus_mstwr_src_rdy_n;
    //outputs
    wire bus2ip_mst_cmdack;
    wire bus2ip_mst_cmplt;
    wire bus2ip_mst_error;
    wire ip2bus_mstwr_eof_n;
    wire bus2ip_mstwr_dst_dsc_n;
    wire bus2ip_mstwr_dst_rdy_n;
    wire md_error;
    
    

axi_master_burst myip_burst_test_v1_0_M00_AXI_inst
  (

    .m_axi_aclk(m00_axi_aclk),

    .m_axi_aresetn(m00_axi_aresetn),

    .md_error(md_error),

    .m_axi_arready(m00_axi_arready),
    .m_axi_arvalid(m00_axi_arvalid),
    .m_axi_araddr(m00_axi_araddr),
    .m_axi_arlen(m00_axi_arlen),
    .m_axi_arsize(m00_axi_arsize),
    .m_axi_arburst(m00_axi_arburst),
    .m_axi_arprot(m00_axi_arprot),
    .m_axi_arcache(m00_axi_arcache),
    .m_axi_rready(m00_axi_rready),
    .m_axi_rvalid(m00_axi_rvalid),
    .m_axi_rdata(m00_axi_rdata),
    .m_axi_rresp(m00_axi_rresp),
    .m_axi_rlast(m00_axi_rlast),

    .m_axi_awready(m00_axi_awready),
    .m_axi_awvalid(m00_axi_awvalid),
    .m_axi_awaddr(m00_axi_awaddr),
    .m_axi_awlen(m00_axi_awlen),
    .m_axi_awsize(m00_axi_awsize),
    .m_axi_awburst(m00_axi_awburst),
    .m_axi_awprot(m00_axi_awprot),
    .m_axi_awcache(m00_axi_awcache),
    .m_axi_wready(m00_axi_wready),
    .m_axi_wvalid(m00_axi_wvalid),
    .m_axi_wdata(m00_axi_wdata),
               
    .m_axi_wstrb(m00_axi_wstrb),
    .m_axi_wlast(m00_axi_wlast),
    .m_axi_bready(m00_axi_bready),
    .m_axi_bvalid(m00_axi_bvalid),
    .m_axi_bresp(m00_axi_bresp),

    .ip2bus_mstrd_req(1'b0), 
    .ip2bus_mstwr_req(ip2bus_mstwr_req), 
    .ip2bus_mst_addr(ip2bus_mst_addr),
    .ip2bus_mst_length(ip2bus_mst_length),
    .ip2bus_mst_type(ip2bus_mst_type),
    .ip2bus_mst_lock(1'b0),
    .ip2bus_mst_reset(1'b0),

    .bus2ip_mst_cmdack(bus2ip_mst_cmdack),
    .bus2ip_mst_cmplt(bus2ip_mst_cmplt),
    .bus2ip_mst_error(bus2ip_mst_error),

    .ip2bus_mstwr_d(ip2bus_mstwr_d),
    .ip2bus_mstwr_rem(4'h0),
    .ip2bus_mstwr_sof_n(ip2bus_mstwr_sof_n),
    .ip2bus_mstwr_eof_n(ip2bus_mstwr_eof_n),
    .ip2bus_mstwr_src_rdy_n(ip2bus_mstwr_src_rdy_n),
    .ip2bus_mstwr_src_dsc_n(1'b1),

    .bus2ip_mstwr_dst_rdy_n(bus2ip_mstwr_dst_rdy_n),
    .bus2ip_mstwr_dst_dsc_n(bus2ip_mstwr_dst_dsc_n)

    );



 // Add user logic here
 
 

    //inputs
    assign ip2bus_mstwr_req = ip2bus_inputs[0];
    assign ip2bus_mst_type = ip2bus_inputs[1];
    
    assign ip2bus_mstwr_sof_n = ip2bus_inputs[2];
    assign ip2bus_mstwr_eof_n = ip2bus_otputs[3];
    assign ip2bus_mstwr_src_rdy_n = ip2bus_inputs[4];
    //outputs
    assign ip2bus_otputs[0] = bus2ip_mst_cmdack;
    assign ip2bus_otputs[1] = bus2ip_mst_cmplt;
    assign ip2bus_otputs[2] = bus2ip_mst_error;
    
    assign ip2bus_otputs[3] = bus2ip_mstwr_dst_dsc_n;
    assign ip2bus_otputs[4] = bus2ip_mstwr_dst_rdy_n;
    assign ip2bus_otputs[5] = md_error;
    assign m00_axi_aruser[0] = 1'b0;
    assign m00_axi_awuser[0] = 1'b1;
    assign m00_axi_arlock = 1'b0;
    assign m00_axi_awlock = 1'b0;
    assign m00_axi_arqos = 15;
    assign m00_axi_awqos = 15;
    assign m00_axi_arid[0] = 0;
 // User logic ends

 endmodule

You will see that we hooked up most of the Master AXI ports to our Burst instance and some are unconnected.

Some AXI Master ports are required, but not present on the AXI Master Burst instance. For these ports we have hooked up to constant values.

You will also notice the ip2bus_inputs and ip2bus_outputs ports, similar that we defined for a module we defined in the previous post.

With all these changes performed you can now package your AXI enabled IP.

Hooking everything up

With the IP we have developed in the previous post and the AXI enabled IP we developed in the previous section, we are now ready to create the full block design.

Since we need a ZYNQ Processing System block to access the SDRAM, we will start off adding this Block together with auto connection.

Next, we need to add the two IP's we have created in the previous section and in the previous post.

With these changes done, your block diagram should more or less like the following:

One thing that is missing is an AXI slave port on the ZYNQ Block to connect he AXI Master port of our Burst Block. Slave ports on the ZYNQ need to be enabled via ZYNQ IP Customization.

So, double click on the ZYNQ block and then on PS-PL Configuration in the left panel. In this screenshot I have highlighted the type of slave ports available on the ZYNQ:

These Salve ports are covered in detail in the ZYNQ Technical Reference Manual, but let us try to summarise the purpose of each:

General Purpose (GP) Slave interface: Only for general purpose use and not intended to achieve high performance
High Performance (HP) Slave Interface: Provides high bandwidth datapaths to the DDR
Accelerator Coherency port (ACP) Slave Interface: Low latency access to Masters, with optional coherency with L1 and L2 cache.

The question now is which AXI Slave port should we enable? We should keep in mind that in the beginning we will inspect memory most of the time with the XSCT console within the Xilinx SDK. The XSCT always see data from the same perspective of the active CPU core via the L1 cache.

With the XSCT console in mind, the ACP slave port will fit our needs perfectly. With the ACP port we don't need to worry about seeing stale L1 cache values, since if we write data to DDR via ACP, our L1 cache will contain also contain the written data. A Snoop controller (https://en.wikipedia.org/wiki/Bus_snooping) takes care of this in the background.

Let us enable the ACP Slave port as follows:

You will see that our ZYNQ Block now have a Slave port as well as Slave Clock pin, all on the left:

Time again to do some connection automation for the AXI Slave port on the ZYNQ. The result will look something like the following:

You will see that the Connection Automation introduced a new Block, an AXI SmartConnect.

Next we need to wire up all the ip2bus wires:

Everything looks connected now, except for the write and write_data ports of burst_block_0. We need to create an IP block that can generate test data feeding these wires.

The verilog code for this new IP will look like the following:

module test_data_gen(
  input wire reset,
  input wire clk,
  output wire write,
  output wire [31:0] data_out
    );
    
  parameter
      INITIALISED = 4'h0,
      STARTED = 4'h1,
      STOPPED = 4'h2;

  reg [39:0] init_counter;
  reg [31:0] data_counter;
  reg [2:0] state;
  
  always @(posedge clk)
    if (!reset)
      state <= INITIALISED;
    else
      case(state)
        INITIALISED: if (init_counter > 40'd20000000000)
          state <= STARTED;
        STARTED: if (data_counter > 20)
          state <= STOPPED;
       endcase
       
  always @(posedge clk)
    if (!reset)
      init_counter <= 0;
    else
      init_counter <= init_counter + 1;
      
  always @(posedge clk)
    if (!reset)
      data_counter <= 0;
    else if(state == STARTED)
      data_counter <= data_counter + 1;
      
   assign write = state == STARTED ? 1 : 0;
   assign data_out = data_counter;
             
endmodule

Nothing more than a state machine waiting some time before sending data. Assuming a clock speed of 100MHZ, this block will wait about 3 minutes before sending data. This give you enough time to get everything in the Xilinx SDK up and running.

Add the New IP to you rblock design and link up with the write and write_data ports.

You are now ready to Run Synthesis and then Generating the Bitstream.

After generating the BitStream remember to Export the Hardware. You can then launch the Xilinx SDK.

Testing

We are now ready to test our design within the Xilinx SDK.

For our testing we just need a plain HelloWorld App. The app is just required to get one of the CPU cores in a runnable state so that we can use the XCST console.

Ensure that the FPGA is programmed and fire off the HelloWorld App in Debug Mode.

Wait till the first breakpoint fires and switch to the XCT Console. At this point your SDK window will look something like the following:

CPU Core#0 is at a breakpoint within the Main function.

Within above screenshot I have also marked two important areas. The first one is where you can locate the XCST console tab.

The second area is the actual XCST console where you will be entering the command.

At this point in time there would be still a number of minutes left before our FPGA core will write some information to SDRAM.

Our FPGA will be writing information to memory location 0. So this is the location to monitor with the XSCT console. To see the current contents of this memory location enter the following command in the XSCT console:

mrd 0

As you might have guest, mrd stands for memory read.

The output of this command will look something like the following:

It is expected to see some initial random number. After about three and a half minutes our FPGA core should have written the values 0 to 4 in the first four memory locations.

Let us run the mrd command again, but this time let it return more values by specifying it as follows:

mrd 0 8

This variant of the command states that the contents of eight consecutive memory locations should be returned starting at location 0.

The output of the command should look as follows:

This shows that our FPGA core actually managed to write successfully to SDRAM!

A final note on ACP and Caching

Before ending off this section, I just would like to mention something interesting regarding ACP and Caching.

If you look at section 3.5.1 of the ZYNQ technical reference manual, you see the following:

ACP coherent write requests: An ACP write request is coherent when AWUSER[0] = 1 and AWCACHE[1] =1 alongside AWVALID. In this case, the SCU enforces coherency. When the data is present in one of the Cortex-A9 processors, the data is first cleaned and invalidated from the relevant CPU. When the data is not present in any of the Cortex-A9 processors, or when it has been cleaned and invalidated, the write request is issued on one of the SCU AXI master ports, along with all corresponding AXI parameters with the exception of the locked attribute.

Our design has enabled ACP coherent writes: The AXI Master Burst core always set AWCACHE[1] = 1 and within our AXI enabled IP we set AWUSER[0] high with this assignment:

assign m00_axi_awuser[0] = 1'b1;

Assigning a zero this signal will effectively disable ACP coherent writes.

If we ran the steps in the previous section with the user signal set to zero, the L1 cache would cling to the value 9FA90DDC upon the first read and wouldn't pick up the changes our FPGA core made to that memory location.

So, be careful of caching!

In Summary

In this post we completed the Block Design for testing writing to SDRAM from an FPGA core.

We also confirmed via the XSCT console that the write was indeed successful.

In the next post we will start to develop a VIC-II core.

Till next time!

Sunday, 7 January 2018

Accessing SDRAM

Foreword

In the previous post we managed to boot the C64 system on the Zybo FPGA.

In this post we will explore how to interface the FPGA with the SDRAM on the Zybo board.

Having this knowledge will help us in future posts getting around the limitations of limited Block RAMS on the FPGA when doing tasks like resizing video frames to fit on an LCD screen.

An overview of AMBA/AXI

You might recall the following diagram from my Introduction Post:

This diagram summarises the components on the ZYNQ chip.

If you have a close look at the above diagram, you will see that the FPGA block (indicated in yellow) doesn't have a direct connection to the DDR controller. Should your FPGA design wish to transact with SDRAM, it will need to do so via AMBA Interconnect.

One might wonder why the AMBA Interconnect step is necessary. The answer is that there is a couple of peripherals on the ZYNQ (like USB and Gigabit Ethernet). Each of these peripherals can potentially compete for memory access.

This is where AMBA Interconnect comes in. AMBA Interconnect manages multiple memory accesses in an effecient and fair way.

AMBA Interconnect receives memory requests via AXI Ports. An AXI Port consists of an AXI Master and a AXI Slave. The AXI Master initiates a memory transaction (like a memory read or write request). The AXI slave is responsible for fulfulling the AXI Master request.

The finer detail of AXI Master and slaves will become clear during the course of this post.

Using the AXI Master Burst Core

Creating an AXI Master-capable IP can be quite a daunting task, especially if it is your first take on it.

To simplify the task somewhat Vivado provides the AXI Master Burst Core.

If you go through the Documentation a bit, you will see that yout User Logic that requires access to the AXI Bus sends read/write requests via the IPIC protocol to the AXI Master Burst Block. The AXI Master Burst Block in turn converts the requests to AXI Master requests and send it to a AXI Slave port.

IPIC is the acronym for Intellectual Property InteConnect and is a Xilinx Standard. Compared to the AXI Master protocol IPIC is a simplified protocol.

Let us now have a look at how a typical IPIC communication session looks like between Use Logic and the AXI Master Burst Block. There is some nice timing diagrams in the Documentation illustration a couple of scenarios, from page 25 - 32.

As you see, all operations is clocked by the axi master clock, which, on the Zynq on the ZYBO, is 100Mhz.

The wite operation is started by issuing a command from the User logic on the IPIC bus. There are four important signals that together forms the command.

Firstly, because this is a write, the mstwr_req line gets asserted.

Next, as part of the command mst_type also gets asserted. According to the documentation, the meaning of an asserted msty_type line is Fixed length burst. A pulled down mst_type line has the meaning of Single Data Beat. With our command we want to send multiple Data Beats, so asserting mst_type line is the way to go.

The other two signals for our command is mst_addr, indicating the destination address in memory, as well as mst_length. It is important to note that the size of mst_length is in bytes, although we are sending 32 bits of data at a time.

As soon as our AXI master burst block have accepted our command, it signals us via mst_cmdack.

Once we received above confirmation we need to keep an eye on the signal MstWr_dst_rdy_n. At each clock cycle, if this signal is low, we need to give the next piece of data via MstWr_d.

Another rdy signal we should be aware of is MstWr_src_rdy_n (aka source ready). Should we be in situation where we don't have the next piece of data available yet at the transition to the next clock cycle, we should ensure that the source ready signal is set to high.

Two final signals worth noting is MstWr_sof_n and MstWr_eof_n. The first signal, start of frame, you will pull low when you are at the point of sending your first piece of data. You are about to send your last piece of data, you will pull the MstWr_eof_n (aka end of frame) low.

Developing the user logic

With the Burst IP block discussed in the previous section, it would be nice to write some Test Verilog code and see if we can populate the SDRAM on the ZYBO board.

In this section we will start writing the IPIC interface code for the user logic and do the full block design in the next post.

I will start off by showing the full User Logic Verilog solution for the IPIC interface:

module burst_block(
  input wire clk,
  input wire reset,
  input wire write,
  input wire [31:0] write_data, 
  output wire [31:0] ip2bus_mst_addr,
  output wire [11:0] ip2bus_mst_length,
  output wire [31:0] ip2bus_mstwr_d,
  output wire [4:0] ip2bus_inputs,
  input wire [5:0] ip2bus_otputs

    );
    
    wire master_write_dst_rdy; //change to axi name
    wire cmd_ack; // change to axi name
    wire mstwrite_req;
    wire mst_type;
    reg  [31:0] axi_start_address;
    wire [11:0] burst_len;
    wire [31:0] axi_d_out;
    //output rem 
    wire sof;
    wire eof;
    reg master_write_src_rdy;

reg [12:0] bytes_to_send;
reg [12:0] count_in_buf;
reg [3:0] state;
wire read;

reg [12:0] axi_data_inc;
wire neg_clk;

parameter
  IDLE = 4'h0,
  INIT_CMD = 4'h1,
  START = 4'h2,
  ACT = 4'h3,
  TRANSMITTING = 4'h4;
  
parameter BURST_THRES = 5;  

assign burst_len = 12'h14;

assign neg_clk = ~clk;
    
fifo #(
  .DATA_WIDTH(32)
)

   data_buf (
            .clk(clk), 
            .reset(!reset),
            .read(read),
            .write(write),
            .write_data(write_data),
            .empty(), 
            .full(),
            .read_data(axi_d_out)
        );

assign read = (state >= START & !master_write_dst_rdy) ? 1 : 0;
    
always @(posedge clk)
if (!reset)
  count_in_buf <= 0;
else if (!read & write)
  count_in_buf <= count_in_buf + 1;
else if (read & !write)    
  count_in_buf <= count_in_buf - 1;
  
always @(posedge clk)
if (!reset)  
  state <= 0;
else
  case( state )
    IDLE: if (count_in_buf > BURST_THRES)
            state <= INIT_CMD;
    INIT_CMD: state <= START;             
    START: if (cmd_ack)
             state <= ACT;
    ACT: if (!master_write_dst_rdy)
             state <= TRANSMITTING;
    TRANSMITTING: if (!master_write_dst_rdy & bytes_to_send == 1)
                    state <= IDLE;    
  
  endcase
  
  
always @(negedge clk)
if (!reset)
begin
  axi_start_address <= 0;
  axi_data_inc <= 0;
end
else if (state == INIT_CMD)
begin
  axi_start_address <= axi_start_address + axi_data_inc;
  axi_data_inc <= BURST_THRES;
end    

assign mstwrite_req = (state == START) ? 1 : 0;

assign mst_type = (state == START) ? 1 : 0;

assign sof = (state == START) | (state == ACT) ? 0 : 1; 

assign eof = (bytes_to_send == 1 & !master_write_dst_rdy) ? 0 : 1; 

always @*
  if (state == START)
    master_write_src_rdy = 0;
  else if (state > START & bytes_to_send != 0)
    master_write_src_rdy = 0;
  else
   master_write_src_rdy = 1;

always @(posedge clk)
 if (state == START)
   bytes_to_send <= BURST_THRES;
 else if ((state > START) & !master_write_dst_rdy & bytes_to_send != 0) 
   bytes_to_send <= bytes_to_send - 1;
         
assign master_write_dst_rdy = ip2bus_otputs[4];
assign cmd_ack = ip2bus_otputs[0];
assign ip2bus_inputs[0] = mstwrite_req;
assign ip2bus_inputs[1] = mst_type; 
assign ip2bus_mst_addr = axi_start_address;
assign ip2bus_mst_length = 20;
assign ip2bus_mstwr_d = axi_d_out;  
assign ip2bus_inputs[2] = sof;
assign ip2bus_inputs[3] = eof;     
assign ip2bus_inputs[4] = master_write_src_rdy;
endmodule

Let us start by discussing the parameters of the module.

All the single bit signals of the IPIC interface i have combined into two wires: ip2bus_inputs and ip2bus_outputs. This will just make the creation of the block diagram easier by not needing to connect so many wires.

The other module parameters starting with ip2bus will look familiar from the previous section.

The other module parameters of interest is write and write_data. These inputs will pump our module with data that needs to be send to SDRAM. If, during a clock pulse, you have new data on write_data, you need to ensure that write is high.

For test purposes you can just create logic that that send numbers 0 - 5 on write_data.

In later posts we will be writing the output of a VIC-II implementation to write_data.

The data that we pump into our module gets stored temporary in a FIFO (First In First Out) Buffer. There is a number of FIFO implementations that you can use on the Net. The FIFO implementation I used was the following: https://embeddedthoughts.com/2016/07/13/fifo-buffer-using-block-ram-on-a-xilinx-spartan-3-fpga/

One issue with about all FIFO implementations is that it can't tell you how far it is from been full. This kind of information can aid in knowing when to start processing the data in the buffer to avoid a buffer overflow condition. We need to keep track of this count ourselves by keeping track of when we insert a item or remove one from the buffer. The following snippet taken from the complete module code above is responsible for keeping track of the FIFO fill level:

...
always @(posedge clk)
if (!reset)
  count_in_buf <= 0;
else if (!read & write)
  count_in_buf <= count_in_buf + 1;
else if (read & !write)    
  count_in_buf <= count_in_buf - 1;
...

For any given clock cycle if there was a write and no read, our fill level will be one more. If there was a read and no write our fill level will be one less. The fill level will remain unchanged if there was both a read and a write.

You might have also noticed that I maintain a state machine within the module. We start off in the IDLE state and as soon as count_in_buf reached a threshold we transition to subsequent states for sending the data over the IPIC interface.

This conclude this section for developing the IPIC interface.

In Summary

In this post we explored how to interface our FPGA to the SDRAM on the ZYBO board. We ended with developing the IPIC interface that will interface to the AXI Master Burst IP.

In the next post we will continue with the full block design.

Till next time!