Friday 27 December 2019

Creating and Running U-Boot

Foreword

In the previous post we started our journey on getting Linux to run on the Zybo board.

We managed to create and run a First Stage Bootloader (FSBL).

In this post we will be building and running U-Boot.

U-Boot is an intermediate stage bootloader for booting Linux on ARM-based devices.

Creating an ARM Cross Compiler

To compile U-Boot and Linux for the Zynq, one needs a cross compiler for turning source code into ARM machine code.

Linux distros like Ubuntu provide these cross compilers as packages that you can download and install. These packages, however, are sometimes a couple of versions behind and might not be sufficient for building ARM-based software.

In my own experience, it is better to build the cross compiler toolchain yourself. Here is a very handy resource for creating your own toolchain:

https://preshing.com/20141119/how-to-build-a-gcc-cross-compiler/

This set of instructions explains how to build GCC version 4.9.2, which is a bit outdated for our purposes. So, for some of the packages we need to download, we should get newer versions, which are as follows:

  • GCC version 7.5.0
  • Binutils version 2.25
  • isl version 0.18
When following the instructions from the above link, some of the steps need to be tweaked so that we compile for the ARM architecture (a sketch of the resulting configure commands follows the list). The tweaks are as follows:

  • --target=arm-linux-gnueabihf as well as --host=arm-linux-gnueabihf
  • ARCH=arm
  • The folder under /opt/cross should be arm-linux-gnueabihf
  • Also, in step 4 (Standard C Library Headers and Startup Files) the GCC cross compiler is invoked. The filename for this should be arm-linux-gnueabihf-gcc.
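For illustration, the binutils and GCC configure invocations from that guide, with the ARM target substituted, would then look roughly as follows (the --host option comes into play in the glibc steps; treat this as a sketch of the tweaks rather than a complete recipe):

# Binutils, configured for the ARM target
../binutils-2.25/configure --prefix=/opt/cross \
    --target=arm-linux-gnueabihf --disable-multilib
make -j4 && make install

# GCC, same target; the remaining steps of the guide follow the same pattern
../gcc-7.5.0/configure --prefix=/opt/cross \
    --target=arm-linux-gnueabihf --enable-languages=c,c++ --disable-multilib
make -j4 all-gcc && make install-gcc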
Following the steps from the above resource, together with the suggested amendments for the ARM architecture, will yield the cross compiler within the folder /opt/cross.

If you want to use this cross compiler, you need to ensure that it is within the system path, which you can do as follows:

export PATH=/opt/cross/bin:$PATH

Building U-Boot

To build U-Boot, we need to follow the instructions from this resource: https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841973/Build+U-Boot

Let us follow these instructions step by step. Firstly, we need to get the source code from GitHub, via a terminal window:

git clone http://github.com/Xilinx/u-boot-xlnx.git

This is basically the U-Boot source with some Xilinx customisations.

Now, change into the cloned directory.

Next we need to set some environment variables within our terminal session:

export CROSS_COMPILE=arm-linux-gnueabihf-

export ARCH=arm

We can now build the u-boot image with the following:

make distclean
make zynq_zybo_defconfig
make

After the build process, there should be an executable called u-boot.elf.

Testing U-Boot

Let us test the U-Boot executable.

As in the previous post, with the Zybo board connected to the PC and powered up, start a screen session in a terminal window:

sudo screen /dev/ttyUSB1 115200

In another terminal window, start an XSDB session. Also, as in the previous post, issue the XSDB commands for starting the FSBL and then stopping it. At this point the DDR should have been initialised and we should be able to load the U-Boot image into it:

dow ~/u-boot-xlnx/u-boot.elf
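For reference, here is a sketch of the complete XSDB sequence, using the paths from my machine (yours will differ):

connect
targets
target 2
stop
dow ~/fsbl-test/fsbl-test.sdk/fsbl_test/Release/fsbl_test.elf
con
# give the FSBL a moment to initialise the DDR, then:
stop
dow ~/u-boot-xlnx/u-boot.elf
con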

If you try these steps after your PC/laptop has come out of hibernation, you might be presented with the following error:

Memory write error at 0x4000000. Cannot access DDR: the controller is held in reset

If this is the case, just switch the Zybo board off for a couple of seconds and then on again. After switching it back on, re-establish the screen session, follow the steps of downloading/starting/stopping the FSBL, and then download the U-Boot image again.

Issuing the command con will start the U-Boot image. If everything went well, the output in your screen session should look like the following:


If you missed the opportunity to stop the autoboot, you will see some output indicating that U-Boot is trying to boot images from various devices.

In Summary

In this post we managed to build and run U-Boot. In the next post we will be building Linux and running it on the Zybo board.

Till next time!

Tuesday 24 December 2019

The boot process of the ZYNQ

Foreword

In the previous post we scaled up the video frames produced by the VIC-II module so that they fill the whole screen.

At this point our C64 module has enough functionality implemented that we can play a game on it. However, every time we want to play a game, the Zybo board needs to be attached to a PC, and we need to issue a couple of commands within the Xilinx SDK.

Wouldn't it be nice if the Zybo could just start up on its own and you don't need to connect it to a PC?

This will be the purpose of the next series of posts: to be able to boot our C64 system on the Zybo board without any hand holding.

To approach this goal, we will work towards being able to run Linux on the Zybo board. Within the Linux ecosystem on the Zybo board, we have access to a multitude of device drivers, making it easy for us, for instance, to load a tape image from a Micro SDCard.

In this post we will start by getting an overview on how the boot process works on the Zybo Board.

Also, in this post I will assume the Zybo board is connected to a PC running Ubuntu or a similar OS.

The Boot Process

When the Zynq is powered up and ready to start the process of booting into an OS, it is faced with a challenging initial condition: The DRAM is disabled.

To get past this hurdle, the Zynq contains on-chip memory (OCM): 128KB of BootROM code and 256KB of SRAM.

The BootROM contains just enough code to load a boot image from a handful of devices, like an SDCard or QSPI flash, into SRAM and start executing it.

256KB of SRAM for loading a boot image is not a lot of memory, so on Zynq devices, the boot process is split into a number of stages:

  • Stage 0: The BootROM starts executing and loads the first stage bootloader into SRAM.
  • Stage 1: The first stage bootloader executes from SRAM. It is responsible for enabling DRAM and, amongst other things, initialises a number of crucial on-chip peripherals as well as a number of clocks on the Zynq. The bitstream and the user application are also read into memory.
  • Stage 2: At this stage the DRAM is fully operational and control is handed to the user application. In a Linux ecosystem this means running U-Boot, which will eventually load and start a Linux image.
This is quite a cumbersome process, but an understandable one. Initialising DRAM is quite complicated, and one can easily introduce a bug when writing the software routine for this process. Should such a bug exist in the BootROM, it could render the Zynq chip useless. Moving the DRAM initialisation to software that is retrieved from an SDCard greatly reduces this risk.

Of course, SRAM is quite an expensive resource. For this reason the Zynq only contains 256KB of it, and we need three stages of booting instead of just two.

In this post we will focus on the first stage bootloader (FSBL), and in future posts on the stage 2 bootloader.

Creating a First Stage Bootloader

Let us start by having an overview of the process for creating an FSBL. There is a very nice block diagram of the process here:



The Xilinx SDK gets a hardware handoff file from Vivado and generates the FSBL. In the diagram the FSBL gets aggregated with other files, like U-Boot and uImage, to produce a file called BOOT.BIN.

In this post, however, I will show a method where we will directly invoke the FSBL without having to generate a BOOT.BIN first.

Now, in Vivado let us create a Block Design containing only a Zynq block. Also, let us perform all the suggested wiring:


This block diagram looks very simple. However, when you open this block, you will see that it contains very important settings, like the DDR settings:

It is these types of settings that Vivado hands off to the Xilinx SDK and that will be incorporated into the FSBL.

With this block design created, run Synthesis, generate a bitstream and export the hardware.

Then, from Vivado launch the Xilinx SDK.

In the File menu, select New Application Project and provide a name for the application project.

Click next and in the left panel select Zynq FSBL:


When you click Finish, a new project will be created and become visible in the project panel.

A quick way to test our FSBL would be to have it write a message to the console when it runs.

To do this, expand the src folder and open the file main.c. Scroll down to the main method. Just after the call to RegisterHandlers, add the following line:

 xil_printf("Hello World");
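In context, the top of main() then looks roughly like this (the surrounding lines are paraphrased from the stock FSBL template and vary between SDK versions):

int main(void)
{
	/* ... FSBL initialisation code ... */

	/* Register the exception handlers (existing FSBL code) */
	RegisterHandlers();

	/* Our test output: should appear on the serial console when the FSBL runs */
	xil_printf("Hello World");

	/* ... rest of the FSBL boot flow ... */
}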

Click the down arrow next to the hammer icon and select the Release configuration. Now build the project.

Now, within the sdk folder inside your project folder, under the folder Release, there should be a *.elf file. This is the first stage bootloader that we will use in the next section.

Testing the First Stage Bootloader

Let us test the FSBL we have created in the previous section.

Ensure the Zybo board is configured to boot into JTAG mode. Hook it up to a PC and power it up.

For this exercise we will need two terminal windows running on the Ubuntu PC. On the first terminal, issue the following command:

sudo screen /dev/ttyUSB1 115200

This is the terminal session where we expect our Hello World message to appear when our FSBL runs. The device file might be different in your case, or might even be /dev/ttyUSB2.

In the other terminal change to the bin folder of your Xilinx SDK installation and issue the following command:

./xsdb

You will be presented with the XSDB console. At this console, issue the following command:

connect

If you now issue the targets command, you will see a list of targets you can connect to:

xsdb% targets                                                                   
  1  APU
     2  ARM Cortex-A9 MPCore #0 (Running)
     3  ARM Cortex-A9 MPCore #1 (Running)
  4  xc7z010


Our goal is to run our FSBL on the first core, so let us select this core and stop it:

xsdb% target 2                                                                  
xsdb% stop                                                                      
Info: ARM Cortex-A9 MPCore #0 (target 2) Stopped at 0xffffff28 (Suspended)
xsdb% targets                                                                   
  1  APU
     2* ARM Cortex-A9 MPCore #0 (Suspended)
     3  ARM Cortex-A9 MPCore #1 (Running)
  4  xc7z010


We will now load the FSBL into the Zybo board from the location where it was generated. In my case the console output will look as follows:

xsdb% dow ~/fsbl-test/fsbl-test.sdk/fsbl_test/Release/fsbl_test.elf             
Downloading Program -- /home/johan/fsbl-test/fsbl-test.sdk/fsbl_test/Release/fsbl_test.elf
 section, .text: 0x00000000 - 0x0000d38b
 section, .handoff: 0x0000d38c - 0x0000d3d7
 section, .init: 0x0000d3d8 - 0x0000d3ef
 section, .fini: 0x0000d3f0 - 0x0000d407
 section, .rodata: 0x0000d408 - 0x0000d75f
 section, .data: 0x0000d760 - 0x0001054f
 section, .eh_frame: 0x00010550 - 0x00010553
 section, .mmu_tbl: 0x00014000 - 0x00017fff
 section, .init_array: 0x00018000 - 0x00018003
 section, .fini_array: 0x00018004 - 0x00018007
 section, .rsa_ac: 0x00018008 - 0x0001903f
 section, .bss: 0x00019040 - 0x0001ae6f
 section, .heap: 0x0001ae70 - 0x0001ce6f
 section, .stack: 0xffff0000 - 0xffffd3ff
100%    0MB   0.5MB/s  00:00                                                    
Setting PC to Program Start Address 0x00000000
Successfully downloaded /home/johan/fsbl-test/fsbl-test.sdk/fsbl_test/Release/fsbl_test.elf


The FSBL is loaded starting at address 0, which is the area of the OCM where the SRAM resides.

We can now resume execution of core #0 with the command con, and see the output in the other terminal window:


Our first stage bootloader worked!

In Summary

In this post we have created and tested a First Stage Bootloader.

In the next post we will continue our journey in getting Linux to run on the Zybo Board.

In particular, we will be getting U-Boot to run on the Zybo board. U-Boot is a component you will find in many ARM-based systems, assisting in booting Linux.

Till next time!

Friday 20 December 2019

Scaling up the display: Part 2

Foreword

In the previous post we started to investigate the possibility of scaling up the images produced by the VIC-II module so that they fill the whole display.

For this purpose we used David Kronstein's video scaler core, which we tested with a test bench to see what the images produced by it look like.

I was quite satisfied with the results produced by Kronstein's core, so in this post we will integrate it into our C64 FPGA design.

Overview

The following diagram gives an overview of what we want to accomplish in this post:


The flow of the diagram starts off more or less the same as our current design which displays the VIC-II frames on a VGA screen.

We retrieve pixel data from SDRAM via AXI and buffer it. As this data consists of 32-bit words, each containing two pixels, we need to split each word into individual pixels.
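As a rough illustration (the signal names here are made up for the sketch, not taken from the actual design), the split amounts to selecting the two halves of each word, with the lower halfword holding the earlier pixel in our little endian layout:

wire [31:0] axi_word;                   // one 32-bit word read from SDRAM via AXI
wire [15:0] pixel_first, pixel_second;  // the two RGB565 pixels packed into it

assign pixel_first  = axi_word[15:0];   // earlier pixel in the lower halfword
assign pixel_second = axi_word[31:16];  // later pixel in the upper halfword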

We also buffer these individual pixels in a FIFO buffer. This FIFO has the additional function of moving data from the AXI clock domain (100MHz) to the VGA clock domain (84MHz).

In our previous design we output pixels from this FIFO directly to the VGA display.

In this post, however, we introduce two new blocks: the Video Scaler, and a FIFO for buffering against potential lag from the Video Scaler.

The trickiest scenario in this design is when we reset all the components in preparation for the next frame. With the video scaler having been reset for the next frame, it will immediately start requesting data from the asynchronous FIFO as soon as data becomes available. This can potentially lead to intermittent empty conditions in our asynchronous FIFO.

In practice, however, I found that our asynchronous FIFO doesn't handle these intermittent empty states very well.

It is far better, on a frame reset, to give the asynchronous FIFO a chance to fill up a bit before starting to read from it. In this way we avoid the asynchronous buffer running empty. We will cover this in a bit more detail in a coming section.

Supplying input data to the Video Scaler

Let us connect the necessary ports so that we can supply input data to the Video Scaler.

First, let us cater for the scenario where we need to reset all the blocks upon a new frame.

The trigger_restart_state register indicates when we are about to start a new frame. However, this register is clocked within the AXI clock domain, and we need it within the VGA domain, so let us create a flip-flop synchroniser chain to take care of this:

(* ASYNC_REG = "TRUE" *) reg state_1, state_2, state_3, state_4, state_5;

always @(posedge clk)
begin
  state_1 <= trigger_restart_state == RESTART_STATE_RESTART;
  state_2 <= state_1;
  state_3 <= state_2;
  state_4 <= state_3;
  state_5 <= state_4;
end

streamScaler #(
//---------------------------Parameters----------------------------------------
.DATA_WIDTH(8),  //Width of input/output data
.CHANNELS(3)  //Number of channels of DATA_WIDTH, for color images
//---------------------Non-user-definable parameters----------------------------
)
  myscaler
(
...
.start(state_5),
...
);


We receive pixel data from the asynchronous FIFO relaying 16-bit pixel values from the AXI domain to the VGA domain. As mentioned in the previous post, the Video scaler expects 24-bit samples, so let us do a conversion:

streamScaler #(
//---------------------------Parameters----------------------------------------
.DATA_WIDTH(8),  //Width of input/output data
.CHANNELS(3)  //Number of channels of DATA_WIDTH, for color images
//---------------------Non-user-definable parameters----------------------------
)
  myscaler
(
...
.dIn({out_pixel_buffer[15:11],3'b0,out_pixel_buffer[10:5],2'b0,out_pixel_buffer[4:0],3'b0}),
...
);


The next port to focus on is the port on the video scaler signalling that the input data is valid. For this, let's start off simple, saying that the data is valid if the asynchronous buffer is not empty and it is not the start of the frame:

assign data_valid_in = !state_5 && !async_empty;

You might remember that in the previous section I mentioned that it is preferable to give the asynchronous buffer some time to fill up before reading from it. It is indeed via the data_valid_in port that we cater for this:

// declarations for the prefill bookkeeping
reg scalar_init;
reg [5:0] count_till_read;

assign data_valid_in = !state_5 && !async_empty && scalar_init;

always @(posedge clk)
begin
  if (state_5)
    scalar_init <= 0;
  else if (!async_empty && (count_till_read == 60))
    scalar_init <= 1;

  if (state_5)
    count_till_read <= 0;
  else if ((count_till_read < 60) && !async_empty)
    count_till_read <= count_till_read + 1;
end

So, we hold back asserting the data_valid_in port until our async buffer has been non-empty for about 60 clock cycles.

Next, we need to connect the read port on the async fifo:

aFifo
  #(.DATA_WIDTH(16))
  my_fifo
     
    (
...
           .ReadEn_in(nextDIn & data_valid_in),
...
     );


You might recall from a number of posts back that we enabled reading from this FIFO when the VGA raster was within the visible range. This port is now controlled by the video scaler (nextDIn). We hold reads back by means of data_valid_in, giving the aFifo a chance to fill up.

Buffering the output of the Video Scaler

As mentioned in the Overview section, we need to buffer the output of the Video scaler.

So, let us start by defining another FIFO instance:

fifo #(
  .DATA_WIDTH(16),
  .ADDRESS_WIDTH(4)
)

   data_buf_vga (
            .clk(clk), 
            .reset(state_5),
...
        );


This buffer has a capacity of 16 elements of 16 bits each. Since the Video Scaler outputs samples of 24 bits, we need to connect the write_data port of the FIFO as follows:

fifo #(
  .DATA_WIDTH(16),
  .ADDRESS_WIDTH(4)
)

   data_buf_vga (
...
            .write_data({data_out[23:19],data_out[15:10],data_out[7:3]}),
...
        );


Now, the nextDout port of the Video Scaler needs to be in sync with the write port of the FIFO:

...
fifo #(
  .DATA_WIDTH(16),
  .ADDRESS_WIDTH(4)
)

   data_buf_vga (
...
            .write((vert_pos > 10)  & (vert_pos < 760) & data_valid_out & !full_vga_fifo),
            .full(full_vga_fifo),
...
        );
...
streamScaler #(
//---------------------------Parameters----------------------------------------
.DATA_WIDTH(8),  //Width of input/output data
.CHANNELS(3)  //Number of channels of DATA_WIDTH, for color images
//---------------------Non-user-definable parameters----------------------------
)
  myscaler
(
...
.dOutValid(data_valid_out),
.nextDout((vert_pos > 10)  & (vert_pos < 760) & !full_vga_fifo),
...
);
...


The idea is to start streaming data out to the screen at line 20, so we start pre-filling the buffer at line 10.

Streaming the data out to the screen

In the previous section we buffered data from the Video Scaler. In this section we will output the buffered data to the VGA port.

As the first step, let us connect all the read ports:

fifo #(
  .DATA_WIDTH(16),
  .ADDRESS_WIDTH(4)
)

   data_buf_vga (
...
            .read((vert_pos > 20)  & (vert_pos < 760) &
                                            (horiz_pos > 100) & (horiz_pos < 1175)),
            .read_data(fifo_data_read)
...
        );


As seen here, the visible portion of the screen is between lines 20 and 760; on each line, the visible portion is between pixels 100 and 1175.

The invisible portions of the screen we want to fill with a black border. To do this we need to blank out the read_data when we are within the invisible regions:

 assign out_pixel_buffer_final = (vert_pos > 20)  & (vert_pos < 760) &
                                (horiz_pos > 100) & (horiz_pos < 1175)
                                ? fifo_data_read : 0;


This out_pixel_buffer_final signal we then need to split into the individual red, green and blue signals that go to the VGA port:

assign red = out_pixel_buffer_final[15:11];
assign green = out_pixel_buffer_final[10:5];
assign blue = out_pixel_buffer_final[4:0];


Results

I created the following video to demonstrate how the C64 module renders on the VGA screen with the help of video upscaling:


For this demo I loaded the game Blue Max from a tape image. The video starts off with the last couple of seconds of the loader music, followed by the intro tune of Blue Max. I then play the game for a couple of seconds.

In Summary

In this post we integrated David Kronstein's video scaler core within our C64 module.

Up to this point we have always fired up the Zybo board attached to a PC. It would be nice if we could fire up the Zybo board on its own, with an external power supply.

So in the next post we will start investigating how to boot the Zybo from an SDCard. To kick off this investigation, we will see if we can boot Linux on the Zybo board.

Till next time!




Sunday 15 December 2019

Scaling up the display: Part 1

Foreword

In the previous two posts we implemented SID sound within our C64 FPGA module.

If we look back at the Introduction post of this blog series, the purpose of this series was to create a complete C64 system on an FPGA.

I think we got pretty close to this goal. We have implemented the following:


  • Integrated Arlet Otten's 6502 core into our design.
  • Implemented C64 memory banking.
  • Booting the whole C64 system.
  • Loading a game from a .TAP image.
  • Implementing a VIC-II module capable of displaying sprites together with a couple of its graphics modes, like multicolor bitmap and multicolor text mode.
  • SID sound.
Granted, an important item missing from the list is implementing a C64 disk drive like the 1541. I am, however, not entirely sure I want to go down that road, since we have already utilised the majority of the Block RAM resources of the ZYNQ FPGA, so I doubt there would be sufficient resources left for implementing a 1541 module (the core of a 1541 disk drive is also a 6502 CPU, also requiring RAM and ROM to operate).

There are, however, some other items currently missing from our C64 module which I thought would be nice to implement, and I will be writing some blog posts on how to do so.

The first item is scaling up the frames produced by the VIC-II module. Currently these frames have a resolution of 404x284; on most monitors available on the market today, they will fill just a tiny portion of the screen.

So, we will dedicate a post or two to scaling the VIC-II generated frames up, so that they fill most of the screen.

Another issue worth looking into is that currently our C64 module cannot operate on its own on a Zybo board. The Zybo board always needs to be connected to a PC to upload a bitstream image and to kick off a standalone program from the Xilinx SDK that provides USB keyboard functionality.

I will also write some blog posts on implementing a solution for the above-mentioned issue, which will involve booting Linux from an SDCard fitted to the Zybo board and also loading a bitstream image from the same SDCard into the FPGA.

This is more or less what I have planned for future posts in this Blog series.

Let us start and see if we can upscale the frames produced by the VIC-II!

David Kronstein's Video Scaler Core

As the old saying goes: Don't re-invent the wheel. In this series I tried to apply this bit of advice numerous times:
  • Using Arlet's 6502 core.
  • Making use of an asynchronous FIFO buffer as suggested on a Xilinx community forum.
  • Using Thomas Kindler's SID implementation.
So, is there a Verilog module available that can scale up an image? Indeed there is, on the OpenCores website: https://opencores.org/projects/video_stream_scaler

The SVN browser on the website allows us to get hold of the source code. Two files are of importance:

  • Video+Stream+Scaler+Specifications.pdf
  • scalar.v
The PDF explains very nicely how the scaler works.

The file scalar.v contains the main module, as well as all its submodules, within one file.

Let us start by having a look at the ports of the Video Scaler module:

//---------------------------Module IO-----------------------------------------
//Clock and reset
input wire    clk,
input wire    rst,
 
//User interface
//Input
input wire [DATA_WIDTH*CHANNELS-1:0]dIn,
input wire       dInValid,
output wire       nextDin,
input wire       start,
 
//Output
output reg [DATA_WIDTH*CHANNELS-1:0]  dOut,
output reg         dOutValid,
input wire         nextDout,
 
//Control
input wire [DISCARD_CNT_WIDTH-1:0] inputDiscardCnt, 
input wire [INPUT_X_RES_WIDTH-1:0] inputXRes,
input wire [INPUT_Y_RES_WIDTH-1:0] inputYRes,
input wire [OUTPUT_X_RES_WIDTH-1:0] outputXRes,
input wire [OUTPUT_Y_RES_WIDTH-1:0] outputYRes,
input wire [SCALE_BITS-1:0]   xScale,
input wire [SCALE_BITS-1:0]   yScale,
 
input wire [OUTPUT_X_RES_WIDTH-1+SCALE_FRAC_BITS:0] leftOffset,
input wire [SCALE_FRAC_BITS-1:0] topFracOffset, 
input wire    nearestNeighbor
);


The clk and rst ports are obvious, so let us skip to the input port section.

dIn is the pixel data input. For our emulator we will have three channels (e.g. RGB) and each channel will be 8 bits wide. This may sound confusing at first, since the Zybo board works with 16-bit pixels in the RGB565 format. However, this scaler assumes the same data width for all channels, so for this reason we will just stick with 8 bits per channel.

The next two signals are handshake signals between the pixel data originator and the video scaler. When the pixel data originator has made data available for a new pixel, it asserts the dInValid signal. In return, the video scaler asserts the nextDin signal when it has accepted the data.
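To make this handshake concrete, here is a minimal sketch of the producer side (dInValid and nextDin are the scaler's ports; the FIFO signals are hypothetical stand-ins for whatever feeds the scaler):

// Offer data whenever we have some. A pixel is consumed on any clock
// cycle where both dInValid and nextDin are high, which is exactly
// when the producer should also pop its own buffer.
assign dInValid     = !fifo_empty;
assign fifo_read_en = dInValid && nextDin;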

Something to keep in mind with our VIC-II module is that it outputs pixels at a constant rate and cannot be told to pause for a couple of clock cycles. The video scaler, in turn, can end up time and again in a situation where it is not able to accept data on a particular clock pulse. We will, however, cross this bridge when we get there.

The last port of the input section, start, is used to signal the video scaler that we are about to transmit data for a new frame.

Next we get to the output port section. With this set of ports our video scaler behaves like a pixel data producer itself, the data being the pixels of the actual upscaled image. Similarly, dOutValid and nextDout are handshake signals.

Let us move on to the final section, the control section. There are a couple of ports in this section that I am not going to worry about; I am just going to connect them to the value zero. These ports are the following:

  • inputDiscardCnt, 
  • leftOffset,
  • topFracOffset and
  • nearestNeighbor
The other ports are for specifying the input resolution, the output resolution, and the ratio by which the input frames should be resized.

We will calculate the values of these ports in the next section.

Calculating the values of the control ports

Let us start by determining the values for inputXRes and inputYRes.

The spec for the video scaler states that for each resolution port we should supply the actual value minus one.

We know that the frames produced by our VIC-II have a resolution of 404x284, so we should specify a value of 403 for inputXRes and a value of 283 for inputYRes.

Next, we should decide on the output resolution. For this, one would be tempted to use the physical resolution of the monitor that will display the output frames.

However, these days chances are good that the monitor you will be using is a widescreen, whereas the output of a VIC-II is closer to a square aspect ratio. So, one would end up with a stretched image when using the physical screen resolution as the output resolution for the video scaler.

So, we need to proportionally scale up the input image till it just fills the height of the screen.

I am going to use my screen as an example, which has a resolution of 1366x768. I am a bit hesitant to use the full height of the screen, since I want to leave a bit of 'buffering' space to account for possible lag by the video scaler before it produces output for the next frame.

So, I will be using a height of 758 for our output frame. At this point we need to calculate the ratio by which we will be resizing our input image.

(Output Resolution Y) / (Input Resolution Y)
= 758 / 284
= 2.6690

This is a very important factor, and we will be using it again later for the ports xScale and yScale.

To get the horizontal output resolution, we should multiply the horizontal input resolution by this factor:

(Input Resolution X) * 2.6690
= 404 * 2.6690
= 1078.276 ≈ 1078

Thus, our output image should have resolution of 1078x758, resolving into a value of 1077 for outputXRes and a value of 757 for outputYRes using the minus one constraint specified in the spec of the video scaler.

Next, we should calculate the values for xScale and yScale, which in our case will be the same.

According to the spec, xScale is calculated as (input size / output size). This is the inverse of the way we calculated our scale factor, which was (output size / input size). So, to get a valid value for xScale and yScale, we should use the reciprocal of our factor, which yields around 0.374672.

We now need to represent this fraction in binary. This sounds like a daunting task, but fear not!

The scale value uses 4 bits for the integer part and 14 bits for the fraction, totalling 18 bits. The bit weights can be visually represented as follows:

8 4 2 1 . 0.5 0.25 0.125 0.0625

For clarity, I have shown only the first couple of fraction bits.

Through some trial and error, I found that 0.25 + 0.125 gives a good enough approximation for us: 0.375. Let us convert this fraction into hexadecimal:

0000.01100000000000
= 000001100000000000
= 00 0001 1000 0000 0000
= 18'h01800

As a sanity check: 0.375 × 2^14 = 6144 = 0x1800.

So, the value to use for both xScale and yScale is 18'h1800.
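Pulling the numbers from this section together, the control ports could then be wired up along these lines (a sketch with the computed constants; the rest of the instantiation is elided):

streamScaler #(
.DATA_WIDTH(8),
.CHANNELS(3)
)
  myscaler
(
...
.inputXRes(403),     // 404 - 1
.inputYRes(283),     // 284 - 1
.outputXRes(1077),   // 1078 - 1
.outputYRes(757),    // 758 - 1
.xScale(18'h1800),   // ~0.3747 in 4.14 fixed point
.yScale(18'h1800),
.inputDiscardCnt(0), // the ports we are not worried about, tied to zero
.leftOffset(0),
.topFracOffset(0),
.nearestNeighbor(0),
...
);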

Creating a Test Bench

Let us create a test bench to test the Video Scaler.

The first thing we should do is get hold of source image data. An easy way to get this is to run our C64 FPGA design on the Zybo board and then do an mrd (memory read) on the XSCT console over the memory area where the frame is stored, writing the result to a file.
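For illustration, the dump could look something like this at the XSCT console (the address here is a placeholder for wherever your design keeps the frame; 404x284 pixels at two pixels per 32-bit word comes to 57368 words):

mrd -bin -file frame.data 0x00900000 57368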

However, be mindful of the fact that the information is stored in memory as 32-bit words in little endian format. With the pixels being in the RGB565 format, this means that every pair of pixels will have their order reversed.

So, it might be necessary to write a program for reversing the order of the pixels for this exercise.

Next, let us write some code to read data from this file and supply it to the video scaler on demand:

integer f;                    // file handle for the raw input image
reg [15:0] pixel_in_data;

initial begin
  f = $fopen("<file.data>","rb");
  #300
  @(negedge clk)
  start = 1;
  @(negedge clk)
  start = 0;
  #300;
  
  while (data_count < 500000)
  begin
    @(negedge clk)
    if(nextDIn)
    begin
      data_valid_in = 1;
      $fread(pixel_in_data,f);    
    end 
  end
end


We start by pulsing the start flag to inform the video scaler that we are at the beginning of the frame and about to send data.

We keep reading pixel data while the video scaler asserts the nextDIn port. With the first pixel that we read, we also assert the data_valid_in port.

Data in the source file is in the RGB565 format, and the video scaler expects 24-bit colour, so we convert it like this:

streamScaler #(
.DATA_WIDTH(8),
.CHANNELS(3)
)
  myscaler
(
...
.dIn({pixel_in_data[15:11],3'b0,pixel_in_data[10:5],2'b0,pixel_in_data[4:0],3'b0}),
...
);


Next, we should capture all the pixels generated by the video scaler and save them to an image file, so we can view the result in an image viewer.

I would like the resulting image file to again be in RGB565 format, so I will do a conversion again:

...
wire [15:0] data_out_concat;
...
assign data_out_concat = {data_out[23:19], data_out[15:10], data_out[7:3]}; // top 5/6/5 bits of each channel
...

We write the pixel data as follows:

integer fw;                   // file handle for the raw output image

initial begin
fw = $fopen("<outputfile.data>","wb");
while (data_count < 5000000)
begin
  @(posedge clk)
  if (data_valid_out)
  begin
    $fwrite(fw,"%c",data_out_concat[7:0]);
    $fwrite(fw,"%c",data_out_concat[15:8]);
  end
end
end


So, while data_valid_out is asserted we write the pixel data to the resulting image file, writing each pixel in little endian byte order, which is the format required by the image viewer.

The Result

Let us have a look at the resulting image file.

We are going to use GIMP to view the image file. GIMP can read raw image data, provided you use the file extension .data.

On opening this file, we specify the RGB565 format and the resolution 1078x758:


The resulting rescaled image is as follows:


Admittedly, one cannot clearly see the difference here between the scaled-up output of the video scaler and the original frame.

We will, however, better see the effect of the upscaling when displayed on a monitor, which we will cover in the next post.

In Summary

In this post we started to investigate upscaling for use in our C64 emulator, so that we can fill the whole screen instead of just a tiny portion of it.

We identified David Kronstein's Video scaler core as the candidate for use in this task.

We created a test bench for testing David Kronstein's Video Scaler and successfully managed to upscale a test image.

In the next post we will be integrating this core into our C64 emulator.

Till next time!