Sunday, 25 April 2021

Running an Amiga core on a Zybo Board: Part 4

Foreword

In the previous post we managed to get the Source code of the Mini Amiga project to compile in Vivado and ran the resulting bitstream on the Basys 3 board.

With the core on the Basys 3, we did a very basic test, confirming whether the address requests to external SDRAM was targeted for the Kickstart ROM area.

Our next goal is to get the Kickstart ROM to run.

You might have noticed towards the end of the previous post that I am alternating between the term Kickstart ROM and AROS ROM. Many readers are more familiar with the term Kickstart ROM, whereas we are going to run the AROS ROM image in our design. So, just keep that in mind when I interchange between the two terms.

The Kickstart ROM is 512KB in size, more than the amount of Block RAM the Basys 3 can offer. We will therefore move in this post to the Zybo board, which supports external SDRAM.

Our primary focus in this post will be to develop an interface where, given an address, the interface will one word of data from that location in SDRAM.

Overview

Quite a while ago, while I was still developing a C64 block for an FPGA, I have also created a wrapper for accessing SDRAM on the Zybo board, here.

The purpose was to save the frames produced by the the VIC-II module and render it to a VGA screen at a different frame rate.

In the design I used the IP Burst block, provided by Xilinx, to shield me from the details of the AXI protocol.

The design I ended up with, was a streaming interface, that accessed data in a serial fashion. This is a suitable interface for rendering frames to a VGA screen, where the data required is also of a serial nature.

With the streaming interface we can easily predict which data will be required in the future and therefore prefetch the data, therefore mitigate the effect of latency.

In our Amiga design, however, the 68000 will be accessing the SDRAM in a much more random fashion, so a streaming interface will not give us any real benefit. So, in this post we will be designing a very simple SDRAM interface, where we provide an address, and the interface will return the relevant word of data from SDRAM.

Obviously, with this interface SDRAM latency will work against us, but for coming posts we will first try to get the system to work and attempt to fix the latency issues later on.

Creating an AXI Block

In a previous post, here, I explained how to create an AXI block in a Zybo block design.

In that post, I have also explained how to utilise IP Burst Block in your created AXI block. As mentioned earlier, the IP Burst block shields you from the technicalities of the AXI protocol.

Well, after a couple of years down the line, I tend to disagree with the statement in the previous paragraph. Working with AXI protocols is not that bad and using the IP Burst block is a bit of an overkill.

The thing is, when creating an AXI master block, the wizard do create a template for you that is basically an example of how to use the AXI protocol. A couple of years ago, however, this template just looked like a very complex state machine, and so I decided to use the IP Burst block as an alternative. 

Having a look at the state machine that the AXI Block Wizard create, one can see that it is basically a memory tester. It starts off writing a certain test pattern of data to memory and afterwards read it back from memory, checking to see if it matches the test pattern.

It is easy enough to alter the path of this state machine to read or write on demand. We will tackle this in the next section.

Modifying the AXI block

Let us modify the AXI block to fit our needs.

First we need to look at the state machine that is implemented in case-statements. The state machine transitions as follows: IDLE > WRITE > READ > COMPARE > IDLE.

In order to implement our read on demand, we need to transition back to IDLE after a read or a write. Also in the IDLE state we need to directly transition directly to a READ or a WRITE given a command. We do this as follows:

           if ( init_txn_pulse == 1'b1)                                                      
             begin                                                                                        
               mst_exec_state  <= user_write ? INIT_WRITE : INIT_READ;                                                              
               ERROR <= 1'b0;
               compare_done <= 1'b0;
             end                                                                                          
           else                                                                                            
             begin                                                                                        
               mst_exec_state  <= IDLE;                                                            
             end   
Here user_write is an input port we define on the module. If it is a 1, it is a write and a read otherwise.

Next let us add some ports to the AXI block for supplying commands:

...
// Users to add ports here
        input [31:0] user_address,
        input [31:0] user_data,
        input user_write,
// User ports ends
...
So, we have a port to specify an aaddress for a read/write command. Also, for a write, we have a port to specify the data.

Let us have a look at how these ports are going to be used:

// Next address after AWREADY indicates previous address acceptance    
 always @(posedge M_AXI_ACLK)                                        
 begin                                                                
   if (M_AXI_ARESETN == 0 || init_txn_pulse == 1'b1)                                            
     begin                                                            
       axi_awaddr <= user_address;                                            
     end                                                              
   else if (M_AXI_AWREADY && axi_awvalid)                            
     begin                                                            
       axi_awaddr <= axi_awaddr/* + burst_size_bytes*/;                  
     end                                                              
   else                                                              
     axi_awaddr <= axi_awaddr;                                        
   end          
...
/* Write Data Generator                                                            
Data pattern is only a simple incrementing count from 0 for each burst  */        
 always @(posedge M_AXI_ACLK)                                                      
 begin                                                                            
   if (M_AXI_ARESETN == 0 || init_txn_pulse == 1'b1)                                                        
     axi_wdata <= user_data;                                                            
   //else if (wnext && axi_wlast)                                                  
   //  axi_wdata <= 'b0;                                                          
   else if (wnext)                                                                
     axi_wdata <= axi_wdata/* + 1*/;                                                  
   else                                                                            
     axi_wdata <= axi_wdata;                                                      
   end              
...
// Next address after ARREADY indicates previous address acceptance  
 always @(posedge M_AXI_ACLK)                                      
 begin                                                              
   if (M_AXI_ARESETN == 0 || init_txn_pulse == 1'b1)                                          
     begin                                                          
       axi_araddr <= user_address;                                          
     end                                                            
   else if (M_AXI_ARREADY && axi_arvalid)                          
     begin                                                          
       axi_araddr <= axi_araddr/* + burst_size_bytes*/;                
     end                                                            
   else                                                            
     axi_araddr <= axi_araddr;                                      
 end
...      
In this snippet of code, we have kept the original template code more or less intact. It is just the pieces in bold that we have changed.

You will notice that in a couple of place we use init_txn_pulse. This is used to trigger a read or a write transaction.

One thing we still need to do is to indicate to the outside world when a read/write transaction has completed and also send the data when the transaction was a read. To do this, we just need to watch the signals wnext and rnext:

...
        output reg [31:0] user_data_out,
        output reg user_data_ready,
...
    always @(posedge M_AXI_ACLK)
    begin
      if (wnext || rnext)
      begin
        user_data_ready <= 1;
      end else if(!INIT_AXI_TXN)        
      begin
        user_data_ready <= 0;
      end      
    end
...
    always @(posedge M_AXI_ACLK)
    begin
      if (rnext)
      begin
        user_data_out <= M_AXI_RDATA;
      end
    end
...
Keep in mind that M_AXI_ACLK is clocking at 100MHZ, so for one clock cycle rnext/wnext might be high, and low at the next one. The Mini Amiga block is clocking considerably slower than 100MHz, and will miss these notifications if  we rely directly on rnext/wnext. It is for that reason why we are storing the value for user_data_ready and user_data_out.

Testing the AXI block

Let us write a module for testing the AXI block we have created in this post.

With the test we will basically write some test data to SDRAM and then read it back.

We will implement this with a very simple statement machine, which will be a counter of which its bits will have different purposes.

This counter will be 11 bits wide, if which we will use the bits as follows:
  • bit 10: Read/Write
  • bit 9/8: Index for generating read data/address
  • bit 7-0: Lower part of counter
As mentioned above, we will be using bits 9 and 8 as a index to generate some random addresses for reading and writing, which we will do as follows:
 
always @(*)
begin
  if (counter[9:8] == 0)
  begin
    data_out = 20;
  end else if (counter[9:8] == 1) 
  begin
    data_out = 120;
  end else if (counter[9:8] == 2) 
  begin
    data_out = 30;
  end else if (counter[9:8] == 3) 
  begin
    data_out = 10;
  end else 
  begin
    data_out = 111;
  end
end

always @(*)
begin
  if (counter[9:8] == 0)
  begin
    address = 20;
  end else if (counter[9:8] == 1) 
  begin
    address = 120;
  end else if (counter[9:8] == 2) 
  begin
    address = 30;
  end else if (counter[9:8] == 3) 
  begin
    address = 10;
  end else 
  begin
    address = 111;
  end
end

We use the lower part of the counter, bits 7 - 0, to decide when to trigger the init_txn pulse. We use such a big range to ensure that we leave enough gap for latency:

always @(posedge clk)
begin
  if (counter[7:0] == 20)
  begin
    init_txn <= 1;
  end else if (counter[7:0] == 118) 
  begin
    init_txn <= 0;
  end
end

This is enough coding for a test. Let us do some testing.

The following logic trace shows some results:


The first two lines contains the requested address. At the bottom the read/write requests are shown, of which the section shown is mostly reads.

The line, user_data_out, shows the data that is read back from memory. This is not very clear in the picture, but the values are 0x14, 0x78, 0x1e. These values converted to decimal are 20, 120 and 30.

This corresponds to the values we used in our test module.

In Summary

In this post we have created an AXI block that will enable us to read and write to SDRAM on the Zybo board.

In the next post we will integrating this AXI block with the Mini Amiga block, and try to boot the AROS ROM.

Till next time!

Wednesday, 14 April 2021

Running an Amiga core on a Zybo Board: Part 3

Foreword

In the previous post we managed to get the fx68k core to run on a Basys 3 board, and execute a very simple machine code program.

In this post we will continue our journey in getting an Amiga core to run on a Zybo board. Having said that, in this post we will again do the exercise on the Basys 3 board. We will be finally moving to the Zybo board in the next post.

As mentioned in a previous post, we will be scrutinising the Mini Amiga project, here, that will form the basis of our Amiga exercise.

The Mini Amiga project is based on a Altera FPGA, and for that reason we need to scrutinise the code of this project. However, this makes the journey so much more exciting.

A bit of background on the Mini Amiga project

If one reads this article on Wikipedia, one can see that the original Mini Amiga project was based on a physical Motorola 68000 CPU, and the Amiga chipset was implemented within an FPGA.

Since then, the source code of the Amiga Project was adjusted so it can be used within the main MiSTer project.

To understand the source code of the Mini Amiga Project, a good starting point would be to look at the file rtl/minimig.v. This file used to be the top level module for the original Mini Amiga project. Looking at the inputs/outputs of this module, it is immediately obvious that this module doesn't host a 68000 implementation, and need to be connected externally.

To cater for an on-FPGA implementation for the 68000, a couple of wrappers were created around the minimig module. More on this in the next section.

A deeper look into the source code

In the previous section I mentioned that rtl/minimig.v used to be the top level module for the original implementation for the Mini Amiga project, and that wrappers were written to interface it with an on-FPGA  68000 implementation.

Let us have a closer look at these wrappers. We start by looking into the file Minimig.sv, which is located in the root. This file hosts a module called emu, which in turn instantiates an instance of the minimig module. 

Something else that is also interesting of the module emu, is that it instantiates an instance of cpu_wrapper, and it is here where we actually create an instance of the fx68k core and linking up the ports to the corresponding minimig instance.

The rest of the wrapper code gets very specific to the features of the DE10-Nano board. 

For our goal of implementing an Amiga on a Zybo board, we will be creating our own wrappers around the Minimig module.

Pruning the Minimig module

When porting a complex project from one platform to another, a lot of times it is required to have an in-depth understanding of how the system works.

This is easier said than done, especially if it is not possible to play with the system on the original platform.

In our case, we have a similar scenario. Your preferred FPGA board for implementing the Mini Amiga will probably not be the DE-10 Nano board, for which the current project is written for. Also, buying the DE-10 Nano board to gradually understand the components of the Mini Amiga project, before moving to your actual choice of FPGA board, doesn't sound like an economically viable option either.

To get around the problem of complexity, I will be stripping down the Minimig module to its most basic form, and then re-adding functionality as we go along.

We start by looking at the ports of the Minimig module, which are grouped as follows:

  • m68k pins
  • sram pins
  • system pins
  • rs232 pins
  • I/O
  • host controller interface (SPI)
  • Video
  • RTG Framebuffer control
  • audio
  • user/io
  • fifo/track display

In our first round, we will only be using the first three group of ports, which are m68k pins, sram pins and sram pins. The rest of the ports we will remove or comment out.

Next, let us have a look inside the minimig module, as which instances we can remove. I have found the following to remove:

  • userio
  • rtg

Modifying top.v

We are now going to modify the top module we have created in the previous post, so it can include an instance of the minimig module.

One thing you will notice when connecting the ports of the Minimig module, is that there is quite a number of clock inputs, like clk7_en, c1, c3, cck and eclk. Luckily a module is the provided in the Mini Amiga project for generating these signals given a clock input. This module is present in the file rtl/amiga_clk.v.

Let us create an instance of this module in our top module:

module top(
...
    );
...
amiga_clk amiga_clk
        (
          .clk_28(clk_28mhz),     // 28MHz output clock ( 28.375160MHz)
          .clk7_en(clk7_en),    // 7MHz output clock enable (on 28MHz clock domain)
          .clk7n_en(clk7n_en),   // 7MHz negedge output clock enable (on 28MHz clock domain)
          .c1(c1),         // clk28m clock domain signal synchronous with clk signal
          .c3(c3),         // clk28m clock domain signal synchronous with clk signal delayed by 90 degrees
          .cck(cck),        // colour clock output (3.54 MHz)
          .eclk(eclk),       // 0.709379 MHz clock enable output (clk domain pulse)
          .reset_n(~(button_reset))
        );
...
endmodule
Here is another mystery. The amiga_clk module wants a 28MHz clock input, whereas in the previous post we have defined a clock of 16MHz for clocking our CPU.

It turns out that in the Mini Amiga project, the CPU is clocked at 28MHz, whereas most of the Amiga components have a resulting clock speed of 7MHz.

We therefore need to adjust the frequency of our generated clock from 16MHz to 28MHz (actually 28.375160MHz, to be exact).

The resulting fx68k and minimig instance look as follows:

...
    fx68k fx68k(
        .clk(clk_28mhz),
        .HALTn(1),                    // Used for single step only. Force high if not used
        // input logic HALTn = 1'b1,            // Not all tools support default port values
        
        // These two signals don't need to be registered. They are not async reset.
        .extReset(reset_cpu_in),            // External sync reset on emulated system
        .pwrUp(reset_cpu_in),            // Asserted together with reset on emulated system coldstart    
        .enPhi1(phi), .enPhi2(~phi),    // Clock enables. Next cycle is PHI1 or PHI2
        .eRWn(read_write),
        .oRESETn(reset_cpu_out),

        .ASn(As), 
        .LDSn(Lds), 
        .UDSn(Uds),
        .DTACKn(data_ack), 
        .VPAn(1),
        .BERRn(1),
        .BRn(1), .BGACKn(1),
        .IPL0n(interrupts[0]), 
        .IPL1n(interrupts[1]), 
        .IPL2n(interrupts[2]),
        .iEdb(data_in),
        .oEdb(data_out),
        .eab(add)
        );
...
minimig minimig
 (
     //m68k pins
     .cpu_address(add), // m68k address bus
     .cpu_data(data_in),    // m68k data bus
     .cpudata_in(data_out),  // m68k data in
     ._cpu_ipl(interrupts),    // m68k interrupt request
     ._cpu_as(As),     // m68k address strobe
     ._cpu_uds(Uds),    // m68k upper data strobe
     ._cpu_lds(Lds),    // m68k lower data strobe
     .cpu_r_w(read_write),     // m68k read / write
     ._cpu_dtack(data_ack),  // m68k data acknowledge
     ._cpu_reset(reset_cpu_in),  // m68k reset
     ._cpu_reset_in(reset_cpu_out),//m68k reset in
     .nmi_addr(0),    // m68k NMI address

     //sram pins
     .ram_data(),    // sram data bus
     .ramdata_in(22),  // sram data bus in
     .ram_address(ram_add), // sram address bus
     ._ram_bhe(),    // sram upper byte select
     ._ram_ble(),    // sram lower byte select
     ._ram_we(),     // sram write enable
     ._ram_oe(),     // sram output enable
     .chip48(),      // big chipram read
 
     //system    pins
     .rst_ext(),     // reset from ctrl block
     .rst_out(),     // minimig reset status
     .clk(clk28mhz),         // 28.37516 MHz clock
     .clk7_en(clk7_en),     // 7MHz clock enable
     .clk7n_en(clk7n_en),    // 7MHz negedge clock enable
     .c1(c1),          // clock enable signal
     .c3(c3),          // clock enable signal
     .cck(cck),         // colour clock enable
     .eclk(eclk),        // ECLK enable (1/10th of CLK)
 );
...

One the changes we did here, was to feed interrupts from the minimig module to our CPU.

One interesting part of the minimig module is the sram section. This section should be interfaced to external SDRAM or DDRRAM, and the idea is that every time when access a memory location that forms part of chipram/fastram or kickstart ROM, the memory request will be redirected to these ports.

My plan is to interface the sram pins to the DDRRAM present on the Zybo board via AXI, in future posts.

In this post, however, we will only be looking at the addresses send on the ram_address port, and verify that addresses is send that falls within kickstart ROM range upon start-up. More on this later.

Building and testing

When trying the synthesise create a bitstream I encountered a couple of errors in Vivado. First of all, Vivado doesn't like the concept of local registers. As an example, take a look at the following snippet from ciaa.v:

// generate a keystrobe which is valid exactly one clk cycle
always @(posedge clk) begin
	reg kms_levelD;
	if (clk7n_en) begin
		kms_levelD <= kms_level;
		keystrobe <= (kms_level ^ kms_levelD) && (kbd_mouse_type == 2);
	end
end
In this case, the register kms_levelD is a local register. To make Vivado happy, you will need to move the register declaration outside of the always block.

Keep in mind that there is some cases where the same local register name is used in a couple of always blocks, in which case you will need give them all unique names, which is quite an undertaking.

I have picked up quite a number of places in the Mini Amiga source where register/wires are used and declared only later on in source files. Vivado doesn't like this either. 

You will also find that the file denise_colortable_ram_mf.v will fail to compile on Vivado. This file is very specific to Altera FPGA's. 

To get around this error, just comment out the instantiation of altsyncram_component. We will revisit this module in a future post.

Eventually everything compiled, and I moved on to testing. As mentioned in the previous post, was goal was to see if addresses were output on the ram_address port that was within the range of the kickstart ROM, upon startup.

Let me spend a moment to explain this. Staring at memory address 0, we have chipram, and ROM only starts at address $F80000.

With a Motorola 68000, the above setup is problematic, because at start-up  the 68000 reads the initial value for the program counter from memory location 4, which is in chipram.

On the Amiga this dilemma is solved by mapping the Kickstart ROM also to location 0 upon start-up. Upon the first read from the CIA, this mapping is removed and the chipram starting at location is free to use.

So, in my test I will be checking to see if addresses starting at $F80000 will appear on the ram_address port upon startup.

In my testing, a couple of things went wrong. Firstly, the minimig module didn't create a reset signal on the _cpu_reset pin for resetting the CPU. It turned out that the userio module, that we have commented out, is responsible for generating the reset signal.

To get around this issue, I have linked up our button_reset signal to the minimig module as follows:

module minimig
(
...
 input button_reset,
...
);
...
assign cpurst = button_reset;
...
endmodule

Also, our amiga_clk instance will need to use the _cpu_reset port of the minimig module:
module top(
...
    );
...
amiga_clk amiga_clk
        (
          .clk_28(clk_28mhz),     // 28MHz output clock ( 28.375160MHz)
          .clk7_en(clk7_en),    // 7MHz output clock enable (on 28MHz clock domain)
          .clk7n_en(clk7n_en),   // 7MHz negedge output clock enable (on 28MHz clock domain)
          .c1(c1),         // clk28m clock domain signal synchronous with clk signal
          .c3(c3),         // clk28m clock domain signal synchronous with clk signal delayed by 90 degrees
          .cck(cck),        // colour clock output (3.54 MHz)
          .eclk(eclk),       // 0.709379 MHz clock enable output (clk domain pulse)
          .reset_n(~(_cpu_reset))
        );
...
endmodule
Another signal with a similar issue was the halt signal. This signal is also controlled by the userio module, which can halt the cpu on demand.

Again, for this signal we are going to implement a quick fix. The cpuhlt wire in the minimig module we need to assign the value 0.

With these changes, I was able to proceed and I got the following waveforms as result:


Looking at the ram_add signal, we can see addresses 0x3c0000 and 0x3c0001 every time just before the As signal transitions to high. Converting these addresses to byte addresses, we get 0x780000 and 0x780002. This misses the Kickstart ROM address range of 0xf80000 by one bit.

Carefully looking at the Mini Amiga source code, it looks like these addresses are expected. Thinking about this, I realised the reduced start address of kickstart ROM to 0x780000 actually makes sense. If we have kept the address 0xf80000, it is just wasting RAM.

In Summary

In this post we did a quick round of getting the Minimig source code to compile in Vivado, and verified that the ram addresses are as expected.

In the next post we will be moving to the Zybo board, trying to get the the AROS ROM to boot, with the ROM image sitting in SDRAM.

We will be using the AROS ROM instead of the official Kickstart ROM, since the AROS ROM doesn't require a license.

Till next time!
 

Wednesday, 31 March 2021

Running an Amiga core on a Zybo Board: Part 2

Foreword

In the previous post I gave a quick rundown on what I plan for the next couple of posts, which is to get an Amiga core running on a Zybo board.

In this post we will get the fx68k core running on an FPGA, which is an FPGA implementation for the 68000 CPU.

Ironically, the FPGA I will be using for this exercise will not be the Zybo board, but one of my lower spec FPGA boards, the Basys 3, which is also Xilinx based.

The reason for this decision is because in the Zybo board one needs to wrap every core into an IP, which is a bit time consuming, if you are after quick R&D. Once we got to a point where many of the components are running, we can move to the Zybo board.

Simulation issues

As with many FPGA projects, one always start testing your design with simulation.

In the couple of years I was playing with FPGA's, I found that the fx68k was one of the cores I wasn't able to run in a simulator. I am referring here to the simulators that is free of charge. Not sure about commercial ones.

When trying to simulate fx86 within Vivado, it will be stuck in the Elaboration step for hours. Even after 7 hours, Vivado couldn't get passed this step.

When trying to use iVerilog, it didn't understand all the SystemVerilog syntax in the design.

Verilator's support for SystemVerilog is quite good, but in the fx68k design there are some structures, where part of the bits is defined as wires and the rest of the bits is defined as registers. Verilator doesn't like such structures.

So, concerning testing the fx68k core, I will run it on the FPGA itself. This proofed not to be a major issue.

If anyone had any luck in simulating fx68k with a particular simulator, please let me know.

Setting up the project

As mentioned earlier, I will not be using my Zybo board to test the fx68k, but a Basys 3 board, in which you don't need to package all your cores into IP blocks.

We start by creating a top module block:

module top(
   input clk,
    );
...
endmodule

When working with the Basys 3 board, it is important to constraint all the ports of your top level block in an XDC file.

In this case we have an input clock which we constrain as follows:

set_property PACKAGE_PIN W5 [get_ports clk]
set_property IOSTANDARD LVCMOS33 [get_ports clk]
create_clock -period 10.000 -name sys_clk_pin -waveform {0.000 5.000} -add [get_ports clk]
Since we work here with a Basys 3 board, pin W5 of the FPGA chip is connected to an external clock clocking at 100MHz.

Now 100MHz is probably a bit too fast for our design, and I am not sure if all timing requirements will be met if we clock the fx68k core that high. So, I will instantiate a slower clock, which will clock around 16MHz.

Vivado supports a clock wizard, which allows you to create a template block, and all instances of this template block will output 16MHz.

We create an instance of this clock within our top module as follows:

module top(
   input btnC,
   input clk,
   output [15:0] led
    );
...
wire clk_16mhz;
...
clk_wiz_0 clk_wiz_0 
 (
  // Clock out ports
  .clk_out1(clk_16mhz),
 // Clock in ports
  .clk_in1(clk)
 )
endmodule

clk_16mhz is the clock signal clocking at 16MHz, and we will use this clock in our design.

Next, let us see how to connect the ports of the fx68k core:

    fx68k fx68k(
        .clk(clk_16mhz),
        .HALTn(1),                    // Used for single step only. Force high if not used
        // These two signals don't need to be registered. They are not async reset.
        .extReset(button_reset),            // External sync reset on emulated system
        .pwrUp(button_reset),            // Asserted together with reset on emulated system coldstart    
        .enPhi1(phi), .enPhi2(~phi),    // Clock enables. Next cycle is PHI1 or PHI2
        .eRWn(read_write),
        .ASn(As),
        .LDSn(Lds),
        .UDSn(Uds),
        .DTACKn(0),
        .VPAn(1),
        .BERRn(1),
        .BRn(1), .BGACKn(1),
        .IPL0n(1),
        .IPL1n(1),
        .IPL2n(1),
        .iEdb(data_in),
        .oEdb(data_out),
        .eab(add)
        );

There are a couple of ports that we can leave unconnected for now. Let us quickly go through the purpose of some of these ports:

  • extReset and pwrUp: In my design I have linked these ports to a button on the Basys board. Preferably one should pass the button through a debounce core, so you don't have erratic spikes on the reset port.
  • eWRn: This port indicates to memory whether the fx68k want to read or write. Read is indicated with a 1 and a write with a 0.
  • ASn, LDSn, UDSn: These signals are asserted during read/write cycles. We don't really need this for our fx68k, but it helps to see it on a signal trace to see if everything is working correctly.
  • iEdb and oEdb: This is for data in and data out from memory.
  • eab: Address request to memory
Just a note on enPhi1 and enPhi2. In my design I toggle these signals on every clock cycle and it is important that these two signals are always the inverse of each other. So, here is my simple implementation for phi:

    always @(negedge clk_16mhz)
    begin
      phi <= ~phi;
    end

Before ending this section, I would like to make mention of two rom files in the fx68k source, microrom.mem and nanorom.mem. It is important to ensure that the contents of these two files gets loaded into the synthesised design.

If you open up fx68k.sv, you will see that these files are been used by the following modules:

module uRom( input clk, input [UADDR_WIDTH-1:0] microAddr, output logic [UROM_WIDTH-1:0] microOutput);
	reg [UROM_WIDTH-1:0] uRam[ UROM_DEPTH];		
	initial begin
		$readmemb("microrom.mem", uRam);
	end
	
	always_ff @( posedge clk) 
		microOutput <= uRam[ microAddr];
endmodule


module nanoRom( input clk, input [NADDR_WIDTH-1:0] nanoAddr, output logic [NANO_WIDTH-1:0] nanoOutput);
	reg [NANO_WIDTH-1:0] nRam[ NANO_DEPTH];		
	initial begin
		$readmemb("nanorom.mem", nRam);
	end
	
	always_ff @( posedge clk) 
		nanoOutput <= nRam[ nanoAddr];
endmodule
It is important that these files is in a location that is accessible to the synthesising tool. If in doubt, rather use absolute paths.

During synthesis you should also see some messages whether it was able to load these files successfully.

Hello World boot

Now it is time to test the fx68k core. For this purpose, we will writing a very 68000 assembly program:


The assembly language is the column on the far right, and the machine language equivalent is in the middle column.

Before continuing, I would like make mention of the tool I used to get this view. This is an online tool you can get  access via the following url: onlinedisassembler.com/. This tool allows you to disassemble machine code from a variety of CPU's.

Back to the machine language. If one look at the machine code, one will realise that the 68000 is a big endian architecture, that is the most significant parts of a value is stored first in memory. This is in contrast to an architecture like the i386 and ARM processors, which uses little endian.

Now let us look into the test program in more detail:

  • movew #1285,%d0: Here we load the register d0 with the value 1285
  • movew %d0, 0x00858585: Next, we store the value we loaded in the register d0 to memory at location address 0x858585.
  • jsr 0x93e86: Jump to subroutine at memory location 0x93e86.
Somewhere in this assembly language program I have introduced something that is not allowed, and we will see later how the 68000 core handle this scenario.

Now, the next question is, how to we get the fx68k core to execute our program? We need to know what location the 68000 will jump to, when it comes out of reset. Unfortunately this information is not easy to come by.

After quite a bit of searching, I found this information in the M68000 user manual on page 5-29 in the section Reset Operation. I quote:

When RESET and HALT are driven by an external device, the entire system, including the
processor, is reset. Resetting the processor initializes the internal state. The processor
reads the reset vector table entry (address $00000) and loads the contents into the
supervisor stack pointer (SSP). Next, the processor loads the contents of address $00004
(vector table entry 1) into the program counter.

So, in short, we need to ensure that our implementation should return the initial stack pointer when the CPU requests the contents for memory location 0, 1, 2 and 3.

Similarly, our implementation should return the initial value for the program counter when the CPU requests the contents of memory location 4, 5, 6 and 7.

It should be noted that the fx68k access memory in chunks of 16 bits, so addresses posted by the CPU on the address bus we need to multiply by 2 to get the byte address equivalent. The implications of this will become apparent in a moment.

So, let us write some simple logic that feed our program in chunks to the CPU:

  
always @(posedge clk_16mhz)
        begin
            if (add == 16'h0)
            begin
              data_in <= 7;
            end else if (add == 16'h1)
            begin
              data_in <= 16'h9fe7;
            end else if (add == 16'h2)
            begin
              data_in <= 4;
            end else if (add == 16'h3)
            begin
              data_in <= 16'hfe00;
            end else if (add == 20'h27f00)
            begin
              data_in <= 16'h303c; //load immediate
            end else if (add == 20'h27f01)
            begin
              data_in <= 16'h0505;
            end else if (add == 20'h27f02)
            begin
              data_in <= 16'h33c0; //store
            end else if (add == 20'h27f03)
            begin
              data_in <= 16'h0085;
            end else if (add == 20'h27f04)
            begin
              data_in <= 16'h8585;
            end else if(add == 20'h27f05)
            begin
              data_in <= 16'h4eb9;
            end else if (add == 20'h27f06)
            begin
              data_in <= 9;
            end else if (add == 20'h27f07)
            begin
              data_in <= 16'h3e86;
            end
            else begin
              data_in <= 16'h33c0;
            end
        end
As mentioned previously, the address we select by is a word address and not a byte address.

Also, as per the M68000 user guide, location 0 contains the initial value for the stack pointer, which in this case is 0x79fe7. All addresses stored in memory are byte addresses.

Location 2 contains the initial value for our programming counter. As this is a word address, we need to multiply by two to get to the byte address, resolving to 4, which corresponds to the address provided in the user guide. Our initial program counter value in this case is 0x4fe00.

Finally, as seen our snippet of code goes up to word address 0x27f07, and for all other address requests we return the value 0x33c0.

Test Results

Let us have a look at the test results. Below is some waveforms I have captured from my Basys 3 board from the point it got out of reset:





As seen the fx68k core starts off reading word locations 0 - 4, which contains the stack pointer and program counter. The next word location that is read is address 0x27f00. Mutliplying this value by 2 gives 0x4fe00, which matches the value we have provided for the Program Counter. 

If one follows the rest of the waveform, the read_write signal eventually goes low and the value 0x0505 gets written to memory. This is still expected behaviour from our test program.

After the above mentioned write another write kicks off to location 0x3cff2 and then no further reads or writes occurs.

Word address 0x3cff2 (e.g. byte address 0x79fe4) looks like our stack address pointer we have defined, and we could argue that the last instruction executed was a jsr, which will push an address to the stack. However, whatever the processor tried to push to the stack, it failed to finish off what it was trying to do. 

I was wondering if there was something wrong with the initial stack pointer we have provided, so I started playing with different values. One of the values I used as a stack pointer were 0xc033c0.

This time around, using 0xc033c0 as the initial stack pointer, there appeared much more stack operations, looking at address requests. 

Something else that was interesting, was the reading of locations 6 and 7, which seemed like some kind of vector. Referring back to the  M68000 user guide, I found a table on page 6-7 explaining all the vectors. As it turns out, word address 6, which is byte address 12, is an address error. Further on in the user guide, on page 6-19, an Address error is described as:

An address error exception occurs when the processor attempts to access a word or longword operand or an instruction at an odd address. 
And indeed, we were writing the value of register d0 to memory location 0x858585, which was an odd address.

If we change the address to an even number like 0x858586, everything seems to work fine:


The return address gets pushed onto the stack at memory word locations 0x6019de and 0x6019df.

We also see that eventually contents is read from location 0x049f43 onwards, so it seems that our jsr has executed correctly. For this new location our implementation will just return the value 0x33c0.

If we look back at our test program, we see that the value 0x33c0 is actually the opcode for storing the value of the register d0 to memory. This is exactly what is happening towards the end of the waveform.

In Summary

In this post managed to get the fx68k core to run on Basys 3 development board.

In the Test result section, I spend a bit of time unpacking the resulting waveform.

In the next post we will continue our journey to get an Amiga core to run on a Zybo/Basys 3 board.

Till next time!

Monday, 29 March 2021

Running an Amiga core on a Zybo Board: Part 1

Foreword

It has been quite some time since my last post on this blog. Things were pretty busy at work, so I ended up taking quite a long break from writing blog posts.

Recently some of my readers asked whether it was possible to implement an Amiga on an FPGA that provides 20K Logic Elements. The summary of my answer was 'I don't know'.

The Amiga was based on the Motorola 68000 CPU, which in itself consists out of far more components than a 6502. So, one doesn't even know if there would be enough Logic elements left for the rest of the Amiga system.

One of the known Amiga cores can be found in the MiSTer project, here. This project is based on the DE10-Nano development board, which has 110k Logic elements. So, even with this project it was still unclear whether an Amiga core can be accommodated on a 20K FPGA, or even on my 30K element Zybo board.

After about a couple of weeks I started to become interested in implementing an Amiga on an FPGA myself. I never owned an Amiga, so it would be nice to implement it on a Zybo board, to see what I have missed in 90's concerning games 😀.

So, in the next couple of Blog Posts, I will attempt to get an Amiga core from the MiSTer project working on the Zybo board.

Whether I would be able to succeed, I cannot tell, but let us see how far we can get! 😀

Approach

In the coming Blog posts I will basically focus on the Mini Amiga sub project, here, from the MiSTer project.

Perhaps, before I continue, I should just give credit to the people that was involved for producing this core and providing it under the GPL 3 license:

  • Rok Krajnc, providing the project of which the source code of this core is based
  • Dennis van Weeren, Jakub Bednarski, Sascha Boing, A.M. Robinson, Tobias Gubener, Till Harbaum: For various contributions to the Mini Amiga project
  • Sorgelig: Maintainer of the overall MiSTer project
My approach would be to take the various components of the Mini Amiga project, bit by bit, and writing a top level module for each and trying to get it to work on the Zybo board. With such an incremental approach, it would be easier to isolate bugs that pops up.

I will start off with trying to get the CPU to work, by running a very simple 68000 machine language program and verifying that all inputs/outputs of this core are as expected.

The mini Amiga core supports two implementations for a 68000 core: fx68k and tg68k. I have chosen the fx68k core for this blog series. So, also special thanks to Jorge Cwik for providing this core as open source.

In Summary

This post was basically an introduction to the series of Blog Posts that will follow where I will attempt to get an Amiga core to run on a Zybo Board.

My focus will be on the Mini Amiga project and I will follow an incremental approach in an attempt to get the core running on the Zybo board.

In the next post I will try and get the fx68k core running on the Zybo board, using very simple 68000 machine code program as a test.

Till next time!

Monday, 15 June 2020

Gearing up to the newest version of Vivado

Foreword

When I was developing a C64 to run on an FPGA on the Zybo board, I exclusively used Vivado 2017.1.

However, during the course of this blog series, I got feedback from a couple of readers saying that they cannot build the source code from this project on later versions of Vivado.

So, in this post of I thought it would be rather interesting to see how the source code of this project wil behave on the latest version of Vivado.

What is the latest version of Vivado?

I wanted to get hold of the latest version of Vivado so I visited Xilinx's website.

On their website I found that the latest version was 2020.1, but it had no free version.

For the sake of clarity, free versions of Vivado usually have the word WebPack in there name and you will only get a subset of the functionality provided by a full version of Vivado.

Eventually I found that the latest free version of Vivado WebPack is 2019.1. This is the version we will be using in this post.

Issues experienced

When I opened the project in Vivado 2019.1, Vivado did quite a good job upgrading the IP's used in my design to the latest version.

This didn't go without any issues, though.

For a start, I got a number of issues with the two Clock wizard blocks in my design. One of the issues I was having, was that one of these blocks outputting two clock frequencies, ended off outputting only one frequency in Vivado 2019.1

In the the end I just deleted the two clock wizard blocks, created new ones and wire them up again.

Another minor issue experienced was that the Zynq block didn't had the I2C port 1 enabled, which I use to enable sound on the Zybo board. For this I also had to enable I2C port and wire up the ports.

One final issue I had, was when generating the Bitstream. The IDE complained about one of the AXI ports not been connected, due to optimisation.

I can remember I has a similar issue right in the beginning of the series when I was playing with AXI ports. At that time the issue I was having was with port AXI_ARID. The way I fixed that issue was just to wire AXI_ARID to a logic zero.

In Vivado 2019.1, the issue was the port AXI_AWID. Two very similar ports. The former port specifies the ID for an AXI read transaction, and the latter is for a ID for a write transaction. It is just funny that Vivado 2017.1 also didn't complain at the time about the AXI_AWID port.

Anyway, I fixed the AXI_AWID error by also wiring it to a logic 0.

Building the source in Vivado 2019.1

I have made the changes available on my GitHub repo for this project, which will enable you to build the project with Vivado 2019.1. For the record, here is the link again to the project repo on GitHub:


I wanted to retain the code that enables one to build the source on Vivado 2017.1, so I made the 2019.1 changes available in a branch. So, just a summary of the branches:

  • Master: Will build on Vivado 2017.1
  • Branch v2019.1: This branch will build on Vivado 2019.1.
When building the code on Vivado 2019.1, you might experience some minor issues. So, in this section I will discuss these issues, together with how to get around them.

Before running Synthesise, it is advisable to create a new HDL wrapper. In doing this, Vivado will complain about some IP's that are locked. Vivado will alternatively suggest to run Check IP Report Status.

So, under the the Reports menu select IP Report Status. From the tab that opens up select Upgrade selected. Eventually a couple of processes will start that will generate all the missing sources, after which you should be able to synthesise and generate bitstream.

In Summary

In this post I discussed how the source code of our C64 FPGA implementation currently behaves in Vivado 2019.1 and the changes required to make it work.

Till next time!

Wednesday, 29 April 2020

Initialising the Sound System

Foreword

In the previous post we added joystick support to our C64 FPGA from the Linux operating system. In addition we expanded the system to be able to use either Joystick Port #1 or Joystick Port #2.

In this post we will I show how to initialise the sound chip on the Zybo board from Linux, so that we can hear the sounds from the SID module.

Speaking of the sound system. A couple of posts ago I mentioned that I have updated my Github repo with the recent changes of our C64 core module, excluding the sound system.

I am pleased to announce that my Github repo now also contains the addition of the sound system.

Rendering SID samples to a speaker

Some time ago, I wote two posts, here and here, where I explained how to incorporate Thomas Kindler's SID core into our C64 design.

My discussion in these two posts basically stopped at the point where the SID samples was serialised over the I2S bus. I haven't explained the supporting processes at all that needs to happen so that the sounds finally gets rendered on a speaker.

Having an overview of these processes is necessary to understand the steps required for initialising the Sound System. So, let's get started!

Sound is produced and captured on the Zybo board by means of the SSM2603 chip from ANALOG Devices.

This chip receives sound samples via the I2S bus and is configured via a I2C bus. As previously mentioned we have already implemented the I2S bus for sending Audio Samples.

We haven't, however, discussed the implementation to interface with the I2C bus. Luckily one doesn't need to implement a I2C module from scratch, since the Zynq contains two I2C onchip peripherals.

In the next section I will briefly highlight what is required to use one of these onchip I2C peripherals to initialise the SSM2603.

The question is, however, how do we use a onchip I2C pheriperhal in our design? We will cover this in the next section.

Using a onchip I2C peripheral

To surface a I2C peripheral in our design, we need to configure our Zynq block. So, start by double clikcing on the Zynq Block, selecting the MIO Configuration section and opening I/O peripherals.

There is a couple of things you should do here. First you need to select the option I2C 1. With this option selected a drop down will appear next to this option in the IO column.

In this dropdown in the IO column you need to select to which external pins on the ZYNQ this I2C peripheral should be attached to. The first couple of options are MIO pins. If we select any of these we will not see the I2C pins in our block design at all. The only option that will surface the pins in our design is EMIO:



With this option selected the I2C pins will now appear within our design in the Zynq block:

You will see that I have already linked up these pins to the rest of the diagram. the iobuf block is a custom block I have created have pin as a birectional pin. The direction of this pin is controlled by the tristate input.

This is all there is to wiring up these pins. Next, let us see what software changes is required to drive these pins and to ultimately initialising the SSM2603.

Initialising SSM2603 from Linux

When I originally developed the SID functionality in the C64 core, I initialised the SSM2603 in a Bare-Metal application. Here is the source for the Bare-Metal application: https://github.com/ovalcode/c64fpga/blob/master/SDK.src/standalone/c64.c

There is quite a bit going on in this application and apart from initialising the SSM2603, we are also reading from a USB keyboard, as explained in a previous post. The key method to look for is init_sound().

If you follow the code in init_sound(), you will see that we are directly writing to the registers associated with the I2C 1 controller. Since we are working in Linux, at the moment, it is probably better practice to see if one can open I2C 1 as a device file and then manipulate the SSM2602 with this file.

In approaching this problem with accessing I2C 1 as a device file, I found myself trying to fiddle with the device tree to try and enable the I2C driver.  This ended up been quite a mission, so I reverted, at least for now, to copy the register writes/read from the standalone application and just modifying it a bit, so it can work in Linux. The whole SSM2603 init sequence we will also make part of the Kernel driver we have been developing in the last couple of posts.

As we have seen in previous posts, when you want to access physical memory locations in Linux, you first need to map it to a virtual address space. So, let us do it for the I2C 1 registers:

static void __iomem *i2c_reg;

static int __init ebbchar_init(void){
...
c64_reg = ioremap(0x43c00000,
           16000);
c64_reg = c64_reg;
c64_reg_screen_mode = c64_reg + 8;
c64_reg_keyboard_0 = c64_reg;
c64_reg_keyboard_1 = c64_reg + 4;
tape_mem_area =  ioremap(0x1f500000,
           2000000);

i2c_reg = ioremap(0xE0005000, 128);
...
}

The address returned by ioremap is an address in virtual address space. From this point onward when you want to access of the I2C registers, you need to use i2c_reg as your base and then add your register offset.

So, for instance, if you want to write the value 0x1f to the register 0x1c, you will do the following:

iowrite32(0x1f, i2c_reg+0x1c);

Similarly, if you want to read a register, you will do something like the following:

status = ioread32(i2c_reg+0x10) & 1;

You will see that in the standalone code, we are using two different operators for doing a read and write from a register: Xil_In32 and Xil_Out32. So when using this code in Linux remember to convert it to ioread32 and iowrite32 respectively. Also, remember that the parameter order of iowrite32 is different than that Xil_Out32: In Linux it is value followed by address and Xil_Out32  starts with the address, followed by the value.

If you want find the final source for the Linux Kernel driver, just go here: https://github.com/ovalcode/c64fpga/tree/master/SDK.src/linux

Another fine difference between the standalone code and the Linux code, is the operator we use for introducing a delay. In the standalone code I use usleep, where the value should be microseconds. Also, in Linux I am using msleep, where the value should be in milliseconds.

This is about all the changes required to initialise the SSM2603 from Linux.

When equivalent isn't exactly equivalent

In the previous post I have basically copy and pasted the SSM2603 initialisation code to our Linux Kernel driver, with minor changes.  So, in theory, this code should just worked in our Linux Driver.

However, when I tried this code for the first time, the SSM2603 didn't initialise at all.

After some investigation, I found that the driver got stuck in the following loop within the method writeReg:

         do {
          status = ioread32(i2c_reg+0x10) & 1;
         } while (!status);

This loop checks one of the status bits of I2C 1, which is set as soon as the I2C have shifted out a chunk of information.

This really confused me, because the equivalent code on the Bare-Metal application worked perfectly.  I tried thinking of a couple of possibilities that was causing this.

Firstly I wanted to know if the pins from I2C really got routed to EMIO instead of one or other MIO pin.

So armed, with the XSCT console, I inspected the memory locations for MIO routing and comparing the same registers to when Linux was running.

Maybe I should just take a step back and give an some background to the checks I was doing.

The MIO registers provides you with some options to route the I2C pins to different external pins. As an example, please have a look on page 1643 of the Zynq Technical Reference manual.

This register gives you the option to route the Serial Clock pin of the I2C 1 to pin 12. Similar registers exists, like 0xF8000734 and 0xF8000740, which allows you to route I2C pins to other MIO pins.

When I inspect these registers when running the Bare-Metal application, I found that no MIO pin is configured for surfacing any I2C 1 pin. This is exactly what I expect.

When inspecting the same registers when running Linux, I got the same result.

This inspection let to a bit of a dead-end, but there is still a question remaining: How do enable pins to get routed through EMIO?

At first sight I couldn't find any information about this in the Zynq TRM. However, the diagram on page 49 provided some subtle information for me.

As seen on that diagram, all peripherals go into the MIO, which performs the necessary multiplexing as describe earlier.

Some of these peripherals are also connected to the EMIO. From the diagram it looks like the connections to EMIO are direct, so in theory these pins should always be available to the FPGA.

Once again I reached a dead-end. What else could be the reason why the the SSM2603 doesn't initialise in Linux?

I did a further search in the Zynq TRM PDF to see if I could locate any other registers that is related to the I2C peripheral, and eventually I came across the register APER_CLK_CTRL on page 1586. Specifically, my eyes ctached the following phrase:

Please note that these clocks must be enabled if you want to read from the peripheral register space.
This could be indeed the problem, we are trying to read the status, but we are not getting any sensible data back, because the AMBA clock is not enabled for I2C 1. For clocking to be enabled for I2C 1, bit 19 should be set for register APER_CLK_CTRL.

I could confirm that this bit was set when the Bare-Metal application was running, and, as a relief, I could confirm that bit wasn't set when running Linux!

This is luckily an easy fix!

...
static void __iomem *clk_reg;
...
static int __init ebbchar_init(void){
...
clk_reg = ioremap(0xF8000000, 0x200);
...
iowrite32(0x01EC044D, clk_reg+0x12c);
msleep(2);
init_sound();
...
}
...

I introduced a small delay after setting the bit in the register APER_CLK_CTRL.

After the fix the SSM2603 initialisation worked in Linux. One can quickly detect that the initialisation is working by hearing the speaker attached to Zybo board turning on.

In Summary

In this post we looked into how to initialise the SSM2603 for sound in Linux.

This set of blog posts on how to create a C64 on a Zybo is quickly running to an end, since I have achieved more or less all my goals with it.

That been said, there is so much more you can use the Zybo board for, some of these of which I also want to write Blog posts about.

I also want to write some posts about using the C64 outside of a FPGA context, like writing a Chess Engine.

So, in coming posts I will introduce some variety, that might not be related to implementing a C64 on on FPGA.

I might, however, revist the topic of C64 on an FPGA from time to time, for instance implementing a 1541 running in parallel with the C64 in the FPGA.

Till next time!

Tuesday, 14 April 2020

Mapping joystick bits from Linux

Foreword

Previously we managed to load a tape image from the Linux File system, and got our C64 core to load it.

In this post we will add some more key mappings for a joystick, so we can play the game that loads.

Making our C64 design accept two joystick inputs

Currently our C64 FPGA design have only implemented Joystick Port #2. Having implemented functionality to rapidly switch between different game tape images, it actually becomes more of a requiremnt to implement Joystick Port #1 as well.

So, let us start with this requirement by looking into the C64 FPGA design.

Our starting point is our IP block that contains both an Master and Slave AXI port. This block we need to modify so that it can get an additional output port for Joystick Port #1:


To wire up this port, we need to make the following adjustments to the user logic of our AXI Slave block:

 // Add user logic here
    assign slave_reg_0 = slv_reg0;
    assign slave_reg_1 = slv_reg1;
    assign restart = slv_reg2[1];
    assign tape_button = slv_reg2[2];
    assign joybits = slv_reg2[8:4];
    assign c64_mode = slv_reg2[9];
    assign joybits2 = slv_reg2[14:10];

 // User logic ends

The bits of the two joystick ports are not consecutive to each other, due to the c64_mode in between. This will be a source of interesting bit manipulation in our driver, as will be seen later.

On our C64 core IP, we need to add an extra input port to accept this extra Joystick input. In our C64 core we supply the bits of the second joystick port to port B of CIA#1.

Port B of CIA#1, however, is also used as an from our keyboard. Thus we need to merge the lower four bits of the keyboard input bits with the bits from our second joystick port.

We do this as follows:

...
    assign key_joy_merged = ~(~keyboard_result[4:0] | ~joybits2);
...     
    cia cia_1(
          .port_a_out(keyboard_control),
          .port_a_in({3'b111, joybits}),
          .port_b_in({keyboard_result[7:5],key_joy_merged}),
          .addr(addr[3:0]),
          .we(we & io_enabled & (addr[15:8] == 8'hdc)),
          .clk(clk_1_mhz),
          .chip_select(addr[15:8] == 8'hdc & io_enabled),
          .data_in(ram_in),
          .data_out(cia_1_data_out),
          .flag1(flag1 & !flag1_delayed),
          .irq(irq)
            );
...

This is all the changes we require for the second joystick port for our FPGA design

Changes to the driver

Let us see what changes is required to our driver to accommodate two joystick ports.

Firstly we need to modify our record structure for input events as follows:

struct keyboard 
{
           u32 word1;
           u32 word2;
           u32 joybits;
};


Joybits will store the bits for both joystick ports.

Next, I will make a couple of changes to our write method:

static ssize_t dev_write(struct file *filep, const char * keys, size_t len, loff_t *offset){
   struct keyboard temp[1];
   copy_from_user(temp, keys, 12);
   iowrite32(temp[0].word1, c64_reg_keyboard_0);
   iowrite32(temp[0].word2, c64_reg_keyboard_1);
   unsigned int tempjoy = temp[0].joybits & 0x3ff;
   tempjoy = ~tempjoy & 0x3ff;
   unsigned int joy_high = (tempjoy << 1) & 0x7c0;
   unsigned int joy_low = tempjoy & 0x1f;
   joy_high = joy_high | joy_low;
   joy_high = joy_high << 4;
   unsigned int screenread = ioread32(c64_reg_screen_mode) & 0xffff820f;
   screenread = screenread | joy_high;
   iowrite32(screenread, c64_reg_screen_mode);
   
   return 12;
}


So, we isolate the bits of both ports and shift them to the correct position. We also need to shift the data of the two joystick ports one bit apart, because they are separated by the c64_mode bit.

Changes to the application

In our application, we make use of a file called sym.txt to indicate the resulting c64 scancode for each mapped key.

The real question here is: How do we indicate that a set of mapped keys is purposed for a Joystick? For this purpose, I am going to use the value -3:

84 -3 0 0                                            
17 -3 1 0                                              
87 -3 2 0                                              
89 -3 3 0                                              
19 -3 4 0                                              

The third value indicates the relevant bit that should be set for the joystick.

You might also remember that we use the file sym.txt to populate an array called key_map, containing c64 scancode, where the row and column values are reduced to a single scan code value between the range 0 to 63.

For the joystick, we can just extend the range, so scancode 64 can be joystick bit 0, 65 joystick bit 1 and so on.

With this in mind, we can change the method init_table as follows:

void init_table() {
  for (int i = 0; i < 256; i++) {
    key_map[i][0] = -1;
    key_map[i][1] = -1;
    key_map[i][2] = -1;
    key_map[i][3] = -1;
  }
  FILE *fp;
  fp = fopen("sym.txt", "r");
  char input[80];
  int num1, num2, num3, num4;
  while (1) {
    int status = fscanf(fp,"%d", &num1);
    fscanf(fp,"%d", &num2);
    fscanf(fp,"%d", &num3);
    fscanf(fp,"%d", &num4);
    fscanf(fp,"%[^\n]",input);
    if (status == EOF)
      break;
    if (num2 == -3) {
      key_map[num1][0] = num3 + 64;
      key_map[num1][1] = 0;
    } else if (num4 == 8) {
      key_map[num1][0] = (num2 << 3) | num3;
      key_map[num1][1] = 0;
      key_map[num1][2] = (num2 << 3) | num3;
      key_map[num1][3] = 1;
    } else if (num4 == 1) {
      key_map[num1][0] = (num2 << 3) | num3;
      key_map[num1][1] = 1;
    } else if (num4 & 32) {
      key_map[num1][0] = (num2 << 3) | num3;
      key_map[num1][1] = num4 & 1;
      fscanf(fp,"%d",&num1);
      fscanf(fp,"%d",&num2);
      fscanf(fp,"%d",&num3);
      fscanf(fp,"%d",&num4);
      fscanf(fp,"%[^\n]",input);
      key_map[num1][2] = (num2 << 3) | num3;
      key_map[num1][3] = num4 & 1;
    }
  }
}


We handle these codes in the key processing loop as follows:

    for (int i = 0; i < num_keys_to_process; i++) {
      int offset = shifted << 1;
      if (keys_to_process[i] == 0x18) {
        ioctl(fd,3);
        continue;
      }
      int c64_scan_code = key_map[keys_to_process[i] & 0xff][offset];
      if (c64_scan_code == -1)
        continue;
      if (c64_scan_code > 63) {
        keyToProcess.joybits = keyToProcess.joybits | (1 << (c64_scan_code - 64));
      } else if (c64_scan_code < 32) {
        keyToProcess.word1 = keyToProcess.word1 | (1 << c64_scan_code);
      } else {
        c64_scan_code = c64_scan_code - 32;        
        keyToProcess.word2 = keyToProcess.word2 | (1 << c64_scan_code);
      }
      if (key_map[keys_to_process[i] & 0xff][offset+1]) {
        keyToProcess.word1 = keyToProcess.word1 | (1<<15);
      }
    }

How do we switch between different joystick ports? A very simple mechanism would be to specify an extra parameter on the commandline. We can then just test for the number of arguments:

    for (int i = 0; i < num_keys_to_process; i++) {
      int offset = shifted << 1;
      if (keys_to_process[i] == 0x18) {
        ioctl(fd,3);
        continue;
      }
      int c64_scan_code = key_map[keys_to_process[i] & 0xff][offset];
      if (c64_scan_code == -1)
        continue;
      if (c64_scan_code > 63) {
        if (argc == 3)
          c64_scan_code = c64_scan_code + 5;
        keyToProcess.joybits = keyToProcess.joybits | (1 << (c64_scan_code - 64));
      } else if (c64_scan_code < 32) {
        keyToProcess.word1 = keyToProcess.word1 | (1 << c64_scan_code);
      } else {
        c64_scan_code = c64_scan_code - 32;        
        keyToProcess.word2 = keyToProcess.word2 | (1 << c64_scan_code);
      }
      if (key_map[keys_to_process[i] & 0xff][offset+1]) {
        keyToProcess.word1 = keyToProcess.word1 | (1<<15);
      }
    }

This conclude the code changes we should make to implement joystick ports.

In Summary

In this post we have implemented the necessary code changes for mapping keys to joystick ports in Linux.

In the next post we will write some extra code for our driver to initialise the sound system  so we can hear sound generated by our C64 core!

Till next time!