Saturday, 1 March 2025

A Commodore 64 Emulator in Flutter: Part 10

Foreword

In the previous post we implemented the last couple of 6502 instructions in our C64 Flutter emulator.

In this post we will be running the Klaus Dormann Test Suite on our emulator to ensure we have implemented all the instructions correctly.

Starting up the Klaus Dormann Test Suite

Let us see if we can start up the Klaus Dormann Test Suite on our emulator, although only in a single-stepping fashion at the moment.

To get started, we need two files from Klaus' Github repository:

The first file is the actual binary that will execute in our emulator. This is a 64KB binary which fills the whole address space accessible by the 6502.

The second file is a listing file, containing the disassembled version of the binary we are running. The listing file is useful if you want to follow along and see what the program is actually doing at a certain point in time.

Firstly we dump the binary in the assets folder of our Flutter project and rename it to program.bin. This is the default binary our emulator looks for when it starts up.

Now, usually when a 6502 system starts up, it looks at the reset vector at addresses 0xFFFC and 0xFFFD for the address at which it should start executing code, something which we haven't implemented yet.

In the Klaus Test Suite there is also a reset vector defined, but within the context of the Test Suite its function is to detect whether an accidental reset was triggered. So, in actual fact this Test Suite doesn't use the reset vector to start execution. Rather, when using the Test Suite, you should just set the PC register to 0x400 and start execution. This makes our life easier, and for the moment we don't need to worry about implementing the reset vector.
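
For reference, reading a reset vector is only a two-byte little-endian lookup. The following is a minimal sketch, assuming memory is held in a plain Uint8List; the function name readResetVector is hypothetical and not part of the emulator's code:

```dart
import 'dart:typed_data';

/// Reads the 16-bit little-endian reset vector stored at 0xFFFC/0xFFFD.
/// Hypothetical helper: the emulator in this series does not have it yet.
int readResetVector(Uint8List memory) {
  final low = memory[0xfffc]; // low byte of the start address
  final high = memory[0xfffd]; // high byte of the start address
  return (high << 8) | low;
}
```

Once the emulator grows proper reset handling, the initial value of pc could come from a lookup like this instead of being hard-coded.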

So, in cpu.dart, the following change needs to be made, namely setting the initial value of pc to 0x400:

 
...
  int _n = 0, _z = 0, _c = 0, _i = 0, _d = 0, _v = 0;
  int _sp = 0xff;
  int pc = 0x400;
...
With this we can start up our emulator and single step through the code of the Test Suite.

Unattended running

Single stepping through the Klaus Dormann Test Suite in our emulator would be a daunting task. You would probably need to click the step button thousands of times.

It would make our lives easier if we could just let the Test Suite run unattended, with us just pausing the execution once in a while to see how far we have progressed through the tests.

We do this by adding a button right next to the title. As part of the process we need to wrap both the title and the button in a row in order for everything to align properly. All this is happening in the main.dart file:

        appBar: AppBar(
          title:  Row(
              mainAxisSize: MainAxisSize.min,
              children: [
                const Text("Emulator C64"),
                BlocBuilder<C64Bloc, C64State>(
                  builder: (BuildContext context, state) {
                    return Row(
                        mainAxisAlignment: MainAxisAlignment.end,
                        children: [
                          _getRunStopButton(state, context)
                        ]);
                  })
              ]),
        ),
We want our run button to behave like a toggle switch, toggling between a play and a stop button. To do all this fancy stuff, we need to inject some state, which we achieve by wrapping everything with a BlocBuilder. We did discuss the workings of BlocBuilder in a previous post.

Now, the method _getRunStopButton() returns one of three possible buttons, depending on the state: a play button, a stop button, or a disabled play button if everything hasn't initialised yet:

  Widget _getRunStopButton(C64State state, BuildContext context) {
    if(state is DataShowState) {
      return IconButton(
        icon: Icon(Icons.play_arrow),
        onPressed:  () {
          context.read<C64Bloc>().add(RunEvent());
        }
      );
    } else if (state is RunningState) {
      return IconButton(
          icon: Icon(Icons.stop_circle),
          onPressed:  () {
            context.read<C64Bloc>().add(StopEvent());
          }
      );
    } else {
      return const IconButton(
          icon: Icon(Icons.play_arrow),
          onPressed:  null
      );
    }
  }

Here we test for different states. Firstly we show an enabled play button if we are in DataShowState. As you might remember from previous posts, with DataShowState, we display a dump of memory and registers, and we can single step from that point. This is the perfect scenario to provide a play button that will run the emulator at full speed.

Pressing the play button emits a RunEvent, which we still need to implement a listener for. We will do that in a bit.

Secondly, if our emulator is in the RunningState, we display the stop button. We still need to implement the RunningState class, which in actual fact is very simple:

class RunningState extends C64State {}
There are no values or properties we need to convey here; the class merely signals that we are in the running state.

Finally, for any other state we just want to show a play button that is disabled. This will only happen when our application is starting up and loading the memory image, which in our case is the Test Suite.

Now, we have defined a number of events that we need to listen for in c64_bloc.dart.

Firstly, let us define the listener for RunEvent. This will be the core of our unattended running. Here we want to schedule a timer that fires every second, and on each tick we execute a second's worth of CPU instructions (i.e. 1 000 000 CPU cycles). We also need to emit a RunningState state so our front end can update accordingly.

Let us start with an outline:

...
  Timer? timer;
...
    on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(seconds: 1), (timer) {
...
      });
      emit(RunningState());
    });
...
We define the timer variable as a field of our C64Bloc class, since we want to be able to cancel the timer in another event handler.

Now, to determine when our CPU has executed 1 million cycles worth of instructions, our CPU needs to keep a record of the cycles for each of the instructions it executes. This naturally happens in the step method:

...
  int _cycles = 0;
...
  int getCycles() {
    return _cycles;
  }
...
  step() {
...
    _cycles = _cycles + CpuTables.instructionCycles[opCode];
    var resolvedAddress =
        calculateEffectiveAddress(CpuTables.addressModes[opCode], arg0, arg1);
    switch (opCode) {
...
    }
  }
We have defined the instructionCycles array in a previous post, which specifies the number of cycles for every opcode. So, with every step we can just add the number of cycles for the opcode being executed to the _cycles variable.

With this implemented, we can add some meat to our timer callback function:

...
    on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(seconds: 1), (timer) {
          int targetCycles = _cpu.getCycles() + 1000000;
          do {
            _cpu.step();
          } while (_cpu.getCycles() < targetCycles);
      });
      emit(RunningState());
    });
...
So, we just add one million to our current Cpu cycle count and that will be the target at which we will stop the loop.
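
One property of this loop worth noting is that it always finishes the instruction in progress, so a batch can overshoot the target by a few cycles. Here is a self-contained sketch of the same loop against a stand-in CPU; FakeCpu and runForCycles are hypothetical names used only for illustration:

```dart
/// Hypothetical stand-in for the emulator's CPU: every step costs a fixed
/// number of cycles, so the loop below can be exercised in isolation.
class FakeCpu {
  FakeCpu(this.cyclesPerStep);

  final int cyclesPerStep;
  int _cycles = 0;
  int steps = 0;

  int getCycles() => _cycles;

  void step() {
    _cycles += cyclesPerStep;
    steps++;
  }
}

/// Steps the CPU until at least [budget] more cycles have elapsed and
/// returns the number of cycles actually consumed.
int runForCycles(FakeCpu cpu, int budget) {
  final start = cpu.getCycles();
  final target = start + budget;
  do {
    cpu.step();
  } while (cpu.getCycles() < target);
  return cpu.getCycles() - start;
}
```

With three-cycle steps, a budget of 1 000 000 ends up consuming 1 000 002 cycles. Because the target is recomputed from the current count on every timer tick, the small overshoot does not accumulate into anything that matters for our purposes.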

Finally, we need to implement the stop event:

    on<StopEvent>((event, emit) {
      timer?.cancel();
      emit(DataShowState(
          dumpNo: dumpNo++,
          memorySnippet: ByteData.sublistView(memory.getDebugSnippet(), 0, 512),
          a: _cpu.getAcc(),
          x: _cpu.getX(),
          y: _cpu.getY(),
          n: _cpu.getN() == 1,
          z: _cpu.getZ() == 1,
          c: _cpu.getC() == 1,
          i: _cpu.getI() == 1,
          d: _cpu.getD() == 1,
          v: _cpu.getV() == 1,

          pc: _cpu.pc));
    });

Here we just cancel the timer and emit a DataShowState, so that once we have stopped running, we display the current state of memory and the registers.

When the emulator runs unattended, we also want to hide the state display to avoid confusion and just show "running". To keep the discussion focused, I will not be going into this detail.

Running the Test Suite

Finally we are at a point where we can run Klaus Dormann's Test Suite. On startup, the screen looks like this:

As discussed, the play button to start the emulator in unattended mode is next to the title.

When clicking play, the screen changes like this:

One weird thing you might notice is that if you click run and quickly stop again, the Program Counter is still at 0x400, the starting address of the Test Suite, as if nothing executed. The reason for this is subtle: our timer callback only executes once the timer interval has elapsed. So, in our case we need to wait at least one second before clicking the stop button to see some results.

So, if we let it run a bit longer, our result will look like this:


So, when we stopped, our Program Counter was at 0x9D7. The funny thing is, you can let it run for as long as you want, but the Program Counter remains stuck at 0x9D7.

What is going on here?

To find the answer we need to look at the source listing of the Test Suite and search for that address:


Here it is clear that if something went wrong with a test, the code does an endless loop, in this case at address 09D7. So, obviously, our emulator failed a test, but which one? Looking back a couple of lines, we see the comment: The IRQ vector was never executed

Aha! We never implemented IRQs (Interrupt Requests) in our emulator. Having said that, it briefly caught me in a mystical moment, almost like when, as a kid playing on a Commodore 64, I wondered for the first time what was going on underneath the hood.

In this case I wondered where the IRQ came from, since this Test Suite doesn't implement any magical peripherals. After a moment I realised that this was probably caused by me not implementing the BRK instruction, and looking further back in the listing confirmed this.

This was actually a very interesting experience for me. It was the first time I encountered a problem and my first reaction was a moment of nostalgia 😂

In the following section we will implement the BRK instruction and then run the emulator again.

Implementing the BRK and RTI instructions

So, let us quickly implement the BRK and RTI instructions. There is one caveat with the BRK instruction. It is a one-byte instruction, but in actual fact it behaves like a two-byte instruction. The BRK triggers an IRQ, and when the handler returns, it doesn't return to the address directly after the BRK instruction, but one address further on.

To account for this quirk of the BRK instruction, we can adjust the instruction length in the instructionLen table for the BRK instruction to 2.

With the table adjusted, we implement the BRK and RTI instruction as follows:

      /*BRK*/
      case 0x00:
        push(pc >> 8);
        push(pc & 0xff);
        push((_n << 7) | (_v << 6) | (3 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
        _i = 1;
        pc = (memory.getMem(0xffff) << 8) | memory.getMem(0xfffe);

      /*RTI*/
      case 0x40:
        int temp = pull();
        _c = temp & 1;
        _z = (temp >> 1) & 1;
        _i = (temp >> 2) & 1;
        _d = (temp >> 3) & 1;
        _v = (temp >> 6) & 1;
        _n = (temp >> 7) & 1;
        pc = pull() | (pull() << 8);
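
To convince ourselves of the BRK/RTI round trip, here is a self-contained sketch that models only the pieces involved. MiniCpu and its members are hypothetical stand-ins for illustration; the status byte is kept as one packed value rather than the separate flag variables the emulator uses:

```dart
import 'dart:typed_data';

/// Hypothetical, stripped-down model of the state that BRK and RTI touch.
class MiniCpu {
  final Uint8List mem = Uint8List(0x10000);
  int pc = 0;
  int sp = 0xff;
  int status = 0; // flags packed as N V - B D I Z C

  void push(int value) {
    mem[0x100 | sp] = value & 0xff;
    sp = (sp - 1) & 0xff;
  }

  int pull() {
    sp = (sp + 1) & 0xff;
    return mem[0x100 | sp];
  }

  /// Executes a BRK located at [brkAddress]. The return address pushed is
  /// brkAddress + 2, which is why BRK behaves like a two-byte instruction.
  void brk(int brkAddress) {
    final returnAddress = (brkAddress + 2) & 0xffff;
    push(returnAddress >> 8); // high byte first
    push(returnAddress & 0xff); // then low byte
    push(status | 0x30); // bits 4 and 5 are set in the pushed copy
    status |= 0x04; // set the interrupt disable flag
    pc = mem[0xfffe] | (mem[0xffff] << 8); // jump via the IRQ/BRK vector
  }

  /// Restores the status byte and the program counter from the stack.
  void rti() {
    status = pull() & 0xcf; // bits 4 and 5 are ignored on the way back
    final low = pull();
    final high = pull();
    pc = low | (high << 8);
  }
}
```

With the vector at 0xFFFE/0xFFFF pointing to 0x9000, a BRK at 0x0600 lands at 0x9000, and the matching RTI returns to 0x0602.
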
Now, when we run the Test Suite again, we get past this failed test. However, we end up in another endless loop at address 0xdeb, which indicates another failed test.

We will investigate this failed case, as well as other potential failed cases in the next post.

In Summary

In this post we ran the Klaus Dormann Test Suite on our Emulator in unattended mode. The first failed test case we encountered was the BRK/RTI instruction that wasn't implemented.

With the BRK/RTI instruction implemented we encountered another failed test case which we will investigate in the next post, as well as other potential failed test cases which will pop up.

You can find all the source code for this project as well as the binary image containing the Klaus Dormann test suite, here.

Until next time!

Sunday, 23 February 2025

A Commodore 64 Emulator in Flutter: Part 9

Foreword

In the previous post we implemented all stack operations for our Flutter C64 emulator. This included pushing and popping the Accumulator and the status register. We also implemented the JSR/RTS instructions, which also operate on the stack.

In this post we will be implementing the remaining instructions of the 6502, which includes the following:

  • BIT
  • JMP (Jump)
  • NOP
  • Register operations
With these instructions implemented, we can start in the next post to run the Klaus Dormann Test suite on our emulator to see if we have implemented all the 6502 instructions correctly.

Enjoy!

The Jump instruction

Implementing the jump instruction is just a straightforward operation of loading the program counter with a new value. Let us add the selectors for these:

        /*
JMP (JuMP)
Affects Flags: none

MODE           SYNTAX       HEX LEN TIM
Absolute      JMP $5597     $4C  3   3
Indirect      JMP ($5597)   $6C  3   5
         */
      case 0x4C:
      case 0x6C:
        pc = resolvedAddress;

Now, there are two address modes for this instruction: Absolute and Indirect. The Absolute address mode we have already implemented in the calculateEffectiveAddress() method, but not the Indirect address mode. So, within the calculateEffectiveAddress() method, let us add the following selector:

      case AddressMode.indirect:
        var lookupAddress = (operand2 << 8) | operand1;
        return memory.getMem(lookupAddress) | (memory.getMem(lookupAddress + 1) << 8);
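
As a side note, the original NMOS 6502 has a well-known quirk in this address mode: when the pointer sits on the last byte of a page ($xxFF), the high byte of the target is fetched from the start of the same page ($xx00) rather than from the next page. The selector above does not model this. A minimal sketch of the quirk, assuming memory in a plain Uint8List and using the hypothetical name resolveIndirectJump:

```dart
import 'dart:typed_data';

/// Resolves the target of JMP (pointer) the way an NMOS 6502 does: only the
/// low byte of the pointer is incremented, so $xxFF wraps around to $xx00.
/// Hypothetical helper, not part of the emulator's code.
int resolveIndirectJump(Uint8List memory, int pointer) {
  final lowAddress = pointer & 0xffff;
  final highAddress = (lowAddress & 0xff00) | ((lowAddress + 1) & 0x00ff);
  return memory[lowAddress] | (memory[highAddress] << 8);
}
```

Whether this matters depends on the software being emulated; well-behaved code avoids placing a jump pointer on a page boundary.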

The BIT instruction

Next, let us implement the BIT instruction. From the specs, the BIT instruction is defined as follows:

BIT (test BITs)
Affects Flags: N V Z

MODE           SYNTAX       HEX LEN TIM
Zero Page     BIT $44       $24  2   3
Absolute      BIT $4400     $2C  3   4

BIT sets the Z flag as though the value in the address tested were ANDed with the 
accumulator. The N and V flags are set to match bits 7 and 6 respectively in the 
value stored at the tested address.
We implement this as follows:

      case 0x24:
      case 0x2C:
        int memByte = memory.getMem(resolvedAddress);
        _z = ((memByte & _a) == 0) ? 1 : 0;
        _n = ((memByte & 0x80) != 0) ? 1 :0;
        _v = ((memByte & 0x40) != 0) ? 1 :0;
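
To see the flag behaviour in isolation, the same logic can be written as a pure function. This is only a sketch; BitResult and bitTest are hypothetical names that do not exist in the emulator:

```dart
/// Flags produced by a BIT instruction (each is 0 or 1).
class BitResult {
  const BitResult(this.n, this.v, this.z);
  final int n, v, z;
}

/// Computes the flags BIT would set for the given accumulator and memory byte.
BitResult bitTest(int accumulator, int memByte) {
  final z = ((memByte & accumulator) == 0) ? 1 : 0; // Z from the AND result
  final n = ((memByte & 0x80) != 0) ? 1 : 0; // N copies bit 7 of memory
  final v = ((memByte & 0x40) != 0) ? 1 : 0; // V copies bit 6 of memory
  return BitResult(n, v, z);
}
```

For example, with 0x0f in the accumulator and 0xc0 in memory, all three flags come out as 1: the AND is zero, and bits 7 and 6 of the memory byte are both set. Note that the accumulator itself is never modified.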

The NOP instruction

NOP is short for No Operation. It literally does nothing except consume CPU cycles. One of the major uses of this instruction is to reserve some slots in memory where you might want to add some more instructions in future.

Strictly speaking you don't need to implement a case selector for this instruction in our big switch statement that decodes the different opcodes. The surrounding mechanism should just skip to the next instruction.

However, by not implementing a selector for NOP, the default selector will be invoked in the switch statement. The default selector is nice to warn us if we forgot to implement some instructions or we encountered some undocumented instructions in the code. By not giving NOP a selector we will get many false positives by hitting the default selector.

So, the selector for NOP will look as follows:

        /*
NOP (No OPeration)
Affects Flags: none

MODE           SYNTAX       HEX LEN TIM
Implied       NOP           $EA  1   2
         */
      case 0xEA:
        break;
In Dart we generally don't need a break. However, when you have an empty case like this you need to add it, otherwise it will fall through to the next case that has code, which is not what we want.

Register operations

Finally, let us implement the register operations. As per the specs, these are the following Instructions:

Register Instructions
Affect Flags: N Z

These instructions are implied mode, have a length of one byte and require two machine cycles.

MNEMONIC                 HEX
TAX (Transfer A to X)    $AA
TXA (Transfer X to A)    $8A
DEX (DEcrement X)        $CA
INX (INcrement X)        $E8
TAY (Transfer A to Y)    $A8
TYA (Transfer Y to A)    $98
DEY (DEcrement Y)        $88
INY (INcrement Y)        $C8
In previous posts we did implement some of these. Doing some inventory, I found that the following still need to be implemented:
  • TAX
  • TXA
  • INX
  • TAY
  • TYA
  • INY
Here is the implementation:

      case 0xAA:
        _x = _a;
        _n = ((_x & 0x80) != 0) ? 1 : 0;
        _z = (_x == 0) ? 1 : 0;

      case 0x8A:
        _a = _x;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

      case 0xE8:
        _x++;
        _x = _x & 0xff;
        _n = ((_x & 0x80) != 0) ? 1 : 0;
        _z = (_x == 0) ? 1 : 0;

      case 0xA8:
        _y = _a;
        _n = ((_y & 0x80) != 0) ? 1 : 0;
        _z = (_y == 0) ? 1 : 0;

      case 0x98:
        _a = _y;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

      case 0xC8:
        _y++;
        _y = _y & 0xff;
        _n = ((_y & 0x80) != 0) ? 1 : 0;
        _z = (_y == 0) ? 1 : 0;
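
The wrap-around and flag expressions repeat in every case above. As a sketch of what they compute, here they are as three small pure functions; the names are hypothetical and the emulator itself keeps the expressions inline:

```dart
/// Returns 1 if bit 7 of the 8-bit [value] is set, otherwise 0.
int negativeFlag(int value) => ((value & 0x80) != 0) ? 1 : 0;

/// Returns 1 if the 8-bit [value] is zero, otherwise 0.
int zeroFlag(int value) => ((value & 0xff) == 0) ? 1 : 0;

/// Increments an 8-bit register value, wrapping from 0xff back to 0x00.
int increment8(int value) => (value + 1) & 0xff;
```

For instance, an INX with X at 0xff wraps to 0x00 and sets the Zero flag, while an INX with X at 0x7f gives 0x80 and sets the Negative flag.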

This covers the instructions we wanted to implement in this post. I am not going to write a test program in this post to test the instructions we have implemented, since in the next post we will start to run the Klaus Dormann Test Suite, which will surface any defects anyway.

In Summary

In this post we implemented the remaining instructions for our emulator.

In the next post we will run the Klaus Dormann Test Suite on our emulator to see if we have some defects in our implementation. This will probably span multiple posts, depending on how many issues we detect.

Until next time!

Thursday, 20 February 2025

A Commodore 64 Emulator in Flutter: Part 8

Foreword

In the previous post we implemented the compare and branch instructions for our Flutter C64 Emulator.

In this post we will implement the 6502 stack and related operations like pushing/popping, Jump to Subroutine and Return from Subroutine.

Enjoy!

The stack concept

The stack is a Last in First out (LIFO) data structure. To visualise a stack in real life, one can look at a receipt stack:


Clearly, one can see that the receipt that is most accessible is the last receipt you have placed on top of the pile.

In a CPU the stack has many uses, like if you were calling subroutines in a nested way, and you want to return to the caller of a subroutine. A stack is perfect for this, because you want access to the last return address.

On the 6502, the stack is 256 bytes in size and lives in page 1 of the memory space, that is the address range $100 - $1ff. The stack grows downwards, starting at $1ff and growing towards $100. Naturally, as you pop items off the stack, the stack pointer moves back towards $1ff.

On the 6502 the stack has many uses, of which we already mentioned jumping to and returning from subroutines. You can also push and pop registers. The 6502 also uses the stack when serving interrupts. Before an interrupt routine is called, the CPU stores its state on the stack, so when the service routine is finished, it restores the CPU to the state it was in before the interrupt, and the program continues as if nothing has happened.

Creating the stack mechanism

Let us start by writing some code for implementing the stack mechanism. We start by defining a stack pointer:

int _sp = 0xff;
We start with the initial value of 0xff, which represents address 0x1ff, the starting position of the stack. We omit the high byte value of 1 and just prepend it whenever we need to do a lookup in memory.

Now let us create some push and pull methods.

  push(int value) {
    memory.setMem(value, _sp | 0x100);
    _sp--;
    _sp = _sp & 0xff;
  }

  int pull() {
    _sp++;
    _sp = _sp & 0xff;
    return memory.getMem(_sp | 0x100);
  }

With the understanding that the stackpointer points to the location where the push will happen, we can use the stackpointer address as is when storing the value of the push and then decrement the pointer thereafter.

However, since the pointer points to the next push location, you cannot use the location as is when doing a pull. You first need to increment the pointer and use that value for the read address. 
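
The push/pull ordering is easy to check in isolation. Below is a self-contained sketch that uses a plain Uint8List in place of the emulator's memory class; Stack6502 is a hypothetical name used only for this illustration:

```dart
import 'dart:typed_data';

/// Minimal model of the 6502 stack: page 1 of memory plus an 8-bit pointer.
class Stack6502 {
  final Uint8List memory = Uint8List(0x10000);
  int sp = 0xff; // low byte only; the high byte is always 0x01

  /// Stores at the current pointer location, then moves the pointer down.
  void push(int value) {
    memory[0x100 | sp] = value & 0xff;
    sp = (sp - 1) & 0xff;
  }

  /// Moves the pointer up first, then reads from the new location.
  int pull() {
    sp = (sp + 1) & 0xff;
    return memory[0x100 | sp];
  }
}
```

The first push lands at 0x1ff and leaves the pointer at 0xfe. Pulling one more value than was pushed simply wraps the pointer from 0xff to 0x00; there is no underflow error, just as on the real CPU.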

Before ending this section, let us see if we can implement the basic stack instructions Push accumulator(PHA) and Pull Accumulator(PLA) to see if our stack implementation behaves as expected.

    /*
    PHA (PusH Accumulator)          $48  3
    */
      case 0x48:
        push(_a);
    /*
    PLA (PuLl Accumulator)          $68  4
    */
      case 0x68:
        _a = pull();
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

Implementing JSR and RTS

Let us now implement the JSR (Jump to Subroutine) and RTS (Return from Subroutine) instructions.

So, in principle, when the JSR executes, it pushes the address of the next instruction onto the stack as the return address before jumping to the subroutine. When the subroutine finishes executing and invokes RTS, it pulls this address off the stack again and jumps to it.

However, there is a small caveat with this sequence of events. The return address pushed onto the stack is not exactly the return address of the next instruction, but the address of the next instruction -1.

The designers of the 6502 implemented JSR this way as a kind of optimisation. When reading instructions from memory the program counter is incremented by 1 each time, and by the time the CPU needs to push the return address, the PC is still pointing to the last byte of the JSR instruction.

Now, if you were to implement JSR/RTS in your emulator with the assumption that the value pushed on the stack is simply the address of the next instruction, without worrying about the -1, your emulator would probably work fine 99% of the time. That being said, I did encounter some magic 6502 code in the past that interrogates the contents of the stack to implement things like copy protection or auto-starting code. Such code might not work correctly if your emulated JSR instruction doesn't push addresses onto the stack following the -1 convention.

So, it is important to adhere to this convention when implementing the JSR/RTS instructions.

Here is the implementation of these two instructions:

/*
MODE           SYNTAX       HEX LEN TIM
Absolute      JSR $5597     $20  3   6
 */
      case 0x20:
        int temp = (pc - 1) & 0xffff;
        push(temp >> 8);
        push(temp & 0xff);
        pc = resolvedAddress;
/*
MODE           SYNTAX       HEX LEN TIM
Implied       RTS           $60  1   6
 */
      case 0x60:
        pc = pull();
        pc = pc | (pull() << 8);
        pc++;
        pc = pc & 0xffff;
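
The address arithmetic of the -1 convention can be sketched as two small functions. The names are hypothetical, and the sketch takes the address of the JSR opcode itself as input, whereas the emulator works from a pc that has already moved past the instruction:

```dart
/// Address that a JSR located at [jsrAddress] pushes onto the stack.
/// JSR is three bytes long, so this is the address of its last byte.
int jsrPushedAddress(int jsrAddress) => (jsrAddress + 3 - 1) & 0xffff;

/// Address at which execution resumes after RTS pulls [pulledAddress].
int rtsTarget(int pulledAddress) => (pulledAddress + 1) & 0xffff;
```

A JSR at 0x0011 pushes 0x0013, and the matching RTS resumes at 0x0014, the instruction directly after the JSR.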

Implementing the other stack operations 

Let us now implement the rest of the stack operations.

The simplest of these operations are the transfers between the Stack Pointer register and the X register, namely TSX and TXS. So let us quickly implement them:

/*
        TXS (Transfer X to Stack ptr)   $9A  2
 */
      case 0x9a:
        _sp = _x;
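/*
        TSX (Transfer Stack ptr to X)   $BA  2
 */
      case 0xba:
        _x = _sp;
        // Unlike TXS, TSX updates the N and Z flags on a real 6502.
        _n = ((_x & 0x80) != 0) ? 1 : 0;
        _z = (_x == 0) ? 1 : 0;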

What remains to be implemented is pushing and pulling the status register, that is the register that contains all the flags, like the Zero flag, Negative flag, Overflow flag and so on.

At this point the question arises in which order the flags are stored in the status byte that gets pushed onto the stack. One possibility is deciding on the order of the flags yourself, and emulation will probably work correctly 99% of the time.

However, as I mentioned in the previous section where we implemented the JSR/RTS instructions, you often find 6502 machine language programs that inspect the contents of the stack, so if you decide the order of the flags in the status byte yourself, this code might not work correctly.

The question is: How do we find the correct order of the flags in the status register? In general, the websites that give you info on the 6502 instructions don't provide you with this information about the status register.

After digging a bit on the internet, I found the information via the following link:


They provide a nice diagram for the status register:

Some extra information about the status register is that bits 4 and 5 should be one when pushed onto the stack. Similarly, when popping this value back to the status register, we ignore bits 4 and 5. With all this said, let us implement the PHP and PLP instructions:

/*
        PHP (PusH Processor status)     $08  3
 */
      case 0x08:
        push((_n << 7) | (_v << 6) | (3 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
/*
        PLP (PuLl Processor status)     $28  4
 */
      case 0x28:
        int temp = pull();
        _c = temp & 1;
        _z = (temp >> 1) & 1;
        _i = (temp >> 2) & 1;
        _d = (temp >> 3) & 1;
        _v = (temp >> 6) & 1;
        _n = (temp >> 7) & 1;
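
The bit layout is easier to verify when packing and unpacking are written as pure functions. This is a sketch only; packStatus and unpackStatus are hypothetical names, and the emulator keeps the flags in separate variables as shown above:

```dart
/// Packs the individual flags (each 0 or 1) into the byte that is pushed
/// onto the stack. Bits 4 and 5 are always set in the pushed copy.
int packStatus({
  required int n,
  required int v,
  required int d,
  required int i,
  required int z,
  required int c,
}) =>
    (n << 7) | (v << 6) | (3 << 4) | (d << 3) | (i << 2) | (z << 1) | c;

/// Splits a pulled status byte back into flags, ignoring bits 4 and 5.
Map<String, int> unpackStatus(int value) => {
      'n': (value >> 7) & 1,
      'v': (value >> 6) & 1,
      'd': (value >> 3) & 1,
      'i': (value >> 2) & 1,
      'z': (value >> 1) & 1,
      'c': value & 1,
    };
```

With only the Negative and Overflow flags set, the pushed byte works out to 0x80 + 0x40 + 0x30 = 0xF0.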

We have implemented all instructions for this post. In the next section we will write a test program for all the instructions we have added.

The Test Program

We will use the following for our test program:

0000 A9 0A LDA #$0a
0002 48    PHA
0003 48    PHA
0004 48    PHA
0005 48    PHA
0006 a2 50 LDX #$50
0008 9a    TXS
0009 48    PHA
000a 48    PHA
000b 48    PHA
000c 48    PHA
000d A9 7F LDA #$7f
000f 69 01 ADC #$01
0011 20 19 00 JSR TEST
0014 68    PLA
0015 68    PLA
0016 68    PLA
0017 68    PLA
0018 68    PLA
0019 08 TEST PHP
001a B8    CLV
001b A9 00  LDA #$00
001d 28     PLP
001e 60     RTS
Here we test a couple of operations of the stack. Pushing and pulling elements from the stack, changing the stack pointer, doing a JSR/RTS and pushing and pulling the Status register.

Currently within our emulator, we only have a view of the first page of memory (i.e. bytes 0 to 255). However, when executing the above program it would be nice to extend the view so we can see what is happening on the stack as well. I have made the change and it looks like this:


I am not going to cover the changes required to adjust the view like this, but it is available in a git tag I have created here. This tag also contains the test program for this post as binary which will execute as you click the step button.

Let's see how the stack changes as we execute the program. We start by pushing the Accumulator to the stack a number of times. We can see our values towards the end of page 1:

We then change the stack pointer to 0x50 and do a couple of pushes of the Accumulator again. We can see that the contents pushed now end up in a different area of memory:

Next, we force the Overflow flag to be set by doing an addition that causes an overflow. With the overflow operation we manage to set as many flags as possible. We then jump to a subroutine, which pushes the status register onto the stack.

At this point, our memory dump will look like this:

The return address pushed is 0013. As mentioned in a previous section the return address pushed is always one less than the actual address, because of the design of the 6502.

The value pushed for the Status Register is F0 (i.e. the upper 4 bits set). As mentioned previously, bits 4 and 5 are always set, and because of the operations we did, the Overflow flag is set as well as the Negative flag.

We then clear the negative and overflow flag on purpose to see if the PLP instruction at the end of the subroutine restore them for us.

We then correctly return from the subroutine, continuing execution at address 0014. We then do a number of pulls into our accumulator to see if we get back the same values that we originally pushed. On purpose I have added an extra PLA afterwards to see what it does. As expected, we get a 00, because that location lies just beyond the last value we pushed.

This concludes what we wanted to achieve in this post.

In Summary

In this post we implemented all stack operations, including pushing and pulling the accumulator and the Status register. We also implemented the JSR/RTS instructions, which also rely on the stack.

We are just about finished with implementing all instructions for the 6502. What remains are the following:

  • BIT
  • JMP (Jump)
  • NOP
  • Implied register operations
So, in the next post I will be implementing these.

With the above implemented, we can move onto more interesting things, like running the Klaus Dormann Test Suite on our emulator to see if it behaves like a real 6502. This is very important, because it will help us to emulate a game as accurately as possible.

Until next time! 

Monday, 10 February 2025

A Commodore 64 Emulator in Flutter: Part 7

Foreword

In the previous post we implemented the Logic operator and bit shifting instructions for our Flutter C64 emulator.

In this post we will be implementing the Compare and branching instructions.

The Compare instructions

The comparison instructions remind us of the if statement you get in almost every programming language where you test two numbers to see which one is the biggest or if they are equal.

In machine language, like on the 6502, we mimic the if statement via a compare instruction, which subtracts the two numbers, affecting the Carry, Negative and Zero flags. The then part of the if statement we mimic with a branch instruction, where you can branch to a particular address depending on the state of one of these flags. More on the branch instructions in another section.

One concern, given that the compare instruction performs a subtraction, is whether an overflow is possible, as we experienced with SBC (subtract with carry). The documentation shows that the compare instruction does not set the Overflow flag, which can be confusing at first.

There are, however, two facts that make the overflow flag irrelevant for a compare instruction. One is that a compare does an unsigned comparison. The other is that we only consider the Carry or Zero flag when doing a comparison, and don't really look at the Negative flag.

Let us now see how we can implement the compare instructions in our flutter emulator.

One can be tempted to simply do a plain subtraction in Flutter when implementing a compare instruction. However, doing so would not accurately emulate a 6502 compare instruction. Let us go into a bit more detail on why this is.

Let's take an example. If the number in the accumulator is bigger than the number it is compared with, the carry flag needs to be set. The carry flag corresponds to bit 8 of the result, which has a weight of 256. So, suppose you compare 2 with 1; you should get the value 257, which is this value in binary:

(1) 0000 0001

If you just do a subtraction in Flutter for this, you will just get 1. Strictly speaking, there will still be a carry somewhere in the background when your host CPU performs the operation, but because numbers on modern CPUs are much wider than 8 bits, typically 64 bits, that carry falls far outside bit 8 and is not visible to us.

So, let us see if we can emulate the 6502 compare instructions more accurately. To do this, we need to understand how the 6502 does subtraction. All this boils down to Two's complement for representing negative numbers. Two's complement basically says that in order to make a number negative, you first need to negate the number (that is, make every one a zero and every zero a one) and then add one to the result.

Say for instance you want to represent -1. First, in binary you have:

0000 0001

Now do the negation:

1111 1110

And then add 1:

1111 1111

At first glance, this number doesn't look meaningful, but lets take an analogy that will shed new light on the meaning of such a number. Everyone knows about an odometer of a car. It starts at 000000 and counts till 999999. When it goes past 999999, it goes back to 000000.

Suppose we could do something interesting. With the odometer at 000000, we go back 3 units, then you are at 999997. Now, if you add 5, you get to (1)000002. The one is in brackets because the digit doesn't really exists on the odometer. But, what we have actually done here was subtracting 3 from 5 using addition! The 999997 is the ten's complement representation of -3.

We can use the same analogy in binary. Let's say you have the 8-bit binary number 0000 0000. If you move back one unit, you get 1111 1111, where every bit is one. This corresponds to -1, which we determined earlier.

If you move back one further unit you get 1111 1110 for -2 and 1111 1101 for -3. All this you can verify using 2's complement.
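The invert-and-add-one rule is easy to check in Dart. The following is a minimal sketch and not part of the emulator; the helper names twosComplement8 and toBinary8 are made up for this illustration.

```dart
/// Returns the 8-bit two's complement of [value]: invert every bit,
/// add one, and keep only the lower 8 bits.
/// (Hypothetical helper, not part of the emulator source.)
int twosComplement8(int value) => (~value + 1) & 0xff;

/// Formats [value] as an 8-digit binary string.
String toBinary8(int value) => value.toRadixString(2).padLeft(8, '0');

void main() {
  for (var n = 1; n <= 3; n++) {
    // Prints the 8-bit pattern that represents -n.
    print('-$n -> ${toBinary8(twosComplement8(n))}');
  }
}
```

Running this prints the same bit patterns we found by moving the "odometer" backwards.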

Let us now use this knowledge with a compare. Suppose you want to compare 2 and 1. So, we do 2-1 in binary, with the -1 converted to two's complement:

  0000 0010

+ 1111 1111

(1)0000 0001

We can see we have a carry, indicating the first number is bigger than the second. If we swap the numbers around, i.e. 1-2, we get this:

0000 0001

+1111 1110

1111 1111

In this case 1111 1110 is the two's complement representation of -2. Here we don't get a carry from the addition, meaning the first number is smaller.
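The two worked additions above can be reproduced with a few lines of Dart. This is only a sketch under my own naming (subtractViaAddition is a hypothetical helper); it keeps the inversion and the "+ 1" as separate steps so that subtracting zero also produces a carry.

```dart
/// Computes a - b for 8-bit values by adding the inverted b plus one.
/// The result is 9 bits wide: bit 8 is the carry, bits 0-7 the difference.
/// (Hypothetical helper for illustration only.)
int subtractViaAddition(int a, int b) => a + (~b & 0xff) + 1;

void main() {
  for (final pair in [
    [2, 1],
    [1, 2],
  ]) {
    final result = subtractViaAddition(pair[0], pair[1]);
    final carry = (result >> 8) & 1;
    final low = (result & 0xff).toRadixString(2).padLeft(8, '0');
    print('${pair[0]} - ${pair[1]}: carry=$carry low byte=$low');
  }
}
```

A carry of 1 means no borrow was needed, so the first number is at least as big as the second.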

Let us now do some coding to implement these instructions in our Flutter emulator. We create the following compare method, which we can share among the different flavours of compare instructions:

  void compare(int operand1, int operand2) {
    operand2 = ~operand2 & 0xff;
    operand1 = operand1 + operand2 + 1;
    _n = ((operand1 & 0x80) == 0x80) ? 1 : 0;
    _c = (operand1 & 0x100) != 0 ? 1 : 0;
    _z = ((operand1 & 0xff) == 0) ? 1 : 0;
  }
This method starts off by forming the two's complement of the second operand. We AND the result of the bitwise NOT with 0xff, so that we are left with just the lower 8 bits. Our Flutter emulator will probably run on 64-bit machines, which means a plain NOT would leave us with a 64-bit number where bits 8 to 63 are all ones, which is not the result we want.

The rest of this method is pretty straightforward. Bit 7 of the result is the negative flag and bit 8 is the carry flag. We also use the lower 8 bits to check if the result is zero.
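To see which flags come out for the three possible outcomes of a compare, here is a standalone sketch of the same calculation. The CompareFlags class and the compareFlags function are hypothetical names for this example; the emulator itself writes straight into _n, _z and _c.

```dart
/// Holds the three flags a compare instruction affects.
/// (Hypothetical container for illustration only.)
class CompareFlags {
  final int n, z, c;
  const CompareFlags(this.n, this.z, this.c);

  @override
  String toString() => 'N=$n Z=$z C=$c';
}

/// Same arithmetic as the compare method above, but returning the flags
/// instead of storing them in CPU state.
CompareFlags compareFlags(int reg, int operand) {
  final result = reg + (~operand & 0xff) + 1;
  return CompareFlags(
    (result & 0x80) != 0 ? 1 : 0, // bit 7: negative
    (result & 0xff) == 0 ? 1 : 0, // lower 8 bits all zero: equal
    (result & 0x100) != 0 ? 1 : 0, // bit 8: carry, reg >= operand
  );
}

void main() {
  print('2 vs 1: ${compareFlags(2, 1)}');
  print('1 vs 2: ${compareFlags(1, 2)}');
  print('5 vs 5: ${compareFlags(5, 5)}');
}
```

Note that in the equal case both Z and C are set, which is why BCS can be used as a "greater than or equal" test after a compare.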

With this method implemented, we can now implement the individual compare opcodes. Let's start with the CMP instructions:

/*
CMP (CoMPare accumulator)
Affects Flags: N Z C

MODE           SYNTAX       HEX LEN TIM
Immediate     CMP #$44      $C9  2   2
Zero Page     CMP $44       $C5  2   3
Zero Page,X   CMP $44,X     $D5  2   4
Absolute      CMP $4400     $CD  3   4
Absolute,X    CMP $4400,X   $DD  3   4+
Absolute,Y    CMP $4400,Y   $D9  3   4+
Indirect,X    CMP ($44,X)   $C1  2   6
Indirect,Y    CMP ($44),Y   $D1  2   5+

+ add 1 cycle if page boundary crossed
 */
      case 0xC9:
        compare(_a, arg0);
      case 0xC5:
      case 0xD5:
      case 0xCD:
      case 0xDD:
      case 0xD9:
      case 0xC1:
      case 0xD1:
        compare(_a, memory.getMem(resolvedAddress));

Pretty straightforward, and we pass the value of the accumulator in, in both cases.

Let's do the same with CPX and CPY:

/*
CPX (ComPare X register)
Affects Flags: N Z C

MODE           SYNTAX       HEX LEN TIM
Immediate     CPX #$44      $E0  2   2
Zero Page     CPX $44       $E4  2   3
Absolute      CPX $4400     $EC  3   4
 */
      case 0xE0:
        compare(_x, arg0);
      case 0xE4:
      case 0xEC:
        compare(_x, memory.getMem(resolvedAddress));

/*
CPY (ComPare Y register)
Affects Flags: N Z C

MODE           SYNTAX       HEX LEN TIM
Immediate     CPY #$44      $C0  2   2
Zero Page     CPY $44       $C4  2   3
Absolute      CPY $4400     $CC  3   4
 */
      case 0xC0:
        compare(_y, arg0);
      case 0xC4:
      case 0xCC:
        compare(_y, memory.getMem(resolvedAddress));

So, we pass in the value of register x for CPX instructions and the value of register y for CPY instructions.

This concludes all the compare instructions.

Branching Instructions

Let us now look at the branching instructions. Every branching instruction branches depending on the state of a certain flag, whether it is the Carry flag, Zero flag, Negative flag and so on.

With a branch instruction we don't supply an absolute address to jump to if the branch condition is true, but a relative address that you need to add to the program counter to find the destination address.

Let us write a quick 6502 machine code program to understand the branch instructions better:

4000 A9 05 LDA #$5
4002 38    SEC
4003 E9 01 SBC #$1
4005 D0 FC BNE $FC
4007 A9 22 LDA #$22
Here we have a program where we basically have a loop where the Accumulator starts with value 5, and gets decremented till it reaches zero.

The controller of the loop is at address 4005, the BNE (Branch if not equal) instruction, that will keep branching back to the SBC instruction at address 4003 until the zero flag is set.

Now the parameter of the BNE instruction might look confusing, but in actual fact it is an 8-bit two's complement number that you add to the program counter if the branch is to be taken. This means you can jump in the range -128...127.

In our example, the parameter $FC is the two's complement for -4. By the time the BNE is executed, the program counter already points just past this instruction, which in this case is 4007. Subtract 4 from this, and you are at address 4003, which is where we want to be.

With all this said, let us see if we can emulate the branch instruction in Flutter. All in all this boils down to adding a byte value to a 16-bit value, with a twist: The byte value is signed! This is a bit tricky to emulate on most platforms, because if you add a byte value to a 16-bit value, the value will always go up, and not down if it is negative.

Let's look at a couple of ways to solve this. Off the bat, one would probably do something like this in Flutter:

    if (operand1 > 127) {
      operand1 = operand1 | 0xff00;
    }
    return (pc + operand1) & 0xffff;
So, if we see our offset is negative, i.e. our byte value is bigger than 127, we pad bits 8-15 with ones. If we then add this to the program counter, the lower 16 bits of the result will indeed be the result of a subtraction.

This is indeed a solution, but can't we make it more elegant? Let's look at what the famous C64 emulator, VICE, does with this:


So, in VICE, if the branch is to be taken it does something very fancy when calculating the destination address. It casts the byte to a signed char! This is a very nifty trick which the C language provides. By casting a byte as a signed value, the C compiler honours the fact that this byte is a twos complement value, and thus if the value of the byte is negative, it will do the subtraction for you.

However, that nifty trick is in C and not in Flutter, in which we develop this emulator. So, the question is: Is there a similar nifty trick we can use in Flutter for this? Indeed there is. In Flutter, every int has a toSigned() method you can call. As parameter, you pass it the width of your number in bits, and the most significant bit within that width is treated as the sign bit. So, if you do something like the following:

0xfe.toSigned(8)
You will get back -2. 
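As a sanity check, the manual sign extension and the toSigned(8) approach can be put side by side. This is a small sketch with hypothetical function names (branchTargetManual and branchTargetSigned); both take the program counter as it stands just after the branch instruction, together with the raw offset byte.

```dart
/// Pads bits 8-15 with ones for negative offsets, then adds and wraps.
int branchTargetManual(int pc, int offset) {
  if (offset > 127) {
    offset = offset | 0xff00;
  }
  return (pc + offset) & 0xffff;
}

/// Lets Dart interpret the byte as a signed 8-bit value.
int branchTargetSigned(int pc, int offset) =>
    (pc + offset.toSigned(8)) & 0xffff;

void main() {
  // The BNE example from earlier: PC is 4007 after the branch, offset is FC.
  print(branchTargetSigned(0x4007, 0xfc).toRadixString(16));

  // Both approaches should agree for every possible offset byte.
  var allAgree = true;
  for (var offset = 0; offset < 256; offset++) {
    if (branchTargetManual(0x4007, offset) !=
        branchTargetSigned(0x4007, offset)) {
      allAgree = false;
    }
  }
  print('all offsets agree: $allAgree');
}
```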

We now have enough info to calculate an address for the relative address mode in the method calculateEffectiveAddress:

 int calculateEffectiveAddress(int mode, int operand1, int operand2) {
    var modeAsEnum = AddressMode.values[mode];
    switch (modeAsEnum) {
...
    case AddressMode.relative:
        return (pc + operand1.toSigned(8)) & 0xffff;
    }
...
    return 0;
  }
Next, we implement the following method:

 void branchConditional(bool doBranch, int branchAddress) {
    if (doBranch) {
      pc = branchAddress;
    }
  }
And finally, we can implement all the branch instructions:

    /*
    BPL (Branch on PLus)           $10
     */
      case 0x10:
        branchConditional(_n == 0, resolvedAddress);
    /*
    BMI (Branch on MInus)          $30
     */
      case 0x30:
        branchConditional(_n == 1, resolvedAddress);
    /*
    BVC (Branch on oVerflow Clear) $50
     */
      case 0x50:
        branchConditional(_v == 0, resolvedAddress);
    /*
    BVS (Branch on oVerflow Set)   $70
     */
      case 0x70:
        branchConditional(_v == 1, resolvedAddress);

    /*
    BCC (Branch on Carry Clear)    $90
     */
      case 0x90:
        branchConditional(_c == 0, resolvedAddress);
    /*
    BCS (Branch on Carry Set)      $B0
     */
      case 0xB0:
        branchConditional(_c == 1, resolvedAddress);
    /*
    BNE (Branch on Not Equal)      $D0
     */
      case 0xD0:
        branchConditional(_z == 0, resolvedAddress);
    /*
    BEQ (Branch on EQual)          $F0
     */
      case 0xF0:
        branchConditional(_z == 1, resolvedAddress);

A Test program

Let us end this post by writing a quick test program that does a compare and a branch.

While writing the test program, it really started to become convenient to have the DEX and DEY instructions, which we hadn't implemented yet. So, I took the liberty of implementing them in the emulator. I will not show the implementation here, but you are welcome to look at it on my Github page.

So, here is the test program:

4000 A2 0A LDX #$0A
4002 CA    DEX
4003 E0 04 CPX #$04
4005 D0 FB BNE LOOP
4007 A9 15 LDA #$15
This program will loop with values from $a to $4 in register X.
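If you want to predict how many times the loop runs before stepping through it in the emulator, the DEX / CPX / BNE sequence can be mimicked in plain Dart. This is a rough sketch with a made-up function name (countLoopPasses); it only tracks register X and the zero flag.

```dart
/// Mimics: LDX #start / loop: DEX / CPX #limit / BNE loop
/// and returns how many times the loop body executed.
/// (Hypothetical helper, not part of the emulator.)
int countLoopPasses(int start, int limit) {
  var x = start & 0xff; // LDX #start
  var z = 0;
  var passes = 0;
  do {
    x = (x - 1) & 0xff; // DEX
    final result = x + (~limit & 0xff) + 1; // CPX #limit
    z = (result & 0xff) == 0 ? 1 : 0;
    passes++;
  } while (z == 0); // BNE branches back while Z is clear
  return passes;
}

void main() {
  print(countLoopPasses(0x0a, 0x04));
}
```

For the test program above, X takes the values 9, 8, 7, 6, 5 and 4 inside the loop, so the body runs six times.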

You can find this program as binary as well as the state of our emulator as per this post, via this tag:


In summary

In this post we implemented the branch and compare instructions.

In the next post we will be implementing stack operations in our emulator.

Until next time!

Saturday, 11 January 2025

A Commodore 64 Emulator in Flutter: Part 6

Foreword

In the previous post I have implemented arithmetic instructions and flag modification instructions.

In this post I will be implementing bit logic instructions and bit shifting instructions.

The source code for this post can be found on Github, with the following tag:

https://github.com/ovalcode/c64_flutter/releases/tag/c64_flutter_part_6

Enjoy!

Logic Operators

Let us start with the logic operators:

/*
AND (bitwise AND with accumulator)
Affects Flags: N Z

MODE           SYNTAX       HEX LEN TIM
Immediate     AND #$44      $29  2   2
Zero Page     AND $44       $25  2   3
Zero Page,X   AND $44,X     $35  2   4
Absolute      AND $4400     $2D  3   4
Absolute,X    AND $4400,X   $3D  3   4+
Absolute,Y    AND $4400,Y   $39  3   4+
Indirect,X    AND ($44,X)   $21  2   6
Indirect,Y    AND ($44),Y   $31  2   5+

+ add 1 cycle if page boundary crossed

 */
      case 0x29:
        _a = _a & arg0;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;
      case 0x25:
      case 0x35:
      case 0x2D:
      case 0x3D:
      case 0x39:
      case 0x21:
      case 0x31:
        _a = _a & memory.getMem(resolvedAddress);
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

              /*
    EOR (bitwise Exclusive OR)
Affects Flags: N Z

MODE           SYNTAX       HEX LEN TIM
Immediate     EOR #$44      $49  2   2
Zero Page     EOR $44       $45  2   3
Zero Page,X   EOR $44,X     $55  2   4
Absolute      EOR $4400     $4D  3   4
Absolute,X    EOR $4400,X   $5D  3   4+
Absolute,Y    EOR $4400,Y   $59  3   4+
Indirect,X    EOR ($44,X)   $41  2   6
Indirect,Y    EOR ($44),Y   $51  2   5+

+ add 1 cycle if page boundary crossed

     */
      case 0x49:
        _a = _a ^ arg0;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;
      case 0x45:
      case 0x55:
      case 0x4D:
      case 0x5D:
      case 0x59:
      case 0x41:
      case 0x51:
        _a = _a ^ memory.getMem(resolvedAddress);
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

/*
ORA (bitwise OR with Accumulator)
Affects Flags: N Z

MODE           SYNTAX       HEX LEN TIM
Immediate     ORA #$44      $09  2   2
Zero Page     ORA $44       $05  2   3
Zero Page,X   ORA $44,X     $15  2   4
Absolute      ORA $4400     $0D  3   4
Absolute,X    ORA $4400,X   $1D  3   4+
Absolute,Y    ORA $4400,Y   $19  3   4+
Indirect,X    ORA ($44,X)   $01  2   6
Indirect,Y    ORA ($44),Y   $11  2   5+

+ add 1 cycle if page boundary crossed

 */
      case 0x09:
        _a = _a | arg0;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;
      case 0x05:
      case 0x15:
      case 0x0D:
      case 0x1D:
      case 0x19:
      case 0x01:
      case 0x11:
        _a = _a | memory.getMem(resolvedAddress);
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

Everything is straightforward here, and nothing to comment on.

Shifting operators

Next let us implement all the bit shifting operators:

      /*
ASL (Arithmetic Shift Left)
Affects Flags: N Z C

MODE           SYNTAX       HEX LEN TIM
Accumulator   ASL A         $0A  1   2
Zero Page     ASL $44       $06  2   5
Zero Page,X   ASL $44,X     $16  2   6
Absolute      ASL $4400     $0E  3   6
Absolute,X    ASL $4400,X   $1E  3   7

ASL shifts all bits left one position. 0 is shifted into bit 0 and the original bit 7 is shifted into the Carry.

 */
      case 0x0A:
        _a = _a << 1;
        _c = ((_a & 0x100) != 0) ? 1 : 0;
        _a = _a & 0xff;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;
      case 0x06:
      case 0x16:
      case 0x0E:
      case 0x1E:
        int temp = memory.getMem(resolvedAddress) << 1;
        _c = ((temp & 0x100) != 0) ? 1 : 0;
        temp = temp & 0xff;
        _n = ((temp & 0x80) != 0) ? 1 : 0;
        _z = (temp == 0) ? 1 : 0;
        memory.setMem(temp, resolvedAddress);

      /*
LSR (Logical Shift Right)
Affects Flags: N Z C

MODE           SYNTAX       HEX LEN TIM
Accumulator   LSR A         $4A  1   2
Zero Page     LSR $44       $46  2   5
Zero Page,X   LSR $44,X     $56  2   6
Absolute      LSR $4400     $4E  3   6
Absolute,X    LSR $4400,X   $5E  3   7

LSR shifts all bits right one position. 0 is shifted into bit 7 and the original bit 0 is shifted into the Carry.

 */
      case 0x4A:
        _c = _a & 1;
        _a = _a >> 1;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;
      case 0x46:
      case 0x56:
      case 0x4E:
      case 0x5E:
        int temp = memory.getMem(resolvedAddress);
        _c = temp & 1;
        temp = temp >> 1;
        _n = ((temp & 0x80) != 0) ? 1 : 0;
        _z = (temp == 0) ? 1 : 0;
        memory.setMem(temp, resolvedAddress);

/*
ROL (ROtate Left)
Affects Flags: N Z C

MODE           SYNTAX       HEX LEN TIM
Accumulator   ROL A         $2A  1   2
Zero Page     ROL $44       $26  2   5
Zero Page,X   ROL $44,X     $36  2   6
Absolute      ROL $4400     $2E  3   6
Absolute,X    ROL $4400,X   $3E  3   7

ROL shifts all bits left one position. The Carry is shifted into bit 0 and the original bit 7 is shifted into the Carry.


 */
      case 0x2A:
        _a = (_a << 1) | _c;
        _c = ((_a & 0x100) != 0) ? 1 : 0;
        _a = _a & 0xff;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;
      case 0x26:
      case 0x36:
      case 0x2E:
      case 0x3E:
        int temp = (memory.getMem(resolvedAddress) << 1) | _c;
        _c = ((temp & 0x100) != 0) ? 1 : 0;
        temp = temp & 0xff;
        _n = ((temp & 0x80) != 0) ? 1 : 0;
        _z = (temp == 0) ? 1 : 0;
        memory.setMem(temp, resolvedAddress);
      /*
    ROR (ROtate Right)
Affects Flags: N Z C

MODE           SYNTAX       HEX LEN TIM
Accumulator   ROR A         $6A  1   2
Zero Page     ROR $44       $66  2   5
Zero Page,X   ROR $44,X     $76  2   6
Absolute      ROR $4400     $6E  3   6
Absolute,X    ROR $4400,X   $7E  3   7

ROR shifts all bits right one position. The Carry is shifted into bit 7 and the original bit 0 is shifted into the Carry.


     */
      case 0x6A:
        _a = _a | (_c << 8);
        _c = _a & 1;
        _a = _a >> 1;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;
      case 0x66:
      case 0x76:
      case 0x6E:
      case 0x7E:
        int temp = memory.getMem(resolvedAddress) | (_c << 8);
        _c = temp & 1;
        temp = temp >> 1;
        _n = ((temp & 0x80) != 0) ? 1 : 0;
        _z = (temp == 0) ? 1 : 0;
        memory.setMem(temp, resolvedAddress);

As you can see, with the shifting instructions, either the contents of the accumulator or the contents of a memory location are shifted. Obviously, when shifting the contents of a memory location, more steps are involved.
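To experiment with the four shift and rotate operations outside the big opcode switch, they can be written as small pure functions. This is only a sketch; the ShiftResult class and the lower-case function names are my own for this example and do not exist in the emulator.

```dart
/// Result of a shift or rotate: the new 8-bit value and the new carry.
/// (Hypothetical container for illustration only.)
class ShiftResult {
  final int value;
  final int carry;
  const ShiftResult(this.value, this.carry);

  @override
  String toString() =>
      'value=${value.toRadixString(16).padLeft(2, '0')} carry=$carry';
}

// ASL: 0 goes into bit 0, old bit 7 goes into the carry.
ShiftResult asl(int v) => ShiftResult((v << 1) & 0xff, (v >> 7) & 1);

// LSR: 0 goes into bit 7, old bit 0 goes into the carry.
ShiftResult lsr(int v) => ShiftResult(v >> 1, v & 1);

// ROL: old carry goes into bit 0, old bit 7 goes into the carry.
ShiftResult rol(int v, int carry) =>
    ShiftResult(((v << 1) | carry) & 0xff, (v >> 7) & 1);

// ROR: old carry goes into bit 7, old bit 0 goes into the carry.
ShiftResult ror(int v, int carry) =>
    ShiftResult((v | (carry << 8)) >> 1, v & 1);

void main() {
  print('asl 81:      ${asl(0x81)}');
  print('lsr 05:      ${lsr(0x05)}');
  print('rol 02, C=1: ${rol(0x02, 1)}');
  print('ror 04, C=1: ${ror(0x04, 1)}');
}
```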

The Test Program

Here is a quick test program for testing the instructions we have implemented in this post:

/*asl C <- [76543210] <- 0
  lsr 0 -> [76543210] -> C
  rol C <- [76543210] <- C
  ror C -> [76543210] -> C*/

LDA #01   A9 01
ORA #02   09 02
EOR #06   49 06
LSR A     4A
ROL A     2A
SEC       38
AND #$fe  29 fe
ROR A     6a
LDA #3    a9 03
STA $60   85 60
CLC       18
LSR $60   46 60
ROL $60   26 60
ROR $60   66 60

I have added a comment on the top as a quick reference to how the different shifting instructions work, just making it easier to follow the program.

I have included this program in the Github tag for this post.

In Summary

In this post we implemented the Logic operator and Bit shifting instructions in our emulator.

In the next post we will be implementing the Compare and Branching instructions.

Until next time!

Wednesday, 1 January 2025

A Commodore 64 Emulator in Flutter: Part 5

Foreword

In the previous post we implemented a generic way for resolving addresses by address mode and implemented all load and store instructions.

In this post we will implement all the arithmetic instructions, which include Add with Carry (ADC), Subtract with Carry (SBC), and the increment and decrement instructions.

As I have started doing in the previous post, I am making the code available in each post as a tag in a Github repo. Here is the tag for this post:

https://github.com/ovalcode/c64_flutter/tree/c64_flutter_part_5

About Flags

In the previous post, I started to implement some of the flags. In that post, I also thought it was really cool that the Dart language used in Flutter has a boolean type, and decided to implement flags using this type.

However, this joy about booleans was short-lived. I discovered that in many cases operations are done on the Carry flag, where it is added to a number (e.g. ADC and SBC), or shifted into a number. The Dart language, however, doesn't allow you to mix booleans with integer operations in this way.

So, I have abandoned my idea of using booleans for flags, and use ints for flags instead.
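To make the trade-off concrete, here is a small sketch of the same carry arithmetic with both representations. The function names are hypothetical and only serve this comparison.

```dart
// With an int carry, the flag takes part in arithmetic directly.
int addWithIntCarry(int a, int operand, int carry) => a + operand + carry;

// With a bool carry, every use needs an explicit conversion to 0 or 1.
int addWithBoolCarry(int a, int operand, bool carry) =>
    a + operand + (carry ? 1 : 0);

// The same applies when the carry is shifted into a number.
int rotateLeftWithIntCarry(int value, int carry) =>
    ((value << 1) | carry) & 0xff;

void main() {
  print(addWithIntCarry(0x10, 0x20, 1));
  print(addWithBoolCarry(0x10, 0x20, true));
  print(rotateLeftWithIntCarry(0x02, 1));
}
```

Both additions give the same answer; the int version simply avoids the conversion at every place the carry is used.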

On the subject of flags: in this post I have also implemented all the flags. I am not going to show the implementation here, but you are welcome to have a look at it in my tag for this post on Github.

I have mentioned that in this post I will be implementing all arithmetic operations. However, when I wanted to test these instructions, I found that testing ADC and SBC unavoidably requires being able to explicitly set and clear the carry flag. For this you need the SEC and CLC instructions.

So, I found it fit to also implement all Flag modification instructions in this post as well. Here is the implementation:

/*
These instructions are implied mode, have a length of one byte and require two machine cycles.

MNEMONIC                       HEX
CLC (CLear Carry)              $18
SEC (SEt Carry)                $38
CLI (CLear Interrupt)          $58
SEI (SEt Interrupt)            $78
CLV (CLear oVerflow)           $B8
CLD (CLear Decimal)            $D8
SED (SEt Decimal)              $F8

 */
      case 0x18:
        _c = 0;
      case 0x38:
        _c = 1;
      case 0x58:
        _i = 0;
      case 0x78:
        _i = 1;
      case 0xB8:
        _v = 0;
      case 0xD8:
        _d = 0;
      case 0xF8:
        _d = 1;

The comments I again retrieved from 6502.org. This is one of those sets of instructions that my automated CPU table creator in awk couldn't transcribe, so it is important to also adjust the CPU tables to include the instruction length and number of cycles for these instructions. I have performed these table adjustments in my Github tag.

Implementing ADC and SBC

We start implementing the ADC and SBC instructions by first implementing the following two methods:

  void adc(int operand) {
    int temp = _a + operand + _c;
    _v = (((_a ^ temp) & (operand ^ temp) & 0x80) != 0) ? 1 : 0;
    _a = temp & 0xff;
    //N V Z C
    _n = ((_a & 0x80) == 0x80) ? 1 : 0;
    _z = (_a == 0) ? 1 : 0;
    _c = (temp & 0x100) != 0 ? 1 : 0;
  }

  void sbc(int operand) {
    operand = ~operand & 0xff;
    int temp = _a + operand + _c;
    _v = (((_a ^ temp) & (operand ^ temp) & 0x80) != 0) ? 1 : 0;
    _a = temp & 0xff;
    //N V Z C
    _n = ((_a & 0x80) == 0x80) ? 1 : 0;
    _z = (_a == 0) ? 1 : 0;
    _c = (temp & 0x100) != 0 ? 1 : 0;
  }
These two methods perform the ADC and SBC operations and set the applicable flags. They are pretty straightforward. The only thing that might be a bit mind-boggling is the setting of the overflow flag. In fact, the operation of the overflow flag in general is often misunderstood, as explained here, on 6502.org.

For our purposes, it suffices to say that the overflow flag indicates that, during an ADC or SBC operation, the sign of the result is incorrect.
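A few concrete additions make the overflow rule easier to see. The sketch below uses the same formula as the adc method above, but as a pure function; the AdcResult class and the adcPure name are hypothetical and only exist for this example.

```dart
/// Result of an 8-bit add with carry: accumulator, carry and overflow.
/// (Hypothetical container for illustration only.)
class AdcResult {
  final int a, c, v;
  const AdcResult(this.a, this.c, this.v);

  @override
  String toString() =>
      'A=${a.toRadixString(16).padLeft(2, '0')} C=$c V=$v';
}

/// Same arithmetic as adc(), without touching any CPU state.
AdcResult adcPure(int a, int operand, int carryIn) {
  final temp = a + operand + carryIn;
  // V is set when both inputs have the same sign and the result's differs.
  final v = ((a ^ temp) & (operand ^ temp) & 0x80) != 0 ? 1 : 0;
  return AdcResult(temp & 0xff, (temp & 0x100) != 0 ? 1 : 0, v);
}

void main() {
  print('50 + 10: ${adcPure(0x50, 0x10, 0)}'); // positive + positive, fits
  print('50 + 50: ${adcPure(0x50, 0x50, 0)}'); // positive + positive, too big
  print('ff + 01: ${adcPure(0xff, 0x01, 0)}'); // -1 + 1, carry but no overflow
  print('ff + 80: ${adcPure(0xff, 0x80, 0)}'); // -1 + -128, too small
}
```

The third case shows that carry and overflow are independent: the unsigned result wraps, but the signed result (zero) is perfectly valid.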

Finally, the opcodes for ADC and SBC are implemented as follows:

/*SBC (SuBtract with Carry)
Affects Flags: N V Z C

MODE           SYNTAX       HEX LEN TIM
Immediate     SBC #$44      $E9  2   2
Zero Page     SBC $44       $E5  2   3
Zero Page,X   SBC $44,X     $F5  2   4
Absolute      SBC $4400     $ED  3   4
Absolute,X    SBC $4400,X   $FD  3   4+
Absolute,Y    SBC $4400,Y   $F9  3   4+
Indirect,X    SBC ($44,X)   $E1  2   6
Indirect,Y    SBC ($44),Y   $F1  2   5+

+ add 1 cycle if page boundary crossed

SBC results are dependant on the setting of the decimal flag. In decimal mode, subtraction is carried out on the assumption that the values involved are packed BCD (Binary Coded Decimal).
There is no way to subtract without the carry which works as an inverse borrow. i.e, to subtract you set the carry before the operation. If the carry is cleared by the operation, it indicates a borrow occurred.
*/
      case 0xE9:
        sbc(arg0);
      case 0xE5:
      case 0xF5:
      case 0xED:
      case 0xFD:
      case 0xF9:
      case 0xE1:
      case 0xF1:
        sbc(memory.getMem(resolvedAddress));

        /*ADC (ADd with Carry)
Affects Flags: N V Z C

MODE           SYNTAX       HEX LEN TIM
Immediate     ADC #$44      $69  2   2
Zero Page     ADC $44       $65  2   3
Zero Page,X   ADC $44,X     $75  2   4
Absolute      ADC $4400     $6D  3   4
Absolute,X    ADC $4400,X   $7D  3   4+
Absolute,Y    ADC $4400,Y   $79  3   4+
Indirect,X    ADC ($44,X)   $61  2   6
Indirect,Y    ADC ($44),Y   $71  2   5+

+ add 1 cycle if page boundary crossed

ADC results are dependant on the setting of the decimal flag. In decimal mode, addition is carried out on the assumption that the values involved are packed BCD (Binary Coded Decimal).
There is no way to add without carry.
*/

      case 0x69:
        adc(arg0);
      case 0x65:
      case 0x75:
      case 0x6D:
      case 0x7D:
      case 0x79:
      case 0x61:
      case 0x71:
        adc(memory.getMem(resolvedAddress));

Inc and Dec

The next instructions to implement are INC and DEC:

/*DEC (DECrement memory)
Affects Flags: N Z

MODE           SYNTAX       HEX LEN TIM
Zero Page     DEC $44       $C6  2   5
Zero Page,X   DEC $44,X     $D6  2   6
Absolute      DEC $4400     $CE  3   6
Absolute,X    DEC $4400,X   $DE  3   7
*/
      case 0xC6:
      case 0xD6:
      case 0xCE:
      case 0xDE:
        int temp = memory.getMem(resolvedAddress) - 1;
        temp = temp & 0xff;
        _n = ((temp & 0x80) != 0) ? 1 : 0;
        _z = (temp == 0) ? 1 : 0;
        memory.setMem(temp, resolvedAddress);
/*INC (INCrement memory)
Affects Flags: N Z

MODE           SYNTAX       HEX LEN TIM
Zero Page     INC $44       $E6  2   5
Zero Page,X   INC $44,X     $F6  2   6
Absolute      INC $4400     $EE  3   6
Absolute,X    INC $4400,X   $FE  3   7*/
      case 0xE6:
      case 0xF6:
      case 0xEE:
      case 0xFE:
        int temp = memory.getMem(resolvedAddress) + 1;
        temp = temp & 0xff;
        _n = ((temp & 0x80) != 0) ? 1 : 0;
        _z = (temp == 0) ? 1 : 0;
        memory.setMem(temp, resolvedAddress);

Nothing much to be said about these instructions. The contents of a memory location are incremented or decremented. Only the N and Z flags are affected, and not the Carry or Overflow flags as was the case with ADC and SBC.

The alert reader might spot that I didn't implement the register Increment/Decrement commands, like INX, DEX, INY and DEY. I will deal with these in a future post when dealing with other register commands.
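The one subtle point with INC and DEC is the wrap-around at the edges of the 8-bit range, which the "& 0xff" takes care of. A tiny sketch with hypothetical helper names (incByte and decByte):

```dart
/// Increments an 8-bit value, wrapping ff back to 00.
int incByte(int value) => (value + 1) & 0xff;

/// Decrements an 8-bit value, wrapping 00 back to ff.
int decByte(int value) => (value - 1) & 0xff;

void main() {
  final up = incByte(0xff); // wraps to 00, so Z would be set
  final down = decByte(0x00); // wraps to ff, so N would be set
  print('inc ff -> ${up.toRadixString(16).padLeft(2, '0')}');
  print('dec 00 -> ${down.toRadixString(16).padLeft(2, '0')}');
}
```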

The Test Program

To test all the instructions I have implemented in this post, I have written the following Test Program:

SEC       38
LDA #$9   A9 09
SBC #$3   E9 03
SBC #$7   E9 07
ADC #$80  69 80
STA $0020 8D 20 00
INC $20   E6 20
LDA $20   A5 20
CLV       B8
SBC #$01  E9 01
I do a couple of subtractions, which I start by setting the carry flag, so that the two's complement arithmetic works correctly.

So, I start with loading the Accumulator with 9, and subtracting 3, which yields 6.

I then subtract 7, which yields -1. I then add $80, which is the two's complement in 8 bits for -128. So, -1 plus -128 is -129, which is outside the signed range of an 8-bit number. This condition should set the Overflow (V) flag, which it indeed does when I step past this instruction.

I then store the accumulator to memory to test if the INC memory instruction functions correctly.

I then load the incremented value back to the accumulator, which should be $80 at this stage. I then do another test that should trigger an overflow when doing an SBC.
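For readers who want to check the expected register values before stepping through the program, here is a stripped-down sketch that replays it. The MiniCpu class and the runTestProgram function are hypothetical and much simpler than the real emulator: they only track A, C, V and one memory byte.

```dart
/// A tiny stand-in for the CPU that tracks only A, C and V.
/// (Hypothetical class for illustration only.)
class MiniCpu {
  int a = 0, c = 0, v = 0;

  void adc(int operand) {
    final temp = a + operand + c;
    v = ((a ^ temp) & (operand ^ temp) & 0x80) != 0 ? 1 : 0;
    a = temp & 0xff;
    c = (temp & 0x100) != 0 ? 1 : 0;
  }

  // SBC is an ADC with the operand inverted, exactly as in sbc() above.
  void sbc(int operand) => adc(~operand & 0xff);
}

/// Replays the test program and returns the final CPU state.
MiniCpu runTestProgram() {
  final cpu = MiniCpu();
  cpu.c = 1; // SEC
  cpu.a = 0x09; // LDA #$9
  cpu.sbc(0x03); // SBC #$3  -> A = 06
  cpu.sbc(0x07); // SBC #$7  -> A = ff, borrow clears C
  cpu.adc(0x80); // ADC #$80 -> A = 7f, V set
  var mem = cpu.a; // STA $0020
  mem = (mem + 1) & 0xff; // INC $20 -> 80
  cpu.a = mem; // LDA $20
  cpu.v = 0; // CLV
  cpu.sbc(0x01); // SBC #$01 -> A = 7f, V set again
  return cpu;
}

void main() {
  final cpu = runTestProgram();
  print('A=${cpu.a.toRadixString(16)} C=${cpu.c} V=${cpu.v}');
}
```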

In Summary

In this post I have implemented the Flag instructions, ADC, SBC and Memory Increment/Decrement instructions.

In the next post I will be implementing bit logic and bit shifting instructions.

Until next time!

Sunday, 29 December 2024

A Commodore 64 Emulator in Flutter: Part 4

Foreword

In the previous post we added some basic plumbing to our emulator for single stepping through instructions and showing a dump of memory and registers at each step.

We ended off the post by implementing the instructions Load Accumulator (LDA) immediate and Store Accumulator (STA) absolute.

In this post we will go forth and implement every single load and store instruction, with every associated address mode.

In this exercise we will also be developing a generic way of resolving addresses in the different address modes, not having to do it with every single instruction.

Hope you enjoy this post!

Lookup tables

I mentioned that I want to create a generic way for dealing with address modes. Considering that there are well over 100 opcodes on the 6502 CPU, this sounds like quite an intimidating task!

But fear not. We can use lookup tables to lookup the addressing mode for each opcode. 😀

However, even with a lookup table, there is still the daunting task of creating the table by hand, which is very error-prone considering the volume of instructions.

I will try and make this task less daunting by automating this table generation, and supplying this process a text file of all the instructions. The following website from 6502.org gives it to us:

http://www.6502.org/tutorials/6502opcodes.html

Let's have a look at how this info of the instructions is laid out:

ADC (ADd with Carry)
Affects Flags: N V Z C

MODE           SYNTAX       HEX LEN TIM
Immediate     ADC #$44      $69  2   2
Zero Page     ADC $44       $65  2   3
Zero Page,X   ADC $44,X     $75  2   4
Absolute      ADC $4400     $6D  3   4
Absolute,X    ADC $4400,X   $7D  3   4+
Absolute,Y    ADC $4400,Y   $79  3   4+
Indirect,X    ADC ($44,X)   $61  2   6
Indirect,Y    ADC ($44),Y   $71  2   5+

+ add 1 cycle if page boundary crossed

ADC results are dependant on the setting of the decimal flag. In decimal mode, addition is carried out on the assumption that the values involved are packed BCD (Binary Coded Decimal).
There is no way to add without carry.



AND (bitwise AND with accumulator)
Affects Flags: N Z

MODE           SYNTAX       HEX LEN TIM
Immediate     AND #$44      $29  2   2
Zero Page     AND $44       $25  2   3
Zero Page,X   AND $44,X     $35  2   4
Absolute      AND $4400     $2D  3   4
Absolute,X    AND $4400,X   $3D  3   4+
Absolute,Y    AND $4400,Y   $39  3   4+
Indirect,X    AND ($44,X)   $21  2   6
Indirect,Y    AND ($44),Y   $31  2   5+

+ add 1 cycle if page boundary crossed

We see the actual info we need is in table format, which is nice. We can easily extract the info we need from that.

What complicates the process is the text preceding each table, which we need to remove. One approach is to look for the word MODE at the beginning of a line; we can then assume the following lines are instruction data, until we hit a blank line.
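The "MODE until blank line" idea is independent of the tool used. Purely as an illustration, and not what this post actually uses, here is how the same scan could look in Dart. The function name extractInstructionRows and the sample constant are made up for this sketch.

```dart
/// Returns only the instruction rows: the lines that follow a line
/// starting with MODE, up to the next blank line.
/// (Hypothetical helper for illustration only.)
List<String> extractInstructionRows(String text) {
  final rows = <String>[];
  var inTable = false;
  for (final line in text.split('\n')) {
    if (line.startsWith('MODE')) {
      inTable = true; // the header line itself is skipped
    } else if (line.trim().isEmpty) {
      inTable = false; // a blank line ends the table
    } else if (inTable) {
      rows.add(line);
    }
  }
  return rows;
}

// A raw string is used so that the $ signs are not treated as interpolation.
const sample = r'''
ADC (ADd with Carry)
Affects Flags: N V Z C

MODE           SYNTAX       HEX LEN TIM
Immediate     ADC #$44      $69  2   2
Zero Page     ADC $44       $65  2   3

+ add 1 cycle if page boundary crossed
''';

void main() {
  extractInstructionRows(sample).forEach(print);
}
```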

The next question is: what language do we use for this task? A couple of years ago I used Java for this, but revisiting it, I feel like using something that is easily accessible from the Linux command line, like sed and awk.

After playing around a bit, I got to this command for leaving only the tables:

sed -n '/^MODE/{:a;N;/\n$/!ba;s/\n/\n/g;p}' lodandstore.txt
Running this, we get the following text output:

MODE           SYNTAX       HEX LEN TIM
Immediate     ADC #$44      $69  2   2
Zero Page     ADC $44       $65  2   3
Zero Page,X   ADC $44,X     $75  2   4
Absolute      ADC $4400     $6D  3   4
Absolute,X    ADC $4400,X   $7D  3   4+
Absolute,Y    ADC $4400,Y   $79  3   4+
Indirect,X    ADC ($44,X)   $61  2   6
Indirect,Y    ADC ($44),Y   $71  2   5+

MODE           SYNTAX       HEX LEN TIM
Immediate     AND #$44      $29  2   2
Zero Page     AND $44       $25  2   3
Zero Page,X   AND $44,X     $35  2   4
Absolute      AND $4400     $2D  3   4
Absolute,X    AND $4400,X   $3D  3   4+
Absolute,Y    AND $4400,Y   $39  3   4+
Indirect,X    AND ($44,X)   $21  2   6
Indirect,Y    AND ($44),Y   $31  2   5+

MODE           SYNTAX       HEX LEN TIM
Accumulator   ASL A         $0A  1   2
Zero Page     ASL $44       $06  2   5
Zero Page,X   ASL $44,X     $16  2   6
Absolute      ASL $4400     $0E  3   6
Absolute,X    ASL $4400,X   $1E  3   7

This more closely resembles what I am looking for. Let's pipe this output to another sed command, so we are only left with instruction lines:
sed -n '/^MODE/{:a;N;/\n$/!ba;s/\n/\n/g;p}' lodandstore.txt | sed '/^MODE/d; /^\s*$/d'
In the second sed command, the d after each pattern tells sed to delete any line that matches that pattern.

The output of this command is as follows:

Immediate     ADC #$44      $69  2   2
Zero Page     ADC $44       $65  2   3
Zero Page,X   ADC $44,X     $75  2   4
Absolute      ADC $4400     $6D  3   4
Absolute,X    ADC $4400,X   $7D  3   4+
Absolute,Y    ADC $4400,Y   $79  3   4+
Indirect,X    ADC ($44,X)   $61  2   6
Indirect,Y    ADC ($44),Y   $71  2   5+
Immediate     AND #$44      $29  2   2
Zero Page     AND $44       $25  2   3
Zero Page,X   AND $44,X     $35  2   4
Absolute      AND $4400     $2D  3   4
Absolute,X    AND $4400,X   $3D  3   4+
Absolute,Y    AND $4400,Y   $39  3   4+
Indirect,X    AND ($44,X)   $21  2   6
Indirect,Y    AND ($44),Y   $31  2   5+
Accumulator   ASL A         $0A  1   2
Zero Page     ASL $44       $06  2   5
Zero Page,X   ASL $44,X     $16  2   6
Absolute      ASL $4400     $0E  3   6
Absolute,X    ASL $4400,X   $1E  3   7
So, now we have a text file only containing the instruction data. Now we just need to extract the required data from each line and build the table.

The address modes array is quite a complex one to start with, so let us start with something simpler, the instruction length and cycle array. Both these lookup tables we will need eventually so we can just as well tackle them while we are at it.

Using awk, here is the code to generate the Instruction Length array:

awk '
BEGIN {for (i = 0; i < 256; i++) instruction_lengths[i] = 0;}
{
    mode = substr($0, 1, 14)       # Addressing mode column
    format = substr($0, 15, 14)    # Assembler syntax column
    opcode = substr($0, 30, 3)     # Hex opcode, without the leading $
    insLen = substr($0, 34, 1)     # Instruction length in bytes
    gsub(/[ ]+$/, "", mode)  # Remove trailing spaces from the substring
    gsub(/[ ]+$/, "", format)  # Remove trailing spaces from the substring

    # strtonum is a gawk extension; the opcode becomes the array index
    hex_index = strtonum("0x" opcode)
    instruction_lengths[hex_index] = insLen
}
END {
  # Print the array, with values separated by commas
  for (i = 0; i < length(instruction_lengths); i++) {
    printf "%s, ", instruction_lengths[i]
    if ((i % 16) == 15)
       print "";
  }
  print "}"
}
' processed.txt
Please note that processed.txt is the path to the text file we created previously, containing only the instruction rows.

Our code contains three blocks. We start with a BEGIN block where we initialise the resulting array with zeroes.

We then have a middle block which awk invokes for each row, and thus $0 always contains the text of the row we are currently busy with.

In this block we carefully extract the mode, format, opcode and instruction length into variables of their own. The opcode is then used as the index into our resulting array, which is where we store the extracted instruction length.
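To make the column offsets concrete, here is a minimal Dart sketch of the same fixed-width extraction for a single row. It is not part of the awk pipeline, and OpcodeRow and parseRow are hypothetical names I made up for this illustration. Keep in mind that awk's substr() counts from 1 while Dart's substring() counts from 0, so every offset shifts by one.

```dart
/// One parsed row of the opcode listing (hypothetical helper type).
class OpcodeRow {
  final String mode;
  final String format;
  final int opcode;
  final int length;
  final int cycles;

  OpcodeRow(this.mode, this.format, this.opcode, this.length, this.cycles);
}

/// Parses one fixed-width row using the same columns as the awk script.
OpcodeRow parseRow(String line) {
  final mode = line.substring(0, 14).trimRight(); // awk: substr($0, 1, 14)
  final format = line.substring(14, 28).trimRight(); // awk: substr($0, 15, 14)
  // Index 28 holds the '$' prefix; the two hex digits follow it.
  final opcode = int.parse(line.substring(29, 31), radix: 16);
  final length = int.parse(line.substring(33, 34)); // awk: substr($0, 34, 1)
  final cycles = int.parse(line.substring(37, 38)); // awk: substr($0, 38, 1)
  return OpcodeRow(mode, format, opcode, length, cycles);
}

void main() {
  // Rebuild the row column by column so the layout is explicit.
  final line = 'Immediate'.padRight(14) +
      r'ADC #$44'.padRight(14) +
      r'$69' +
      '  2   2';
  final row = parseRow(line);
  print("${row.mode} | ${row.format} | 0x${row.opcode.toRadixString(16)} | "
      "${row.length} | ${row.cycles}");
}
```

Only a single character is read for the cycle count, which mirrors what the awk code does.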

We then have an END block where we print the contents of the array that we can use as an array definition in our code. The resulting array looks like this:

1, 2, 0, 0, 0, 2, 2, 0, 0, 2, 1, 0, 0, 3, 3, 0, 
0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0, 
3, 2, 0, 0, 2, 2, 2, 0, 0, 2, 1, 0, 3, 3, 3, 0, 
0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0, 
1, 2, 0, 0, 0, 2, 2, 0, 0, 2, 1, 0, 3, 3, 3, 0, 
0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0, 
1, 2, 0, 0, 0, 2, 2, 0, 0, 2, 1, 0, 3, 3, 3, 0, 
0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0, 
0, 2, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 3, 3, 0, 
0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 0, 0, 
2, 2, 2, 0, 2, 2, 2, 0, 0, 2, 0, 0, 3, 3, 3, 0, 
0, 2, 0, 0, 2, 2, 2, 0, 0, 3, 0, 0, 3, 3, 3, 0, 
2, 2, 0, 0, 2, 2, 2, 0, 0, 2, 0, 0, 3, 3, 3, 0, 
0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0, 
2, 2, 0, 0, 2, 2, 2, 0, 0, 2, 1, 0, 3, 3, 3, 0, 
0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0, 
So, now we have an array in which, given an opcode, we can quickly look up the length of the instruction.

Next, let us write similar awk code to get an array of cycle counts. This array will become important later for determining how many clock cycles our emulator has consumed, so that we can add appropriate delays and make our emulator run at the same speed as a real C64.

Here is the code:

awk '
# strtonum() is a GNU awk (gawk) extension
BEGIN {for (i = 0; i < 256; i++) instruction_cycles[i] = 0;}
{
    mode = substr($0, 1, 14)
    format = substr($0, 15, 14)
    opcode = substr($0, 30, 3)
    insCycles = substr($0, 38, 1)
    gsub(/[ ]+$/, "", mode)  # Remove trailing spaces from the substring
    gsub(/[ ]+$/, "", format)  # Remove trailing spaces from the substring

    # The opcode (hex, without the leading $) becomes the array index
    hex_index = strtonum("0x" opcode)
    instruction_cycles[hex_index] = insCycles
}
END {
  # Print the array, 16 comma-separated values per row
  for (i = 0; i < 256; i++) {
    printf "%s, ", instruction_cycles[i]
    if ((i % 16) == 15)
       print "";
  }
}
' processed.txt

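The resulting array looks like this. Note that the script reads a single digit only, so the + suffix that marks a possible extra cycle is dropped and the table holds base cycle counts: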

7, 6, 0, 0, 0, 3, 5, 0, 0, 2, 2, 0, 0, 4, 6, 0, 
0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0, 
6, 6, 0, 0, 3, 3, 5, 0, 0, 2, 2, 0, 4, 4, 6, 0, 
0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0, 
6, 6, 0, 0, 0, 3, 5, 0, 0, 2, 2, 0, 3, 4, 6, 0, 
0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0, 
6, 6, 0, 0, 0, 3, 5, 0, 0, 2, 2, 0, 5, 4, 6, 0, 
0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0, 
0, 6, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 4, 4, 0, 
0, 6, 0, 0, 0, 4, 4, 0, 0, 5, 0, 0, 0, 5, 0, 0, 
2, 6, 2, 0, 3, 3, 3, 0, 0, 2, 0, 0, 4, 4, 4, 0, 
0, 5, 0, 0, 4, 4, 4, 0, 0, 4, 0, 0, 4, 4, 4, 0, 
2, 6, 0, 0, 3, 3, 5, 0, 0, 2, 0, 0, 4, 4, 6, 0, 
0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0, 
2, 6, 0, 0, 3, 3, 5, 0, 0, 2, 2, 0, 4, 4, 6, 0, 
0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0, 
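To give a feel for how such a table can later be used for timing, here is a rough Dart sketch that adds up the base cycles of a short instruction sequence and converts the total to real time. The clock value of 985248 Hz is my assumption for a PAL C64, and baseCycles, totalCycles and cyclesToDuration are hypothetical names, not code from the emulator.

```dart
/// Base cycle counts for the handful of opcodes used below, taken from the
/// cycle table (page-crossing penalties, the "+" entries, are ignored).
const Map<int, int> baseCycles = {
  0xA9: 2, // LDA immediate
  0xA0: 2, // LDY immediate
  0xA2: 2, // LDX immediate
  0x91: 6, // STA (zp),Y
  0x81: 6, // STA (zp,X)
};

/// Assumed CPU clock of a PAL C64 in Hz.
const int assumedClockHz = 985248;

/// Sums the base cycles of a sequence of opcodes; unknown opcodes count as 0.
int totalCycles(List<int> opcodes) =>
    opcodes.fold<int>(0, (sum, op) => sum + (baseCycles[op] ?? 0));

/// Converts a cycle count to the time a real machine would need.
Duration cyclesToDuration(int cycles) =>
    Duration(microseconds: (cycles * 1000000 / assumedClockHz).round());

void main() {
  final cycles = totalCycles([0xA9, 0xA0, 0x91, 0xA2, 0x81, 0xA9]);
  print('cycles: $cycles'); // 2 + 2 + 6 + 2 + 6 + 2 = 20
  print('real time: ${cyclesToDuration(cycles).inMicroseconds} microseconds');
}
```

An emulator that finishes these instructions faster than the computed duration would need to wait for the difference.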
Finally, let us get to the address mode array. Here is the code:

awk '
# strtonum() is a GNU awk (gawk) extension
BEGIN {for (i = 0; i < 256; i++) addr_modes[i] = 0;}
{
    mode = substr($0, 1, 14)
    format = substr($0, 15, 14)
    opcode = substr($0, 30, 3)
    insLen = substr($0, 34, 1)
    gsub(/[ ]+$/, "", mode)  # Remove trailing spaces from the substring
    gsub(/[ ]+$/, "", format)  # Remove trailing spaces from the substring

    hex_index = strtonum("0x" opcode)
    addr_mode = 0  # Reset, so an unmatched mode cannot inherit the previous value
    if (mode == "Implied") {
      addr_mode = 0;
    } else if (mode == "Accumulator") {
      addr_mode = 1;
    } else if (mode == "Immediate") {
      addr_mode = 2;
    } else if (mode == "Zero Page") {
      addr_mode = 3;
    } else if (mode == "Zero Page,X") {
      addr_mode = 4;
    } else if (mode == "Zero Page,Y") {
      addr_mode = 5;
    } else if (mode == "Absolute") {  # 6 is reserved for relative (branches)
      addr_mode = 7;
    } else if (mode == "Absolute,X") {
      addr_mode = 8;
    } else if (mode == "Absolute,Y") {
      addr_mode = 9;
    } else if (mode == "Indirect") {
      addr_mode = 10;
    } else if (mode == "Indirect,X") {
      addr_mode = 11;
    } else if (mode == "Indirect,Y") {
      addr_mode = 12;
    }
    addr_modes[hex_index] = addr_mode
}
END {
  # Print the array, 16 comma-separated values per row
  for (i = 0; i < 256; i++) {
    printf "%s, ", addr_modes[i]
    if ((i % 16) == 15)
       print "";
  }
}
' processed.txt
And the resulting array looks like this:

0, 11, 0, 0, 0, 3, 3, 0, 0, 2, 1, 0, 0, 7, 7, 0, 
0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0, 
7, 11, 0, 0, 3, 3, 3, 0, 0, 2, 1, 0, 7, 7, 7, 0, 
0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0, 
0, 11, 0, 0, 0, 3, 3, 0, 0, 2, 1, 0, 7, 7, 7, 0, 
0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0, 
0, 11, 0, 0, 0, 3, 3, 0, 0, 2, 1, 0, 10, 7, 7, 0, 
0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0, 
0, 11, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 7, 7, 0, 
0, 12, 0, 0, 0, 4, 5, 0, 0, 9, 0, 0, 0, 8, 0, 0, 
2, 11, 2, 0, 3, 3, 3, 0, 0, 2, 0, 0, 7, 7, 7, 0, 
0, 12, 0, 0, 4, 4, 5, 0, 0, 9, 0, 0, 8, 8, 9, 0, 
2, 11, 0, 0, 3, 3, 3, 0, 0, 2, 0, 0, 7, 7, 7, 0, 
0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0, 
2, 11, 0, 0, 3, 3, 3, 0, 0, 2, 0, 0, 7, 7, 7, 0, 
0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0, 
At this point, many people will wonder why I don't use enums for the address mode array. I probably could, but the names of the address modes are quite lengthy, so you would end up with very long lines in your array definition, and having to scroll back and forth horizontally is an unpleasant experience.

One thing I want to point out is that my automation exercise did not cover a few instructions, because the documentation lists them in a different format from the one our scripts expect. Here are some examples from the documentation:

...
Branches are dependant on the status of the flag bits when the op code is encountered. A branch not taken requires two machine cycles. Add one if the branch is taken and add one more if the branch crosses a page boundary.

MNEMONIC                       HEX
BPL (Branch on PLus)           $10
BMI (Branch on MInus)          $30
BVC (Branch on oVerflow Clear) $50
BVS (Branch on oVerflow Set)   $70
BCC (Branch on Carry Clear)    $90
BCS (Branch on Carry Set)      $B0
BNE (Branch on Not Equal)      $D0
BEQ (Branch on EQual)          $F0
...
These instructions are implied mode, have a length of one byte and require two machine cycles.

MNEMONIC                       HEX
CLC (CLear Carry)              $18
SEC (SEt Carry)                $38
CLI (CLear Interrupt)          $58
SEI (SEt Interrupt)            $78
CLV (CLear oVerflow)           $B8
CLD (CLear Decimal)            $D8
SED (SEt Decimal)              $F8
...
These instructions are implied mode, have a length of one byte and require two machine cycles.

MNEMONIC                 HEX
TAX (Transfer A to X)    $AA
TXA (Transfer X to A)    $8A
DEX (DEcrement X)        $CA
INX (INcrement X)        $E8
TAY (Transfer A to Y)    $A8
TYA (Transfer Y to A)    $98
DEY (DEcrement Y)        $88
INY (INcrement Y)        $C8
...
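For these instructions, we need to manually adjust the lookup tables. I will only do this once I get to the relevant instructions. Note also that the three STY opcodes (0x84, 0x94 and 0x8C) are zero in all three generated arrays, even though STY does follow the tabular format, so these entries need to be filled in by hand as well.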
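As a rough idea of what such a manual adjustment could look like, here is a Dart sketch that patches the three tables for the opcodes quoted above. The function patchTables and the opcode lists are hypothetical names of mine. The mode numbers assume the ordering used by the awk script (0 for implied) with 6 for relative, and the branch length of two bytes (opcode plus offset) is my addition, since the excerpt above only states the cycle count.

```dart
// Address mode numbers, assuming the ordering used by the awk script.
const int implied = 0;
const int relative = 6;

const List<int> branchOpcodes = [0x10, 0x30, 0x50, 0x70, 0x90, 0xB0, 0xD0, 0xF0];
const List<int> flagOpcodes = [0x18, 0x38, 0x58, 0x78, 0xB8, 0xD8, 0xF8];
const List<int> registerOpcodes = [0xAA, 0x8A, 0xCA, 0xE8, 0xA8, 0x98, 0x88, 0xC8];

/// Fills in the entries that the awk scripts could not extract.
void patchTables(List<int> modes, List<int> lengths, List<int> cycles) {
  for (final op in branchOpcodes) {
    modes[op] = relative;
    lengths[op] = 2; // opcode + signed 8-bit offset
    cycles[op] = 2; // base cost when the branch is not taken
  }
  for (final op in [...flagOpcodes, ...registerOpcodes]) {
    modes[op] = implied;
    lengths[op] = 1;
    cycles[op] = 2;
  }
}

void main() {
  // Stand-ins for the generated tables, all zero to keep the example short.
  final modes = List<int>.filled(256, 0);
  final lengths = List<int>.filled(256, 0);
  final cycles = List<int>.filled(256, 0);
  patchTables(modes, lengths, cycles);
  print('BNE: mode ${modes[0xD0]}, length ${lengths[0xD0]}, cycles ${cycles[0xD0]}');
  print('INX: mode ${modes[0xE8]}, length ${lengths[0xE8]}, cycles ${cycles[0xE8]}');
}
```

A const list cannot be modified in Dart, so with tables declared as const you would either patch a mutable copy, for example one made with List<int>.of(...), or simply edit the numbers in the source file.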

With all these tables created, we can place them in a file in our Flutter project called cpu_tables.dart:

class CpuTables {
  // Values are AddressMode ordinals (0 = implied ... 12 = indirect indexed).
  // The STY entries (0x84, 0x94, 0x8C) in all three tables were filled in by
  // hand from the STY listing, since the generated arrays leave them at zero.
  static const List<int> addressModes = [
    0, 11, 0, 0, 0, 3, 3, 0, 0, 2, 1, 0, 0, 7, 7, 0,
    0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0,
    7, 11, 0, 0, 3, 3, 3, 0, 0, 2, 1, 0, 7, 7, 7, 0,
    0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0,
    0, 11, 0, 0, 0, 3, 3, 0, 0, 2, 1, 0, 7, 7, 7, 0,
    0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0,
    0, 11, 0, 0, 0, 3, 3, 0, 0, 2, 1, 0, 10, 7, 7, 0,
    0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0,
    0, 11, 0, 0, 3, 3, 3, 0, 0, 0, 0, 0, 7, 7, 7, 0,
    0, 12, 0, 0, 4, 4, 5, 0, 0, 9, 0, 0, 0, 8, 0, 0,
    2, 11, 2, 0, 3, 3, 3, 0, 0, 2, 0, 0, 7, 7, 7, 0,
    0, 12, 0, 0, 4, 4, 5, 0, 0, 9, 0, 0, 8, 8, 9, 0,
    2, 11, 0, 0, 3, 3, 3, 0, 0, 2, 0, 0, 7, 7, 7, 0,
    0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0,
    2, 11, 0, 0, 3, 3, 3, 0, 0, 2, 0, 0, 7, 7, 7, 0,
    0, 12, 0, 0, 0, 4, 4, 0, 0, 9, 0, 0, 0, 8, 8, 0,
  ];

  static const List<int> instructionLen = [
    1, 2, 0, 0, 0, 2, 2, 0, 0, 2, 1, 0, 0, 3, 3, 0,
    0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0,
    3, 2, 0, 0, 2, 2, 2, 0, 0, 2, 1, 0, 3, 3, 3, 0,
    0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0,
    1, 2, 0, 0, 0, 2, 2, 0, 0, 2, 1, 0, 3, 3, 3, 0,
    0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0,
    1, 2, 0, 0, 0, 2, 2, 0, 0, 2, 1, 0, 3, 3, 3, 0,
    0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0,
    0, 2, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0, 3, 3, 3, 0,
    0, 2, 0, 0, 2, 2, 2, 0, 0, 3, 0, 0, 0, 3, 0, 0,
    2, 2, 2, 0, 2, 2, 2, 0, 0, 2, 0, 0, 3, 3, 3, 0,
    0, 2, 0, 0, 2, 2, 2, 0, 0, 3, 0, 0, 3, 3, 3, 0,
    2, 2, 0, 0, 2, 2, 2, 0, 0, 2, 0, 0, 3, 3, 3, 0,
    0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0,
    2, 2, 0, 0, 2, 2, 2, 0, 0, 2, 1, 0, 3, 3, 3, 0,
    0, 2, 0, 0, 0, 2, 2, 0, 0, 3, 0, 0, 0, 3, 3, 0,
  ];

  // Base cycle counts only; page-crossing penalties are not included.
  static const List<int> instructionCycles = [
    7, 6, 0, 0, 0, 3, 5, 0, 0, 2, 2, 0, 0, 4, 6, 0,
    0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0,
    6, 6, 0, 0, 3, 3, 5, 0, 0, 2, 2, 0, 4, 4, 6, 0,
    0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0,
    6, 6, 0, 0, 0, 3, 5, 0, 0, 2, 2, 0, 3, 4, 6, 0,
    0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0,
    6, 6, 0, 0, 0, 3, 5, 0, 0, 2, 2, 0, 5, 4, 6, 0,
    0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0,
    0, 6, 0, 0, 3, 3, 3, 0, 0, 0, 0, 0, 4, 4, 4, 0,
    0, 6, 0, 0, 4, 4, 4, 0, 0, 5, 0, 0, 0, 5, 0, 0,
    2, 6, 2, 0, 3, 3, 3, 0, 0, 2, 0, 0, 4, 4, 4, 0,
    0, 5, 0, 0, 4, 4, 4, 0, 0, 4, 0, 0, 4, 4, 4, 0,
    2, 6, 0, 0, 3, 3, 5, 0, 0, 2, 0, 0, 4, 4, 6, 0,
    0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0,
    2, 6, 0, 0, 3, 3, 5, 0, 0, 2, 2, 0, 4, 4, 6, 0,
    0, 5, 0, 0, 0, 4, 6, 0, 0, 4, 0, 0, 0, 4, 7, 0
  ];
}
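Since these tables are pasted in by hand, a small consistency check can be worthwhile. The Dart sketch below is only an illustration: lengthByMode and findMismatches are hypothetical names, and the check assumes the address mode numbering used by the awk script. It relies on the fact that each address mode implies a fixed instruction length, so a mode entry and a length entry that disagree point to a mistake in one of the two tables.

```dart
/// Instruction length implied by each address mode number (0 = implied ...
/// 12 = indirect indexed), in the numbering used by the awk script.
const List<int> lengthByMode = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 2, 2];

/// Returns every opcode whose mode and length entries disagree.
/// Opcodes with length 0 are treated as "not filled in yet" and skipped.
List<int> findMismatches(List<int> modes, List<int> lengths) {
  final bad = <int>[];
  for (var op = 0; op < 256; op++) {
    if (lengths[op] == 0) continue;
    if (lengthByMode[modes[op]] != lengths[op]) bad.add(op);
  }
  return bad;
}

void main() {
  final modes = List<int>.filled(256, 0);
  final lengths = List<int>.filled(256, 0);
  // A few sample entries: LDA immediate, zero page and absolute.
  modes[0xA9] = 2;
  lengths[0xA9] = 2;
  modes[0xA5] = 3;
  lengths[0xA5] = 2;
  modes[0xAD] = 7;
  lengths[0xAD] = 3;
  // A deliberately wrong entry: zero page,X mode on a three-byte instruction.
  modes[0xBD] = 4;
  lengths[0xBD] = 3;
  final bad = findMismatches(modes, lengths);
  print(bad.map((op) => '0x${op.toRadixString(16)}').toList()); // [0xbd]
}
```

Calling findMismatches(CpuTables.addressModes, CpuTables.instructionLen) should return an empty list when the two tables agree.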

Implementing Address resolution

With the lookup tables defined, let us add some logic to our emulator to implement address resolution by address mode.

First, alongside our Cpu class (Dart enums must be declared at the top level, not inside a class), we create an enum for the address modes:

enum AddressMode {
  implied,
  accumulator,
  immediate,
  zeroPage,
  zeroPageX,
  zeroPageY,
  relative,
  absolute,
  absoluteX,
  absoluteY,
  indirect,
  indexedIndirect,
  indirectIndexed
}

Something to note here is that the enum values are declared in the same ordinal order as the numbers used in the addressModes array defined in the previous section.

Next, we define the following method in our Cpu class as well, for resolving an address by address mode:

  int calculateEffectiveAddress(int mode, int operand1, int operand2) {
    var modeAsEnum = AddressMode.values[mode];
    switch (modeAsEnum) {
      case AddressMode.implied:
      case AddressMode.accumulator:
      case AddressMode.immediate:
      case AddressMode.relative:
      case AddressMode.indirect:
        // TODO: Handle these cases. For now they have no effective address.
        // They are grouped with an explicit break, because an empty case
        // would otherwise fall through into the next case that has a body.
        break;
      case AddressMode.zeroPage:
        return operand1;
      case AddressMode.zeroPageX:
        return (operand1 + _x) & 0xff;
      case AddressMode.zeroPageY:
        return (operand1 + _y) & 0xff;
      case AddressMode.absolute:
        return (operand2 << 8) | operand1;
      case AddressMode.absoluteX:
        var add = (operand2 << 8) | operand1;
        return (add + _x) & 0xffff;
      case AddressMode.absoluteY:
        var add = (operand2 << 8) | operand1;
        return (add + _y) & 0xffff;
      case AddressMode.indexedIndirect: // LDA ($40,X)
        var add = operand1 + _x;
        var readByte0 = memory.getMem(add & 0xff);
        var readByte1 = memory.getMem((add + 1) & 0xff);
        return (readByte1 << 8) | readByte0;
      case AddressMode.indirectIndexed: // LDA ($40),Y
        var readByte0 = memory.getMem(operand1 & 0xff);
        var readByte1 = memory.getMem((operand1 + 1) & 0xff);
        var result = (readByte1 << 8) | readByte0;
        return (result + _y) & 0xffff;
    }
    return 0;
  }

Something to note here is that we receive the mode as an ordinal position, which we then convert to the enum via var modeAsEnum = AddressMode.values[mode].

This way, at least our switch statement is easily readable thanks to the enum names.
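The two indirect modes are the easiest to mix up, so here is a small standalone Dart sketch that works through both of them. It uses a plain list as memory instead of our memory class, and the pointer values stored at 0x10 and 0x17 are made up for the example.

```dart
/// Effective address for (zp,X): add X to the operand first, then read the
/// 16-bit pointer from zero page (wrapping around within page zero).
int indexedIndirect(List<int> mem, int operand, int x) {
  final ptr = (operand + x) & 0xff;
  return (mem[(ptr + 1) & 0xff] << 8) | mem[ptr];
}

/// Effective address for (zp),Y: read the pointer first, then add Y.
int indirectIndexed(List<int> mem, int operand, int y) {
  final base = (mem[(operand + 1) & 0xff] << 8) | mem[operand & 0xff];
  return (base + y) & 0xffff;
}

void main() {
  final mem = List<int>.filled(0x10000, 0);
  // Hypothetical pointers: $0010/$0011 -> $2000 and $0017/$0018 -> $3000.
  mem[0x10] = 0x00;
  mem[0x11] = 0x20;
  mem[0x17] = 0x00;
  mem[0x18] = 0x30;

  final a = indirectIndexed(mem, 0x10, 0x06); // e.g. STA ($10),Y with Y = 6
  final b = indexedIndirect(mem, 0x12, 0x05); // e.g. STA ($12,X) with X = 5
  print('0x${a.toRadixString(16)}'); // 0x2006
  print('0x${b.toRadixString(16)}'); // 0x3000
}
```

In the first case Y is added after the pointer has been read, while in the second case X selects which zero page location holds the pointer.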

Now, we use this method in our evolving step() method:

  step() {
    var opCode = memory.getMem(pc);
    pc++;
    var insLen = CpuTables.instructionLen[opCode];
    var arg0 = 0;
    var arg1 = 0;
    if (insLen > 1) {
      arg0 = memory.getMem(pc);
      pc++;
    }
    if (insLen > 2) {
      arg1 = memory.getMem(pc);
      pc++;
    }
    var resolvedAddress = calculateEffectiveAddress(
        CpuTables.addressModes[opCode], arg0, arg1);
    switch (opCode) {
   ...
    }
  }
You will also see some simplifications here compared to the previous post. The instruction length lookup table now helps us fetch the argument bytes beforehand, whereas in the previous post we had to do this within the opCode switch, inside the case statement of each particular opcode. This way, our opCode switch statement will not grow so drastically.
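To show the idea of length-driven fetching in isolation, here is a short Dart sketch that splits a stream of bytes into instructions purely by looking up each opcode's length. It uses a small excerpt of the length table as a map, and splitInstructions and hex are hypothetical helpers, not part of the emulator.

```dart
/// Lengths for the opcodes used below (a small excerpt of the length table).
const Map<int, int> instructionLen = {
  0xA9: 2, // LDA immediate
  0xA0: 2, // LDY immediate
  0xA2: 2, // LDX immediate
  0x91: 2, // STA (zp),Y
  0x81: 2, // STA (zp,X)
};

/// Splits a byte stream into instructions by looking up each opcode's length.
List<List<int>> splitInstructions(List<int> program) {
  final result = <List<int>>[];
  var pc = 0;
  while (pc < program.length) {
    // An opcode missing from the excerpt is treated as a single byte.
    final len = instructionLen[program[pc]] ?? 1;
    final end = pc + len > program.length ? program.length : pc + len;
    result.add(program.sublist(pc, end));
    pc = end;
  }
  return result;
}

/// Formats bytes as two-digit upper-case hex values.
String hex(List<int> bytes) => bytes
    .map((b) => b.toRadixString(16).toUpperCase().padLeft(2, '0'))
    .join(' ');

void main() {
  const program = [
    0xA9, 0x40, 0xA0, 0x06, 0x91, 0x10, 0xA2, 0x05, 0x81, 0x12, 0xA9, 0xF0
  ];
  for (final instruction in splitInstructions(program)) {
    print(hex(instruction));
  }
}
```

The step() method does the same thing one instruction at a time, with the program counter taking the place of the loop index.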

Implementing the Load and Store instructions

Let us now implement the load and store instructions within our opCode switch statement. Firstly, all our LDA instructions:

...
      /*
        Immediate     LDA #$44      $A9  2   2
        Zero Page     LDA $44       $A5  2   3
        Zero Page,X   LDA $44,X     $B5  2   4
        Absolute      LDA $4400     $AD  3   4
        Absolute,X    LDA $4400,X   $BD  3   4+
        Absolute,Y    LDA $4400,Y   $B9  3   4+
        Indirect,X    LDA ($44,X)   $A1  2   6
        Indirect,Y    LDA ($44),Y   $B1  2   5+
    */
      case 0xA9:
        // Immediate: the operand byte itself is the value
        _a = arg0;
        _n = (_a & 0x80) != 0;
        _z = _a == 0;
      case 0xA5:
      case 0xB5:
      case 0xAD:
      case 0xBD:
      case 0xB9:
      case 0xA1:
      case 0xB1:
        // All other modes: read the value from the resolved address
        _a = memory.getMem(resolvedAddress);
        _n = (_a & 0x80) != 0;
        _z = _a == 0;
...
You will see I have introduced two new variables, _n and _z. These are boolean variables for the Negative and Zero flags. To keep the discussion simple, I am not going to show what is required to implement these variables. All I will mention is that you need to follow the same process we used to implement the accumulator (i.e. _a).
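For reference, the flag logic itself can be captured in a few lines. The Dart sketch below is a standalone illustration, with Flags and updateNZ being hypothetical names rather than the way the emulator is actually structured.

```dart
/// Minimal stand-in for the two flags touched by a load instruction.
class Flags {
  bool n = false;
  bool z = false;

  /// Updates N and Z from an 8-bit result, as LDA, LDX and LDY do.
  void updateNZ(int value) {
    n = (value & 0x80) != 0; // bit 7 set means negative in two's complement
    z = (value & 0xff) == 0; // all eight bits clear means zero
  }
}

void main() {
  final flags = Flags();
  for (final value in [0x00, 0x40, 0xF0]) {
    flags.updateNZ(value);
    final hexValue = value.toRadixString(16).padLeft(2, '0');
    print('0x$hexValue: n=${flags.n} z=${flags.z}');
  }
}
```

Loading 0x00 sets only the Zero flag, 0x40 clears both flags and 0xF0 sets only the Negative flag.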

Next, let us implement both LDX and LDY:

 
...
/*
LDX (LoaD X register)
Affects Flags: N Z

MODE           SYNTAX       HEX LEN TIM
Immediate     LDX #$44      $A2  2   2
Zero Page     LDX $44       $A6  2   3
Zero Page,Y   LDX $44,Y     $B6  2   4
Absolute      LDX $4400     $AE  3   4
Absolute,Y    LDX $4400,Y   $BE  3   4+

 */
      case 0xA2:
        _x = arg0;
        _n = (_x & 0x80) != 0;
        _z = _x == 0;
      case 0xA6:
      case 0xB6:
      case 0xAE:
      case 0xBE:
        _x = memory.getMem(resolvedAddress);
        _n = (_x & 0x80) != 0;
        _z = _x == 0;

      /*
      LDY (LoaD Y register)
Affects Flags: N Z

MODE           SYNTAX       HEX LEN TIM
Immediate     LDY #$44      $A0  2   2
Zero Page     LDY $44       $A4  2   3
Zero Page,X   LDY $44,X     $B4  2   4
Absolute      LDY $4400     $AC  3   4
Absolute,X    LDY $4400,X   $BC  3   4+

       */
      case 0xA0:
        _y = arg0;
        _n = (_y & 0x80) != 0;
        _z = _y == 0;
      case 0xA4:
      case 0xB4:
      case 0xAC:
      case 0xBC:
        _y = memory.getMem(resolvedAddress);
        _n = (_y & 0x80) != 0;
        _z = _y == 0;
...
Finally, what remains for this post is to implement STA, STX and STY:

...
      /*
        STA (STore Accumulator)
Affects Flags: none

MODE           SYNTAX       HEX LEN TIM
Zero Page     STA $44       $85  2   3
Zero Page,X   STA $44,X     $95  2   4
Absolute      STA $4400     $8D  3   4
Absolute,X    STA $4400,X   $9D  3   5
Absolute,Y    STA $4400,Y   $99  3   5
Indirect,X    STA ($44,X)   $81  2   6
Indirect,Y    STA ($44),Y   $91  2   6
         */
      case 0x85:
      case 0x95:
      case 0x8D:
      case 0x9D:
      case 0x99:
      case 0x81:
      case 0x91:
        memory.setMem(_a, resolvedAddress);

      /*
        STX (STore X register)
Affects Flags: none

MODE           SYNTAX       HEX LEN TIM
Zero Page     STX $44       $86  2   3
Zero Page,Y   STX $44,Y     $96  2   4
Absolute      STX $4400     $8E  3   4

         */
      case 0x86:
      case 0x96:
      case 0x8E:
        memory.setMem(_x, resolvedAddress);

      /*
      STY (STore Y register)
Affects Flags: none

MODE           SYNTAX       HEX LEN TIM
Zero Page     STY $44       $84  2   3
Zero Page,X   STY $44,X     $94  2   4
Absolute      STY $4400     $8C  3   4
       */
      case 0x84:
      case 0x94:
      case 0x8C:
        memory.setMem(_y, resolvedAddress);
    }
  }
...

The Test Program

To test our implemented instructions, we can use the following assembly program:

LDA #$40    A9 40
LDY #$06    A0 06
STA ($10),Y 91 10
LDX #$05    A2 05
STA ($12,X) 81 12
LDA #$F0    A9 F0
A binary dump of the program will look as follows:

You will notice that, apart from our program, there are also bytes set at 0x10 and 0x17. These are the zero page pointers used to test the indirect address mode instructions in our program: STA ($10),Y reads its pointer from 0x10, and STA ($12,X) with X = 5 reads its pointer from 0x17. Don't worry about creating a binary file from this dump. I will provide a GitHub link to the source of the project at the end of the post.

The End Result

After running this program in our emulator, the memory dump looks as follows:


In the screenshot I have highlighted what has changed in memory after the program ran. This indicates that the new instructions we have added to our emulator work more or less correctly.

In summary

In this post we added all the load and store instructions with all the associated addressing modes to our emulator.

In the next post we will continue adding more instructions to our emulator.

Before I wrap up this post, I would like to mention that I have started making the source of this evolving C64 Flutter emulator available on GitHub. Here is the link:


I have also created a tag for the source code as it stands for this post. For coming posts I will create separate tags as well.

Until next time!