Wednesday, 3 December 2025

A Commodore 64 Emulator in Flutter: Part 14

Foreword

In the previous post we managed to interface the keyboard to our C64 Flutter emulator. With that implemented, we were able to enter a simple Basic program into our emulator and run it.

Now, my ultimate goal in writing this emulator is to be able to run the game Dan Dare in it, loading it from a tape image.

So, to achieve this end goal, the next goal would be for our emulator to be able to load a tape image. On a C64, loading from tape relies heavily on the features of a CIA (Complex Interface Adapter). The features tape loading relies on are access to the signal from the tape read head, timers and interrupts.

Up to now we have been mimicking some of the features of a CIA chip. The address range of the CIA chip is within DC00-DCFF. It immediately comes to mind that in the previous post we implemented two of the registers of the CIA, DC00 and DC01, for keyboard access.

We also implicitly implemented a timer and interrupts in our emulator, interrupting the CPU every 1/60 of a second, so that the cursor can flash and keyboard entry can work. However, we blindly forced these interrupts as a quick hack just to get the cursor and keyboard to work. We didn't even consider the values set in the CIA for configuring the timer.

However, to implement tape loading we would not be able to get away with a quick hack 😀 We will need to emulate the CIA properly for this purpose.

So, in this post we will implement CIA emulation bit by bit. This will include revisiting our current keyboard and timer interrupt implementation (e.g. the 1/60 second interrupt), and implementing it properly inside the CIA emulation.

We will probably only get to tape emulation in the next post.

Enjoy!

Creating the CIA skeleton

Let's begin our journey by creating a CIA class just as a skeleton. This class will evolve over time to contain all the functionality that a CIA provides:

class Cia1 {
  setMem(int address, int value) {
    print("setMem ${address.toRadixString(16)} ${value.toRadixString(16)}");
  }

  int getMem(int address) {
    print("getMem ${address.toRadixString(16)}");
    return 0;
  }
}
Here we do something interesting. We log every write to and read from the CIA address range. With this we can see which functionality is actually used, so we can implement just the bare minimum functionality of the CIA chip.

With this exercise we also want to disable the hard coded interrupts happening every 1/60 second to avoid any potential side-effect with our CIA journey:

  step() {
/*
    if ((_cycles > 1000000) &&((_cycles % 16666) < 30) && (_i == 0)) {
      push(pc >> 8);
      push(pc & 0xff);
      push((_n << 7) | (_v << 6) | (2 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
      _i = 1;
      pc = memory.getMem(0xfffe) | (memory.getMem(0xffff) << 8);
    }
*/
    var opCode = memory.getMem(pc);
    pc++;
    var insLen = CpuTables.instructionLen[opCode];
    ...
  }
Next we need to make an instance of this class and inject it into our Memory class:

  C64Bloc() : super(InitialState()) {
    memory.setKeyInfo(this);
    on<InitEmulatorEvent>((event, emit) async {
      final basicData = await rootBundle.load("assets/basic.bin");
      final characterData = await rootBundle.load("assets/characters.bin");
      final kernalData = await rootBundle.load("assets/kernal.bin");
      Cia1 cia1 = Cia1();
      memory.setCia1(cia1);
      ...
    }
    ...
  }
We modify the actual Memory class like this:

class Memory {
...
  late final Cia1 cia1;
...
  setCia1(Cia1 cia1) {
    this.cia1 = cia1;
  }
...
  setMem(int value, int address ) {
    if ((address >> 8) == 0xDC) {
      cia1.setMem(address, value);
    } else {
      _ram.setInt8(address, value);
    }
  }

  int getMem(int address) {
    _readCount++;
    if (address >= 0xA000 && address <= 0xBFFF) {
      return _basic.getUint8(address & 0x1fff);
    } else if (address >= 0xE000 && address <= 0xFFFF) {
      return _kernal.getUint8(address & 0x1fff);
    } else if (address == 0xD012) {
      return (_readCount & 1024) == 0 ? 1 : 0;
    } else if ((address >> 8) == 0xDC ) {
      return cia1.getMem(address);

    /*else if (address == 0xDC01) {
      return keyInfo.getKeyInfo(_ram.getUint8(0xDC00));*/
    } else {
      return _ram.getUint8(address);
    }
  }
}

So, every time an address starts with DC we send this access to the CIA instance. You will also see that I have commented out the explicit access to the DC01 register, which we added in the previous post for keyboard access. We will implement this functionality at a later stage into our CIA class.

Now, let us start the emulator and watch the log output:

setMem dc0d 7f
setMem dc00 7f
setMem dc0e 8
setMem dc0f 8
setMem dc03 0
setMem dc02 ff
setMem dc04 95
setMem dc05 42
setMem dc0d 81
getMem dc0e
setMem dc0e 11
setMem dc04 25
setMem dc05 40
setMem dc0d 81
getMem dc0e
setMem dc0e 11
So, let us quickly see what is going on here. With the write to DC0D, we disable all interrupts going to the CPU.

The write to address DC00 is for the keyboard stuff, which we don't worry about at the moment.

Next we see the value 8 written to registers DC0E and DC0F. This puts timers A and B in One shot mode. 

Let's skip a couple of memory writes and get to the writes to locations DC04 and DC05. These are the registers for setting the duration of timer A. Each count is one tick of the 1MHz clock that also drives the 6510. DC04 is the low byte of the value and DC05 the high byte. So, this is 4295 hexadecimal, which translates to 17045 decimal. This is close to the hard coded count we used previously for triggering an interrupt every 1/60 second.
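To make the arithmetic concrete, here is a small sketch that combines the two latch bytes from the log and estimates the resulting interrupt rate. The clock frequencies are the commonly quoted NTSC and PAL values and are my assumption, not something taken from the log. The helper names are made up for illustration. Pairing the first latch value with NTSC and the second one (25/40 in the log) with PAL is only one plausible reading.

```dart
// Assumed system clock frequencies in Hz (commonly quoted values).
const int ntscClockHz = 1022727;
const int palClockHz = 985248;

// Combine the low byte (DC04) and high byte (DC05) into a 16-bit latch value.
int latchValue(int lowByte, int highByte) =>
    (lowByte & 0xff) | ((highByte & 0xff) << 8);

// Approximate number of timer expiries per second for a given latch value.
double interruptsPerSecond(int latch, int clockHz) => clockHz / latch;

void main() {
  final first = latchValue(0x95, 0x42); // 0x4295
  final second = latchValue(0x25, 0x40); // 0x4025
  print('$first cycles -> '
      '${interruptsPerSecond(first, ntscClockHz).toStringAsFixed(2)} Hz');
  print('$second cycles -> '
      '${interruptsPerSecond(second, palClockHz).toStringAsFixed(2)} Hz');
}
```

Under these assumptions both latch values work out to roughly 60 interrupts per second, which matches the 1/60 second period we hard coded before.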

Next, we do an assignment to location DC0D. This is the interrupt control register. We see in the value assigned that the least significant bit is set. This is the bit that controls interrupts from timer A. We also see that the most significant bit is set in the assigned value. If this bit is a 1, it means: enable every interrupt whose bit is a one in this byte. So, in this case we have enabled interrupts from timer A.
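The set/clear behaviour of a write to DC0D can be sketched as a tiny function. This is only an illustration of the rule described above; the name applyInterruptMaskWrite is hypothetical and the mask is limited to the five lowest bits as an assumption about which bits carry interrupt sources.

```dart
// Apply a write to DC0D to the current interrupt mask.
// Bit 7 of the written value selects the operation:
//   1 = enable the sources whose bits are set in the value,
//   0 = disable the sources whose bits are set in the value.
int applyInterruptMaskWrite(int currentMask, int value) {
  final sources = value & 0x1f; // assumed: bits 0-4 are interrupt sources
  if ((value & 0x80) != 0) {
    return currentMask | sources;
  }
  return currentMask & ~sources & 0x1f;
}

void main() {
  var mask = 0x1f;
  mask = applyInterruptMaskWrite(mask, 0x7f); // first write in the log
  print(mask.toRadixString(16)); // all sources disabled
  mask = applyInterruptMaskWrite(mask, 0x81); // later write in the log
  print(mask.toRadixString(16)); // only timer A enabled
}
```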

Finally, we see a value that is assigned twice to register DC0E. With this assignment, two things happen. Firstly, bit 4 is set, which means force load the timer with the value from the latch, which in our case would be the hex value 4295. Secondly, bit 0 is set, which finally starts timer A.

Something else is also happening subtly. Previously I mentioned that we set bit 3 to 1, putting the timer in one shot mode. Now, however, we are setting this bit to zero, which means that timer A will operate in continuous mode. This means that after the timer has lapsed it will automatically restart, so we will get periodic interrupts every 1/60 second.
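The three bits of DC0E discussed so far can be decoded with a few masks. The sketch below uses a Dart record for the result; the function name and field names are hypothetical and only cover the bits mentioned in this post.

```dart
// Decode the bits of control register A (DC0E) that matter for timer A here.
({bool start, bool oneShot, bool forceLoad}) decodeControlA(int value) => (
      start: (value & 0x01) != 0, // bit 0: start the timer
      oneShot: (value & 0x08) != 0, // bit 3: one shot instead of continuous
      forceLoad: (value & 0x10) != 0, // bit 4: load the timer from the latch
    );

void main() {
  print(decodeControlA(0x08)); // the early write: one shot, not started
  print(decodeControlA(0x11)); // the later write: force load and start
}
```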

Implementing the Alarm System

With the skeleton implemented for the CIA chip, we should start implementing some meat for it. We will start with timer A. 

Now, timer A is very reliant on the number of cycles the CPU has executed. There are other operations that are also dependent on the number of CPU cycles executed, like tape loading, drawing pixels at the right moment on the screen and SID sound generation.

I have written a number of C64 emulators in other programming languages. I must admit, in all these emulators I would perform all the operations that depend on CPU cycles after every CPU instruction executed. In the beginning, when I had just added timers or tape interrupts, I didn't really see issues.

However, as I added more of these operations dependant on CPU cycles executed, I saw performance gradually worsening, especially when I added more of the VIC-II operations.

Now, what I experienced isn't really something new. There is actually a computer science term for a technique that tries to solve this issue: loop fission. Wikipedia has an article that explains loop fission in more detail.


Basically, when you have a loop where you do a lot of things in a loop iteration, one issue that pops up is that you have more cache misses, and your CPU needs to fetch data from slower RAM more often. By splitting the loop into more separate loops cache misses should be reduced and therefore improve performance.

I have dug a bit into the source code of the Vice emulator, and overall they also try to break things into separate loops. They have the whole concept of alarms. For instance, every VIC-II scan line is 63 cycles. So, instead of rendering a bit of a line after every CPU instruction, they set an alarm that will trigger 63 cycles into the future. With every CPU instruction executed, the emulator checks whether 63 cycles have passed. Only when the 63 cycles have passed does it execute an alarm handler that renders the full line.

Of course, during the course of the 63 cycles something might change, like the border color, in which case the line will show more than one border color. In such cases, when a write to such a register happens, one should keep a record of when the color changed.

Let's start to create an Alarm subsystem for our emulator. We start with a brief outline:

class Alarms {
  final LinkedList<Alarm> _alarmList = LinkedList<Alarm>();

  Alarms();

  Alarm addAlarm(Function(int remainder) callback) {
    var alarm = Alarm._(this, callback);
    _alarmList.add(alarm);
    return alarm;
  }

}
So, here we have a class containing all our alarms. Internally all the alarms are stored in a linked list, which is a data structure provided by Dart. We will visit this in a while.

There is also a method for adding an alarm with a callback, so when the alarm has expired we can call the callback to do some work. The remainder parameter indicates by how many cycles we have gone over the alarm threshold by the time a CPU instruction has finished executing.

Let's now focus a bit on the LinkedList story. So, we have a declaration LinkedList<Alarm>(). LinkedList is one of Dart's built-in classes (it lives in dart:collection). It is a generic class, so you need to give it a type when you make an instance. In this case we are saying we will have a LinkedList containing instances of Alarm.

Now, usually with generics you can define Alarm in any way you want. However, with a LinkedList things are a bit more tricky, because every node needs to point to the next and previous node. This is just how a LinkedList is implemented.

Luckily you don't need to worry about implementing all this yourself. You can just let our Alarm class extend LinkedListEntry, and all this will happen automatically:

final class Alarm extends LinkedListEntry<Alarm> {
  late final Alarms _alarms;
  late final Function(int remainder) _callback;

  Alarm._(Alarms alarms, Function(int remainder) callback ) {
    _alarms = alarms;
    _callback = callback;
  }

}
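To get a feel for what extending LinkedListEntry gives us, here is a small standalone sketch. The Job class and the demo function are hypothetical names used only for illustration. It shows that an entry can remove itself with unlink(), that its list property is null while it is not in a list, and that it can be added again afterwards.

```dart
import 'dart:collection';

// A hypothetical entry type; extending LinkedListEntry supplies the links.
final class Job extends LinkedListEntry<Job> {
  Job(this.name);
  final String name;
}

List<String> demo() {
  final jobs = LinkedList<Job>();
  final a = Job('timer A');
  final b = Job('raster line');
  jobs.addAll([a, b]);

  final log = <String>[];
  log.add(jobs.map((j) => j.name).join(', '));

  a.unlink(); // The entry removes itself from the list it is in.
  log.add('${a.list == null}'); // An unlinked entry has no list.
  log.add(jobs.map((j) => j.name).join(', '));

  jobs.add(a); // An unlinked entry can be added again, at the end.
  log.add(jobs.map((j) => j.name).join(', '));
  return log;
}

void main() {
  demo().forEach(print);
}
```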

Let us now add some more meat to our alarm class:

final class Alarm extends LinkedListEntry<Alarm> {
  var _targetClock = 0;
...
  setTicks(int ticks) {
    _targetClock = _alarms.getCurrentCpuCount() + ticks;
  }

  getRemainingTicks() {
    return _targetClock - _alarms.getCurrentCpuCount();
  }

  getTargetClock() {
    return _targetClock;
  }

  processAlarm(int remainder) {
    _callback(remainder);
  }
}
Basically I have added some methods for keeping track of how far we are from triggering an alarm. The processAlarm method will be invoked when the alarm is triggered.

Now, let us add some meat to our Alarms class:

class Alarms {
  final LinkedList<Alarm> _alarmList = LinkedList<Alarm>();
  int _cpuCount = 0;

  Alarms();

  Alarm addAlarm(Function(int remainder) callback) {
    var alarm = Alarm._(this, callback);
    _alarmList.add(alarm);
    return alarm;
  }

  reAddAlarm(Alarm alarm) {
    _alarmList.add(alarm);
  }

  int getCurrentCpuCount() {
    return _cpuCount;
  }

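  processAlarms(int cpuCycles) {
    _cpuCount = cpuCycles;
    // Walk the list by hand and remember the next entry before the callback
    // runs. A callback may unlink its own alarm (e.g. a one shot timer), and
    // a for-in loop over a LinkedList throws if the list changes meanwhile.
    Alarm? item = _alarmList.isEmpty ? null : _alarmList.first;
    while (item != null) {
      final next = item.next;
      if (item.getRemainingTicks() <= 0) {
        // The value passed is zero or negative: the number of cycles overshot.
        item.processAlarm(item.getRemainingTicks());
      }
      item = next;
    }
  }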
}
The key method added here is processAlarms(). This method loops through the alarms, checks which ones have expired and calls the callback of each expired alarm.

Another interesting method is reAddAlarm(). It will often happen that we stop a timer, at which point we remove it from the alarm list so it isn't triggered again. However, there might be a case where we want to start the timer again, at which point we use reAddAlarm() to add it back to the list so it is evaluated again for expiry.
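The following sketch shows the alarm idea end to end in a form you can run on its own. DemoAlarm, DemoAlarms and runDemo are hypothetical, trimmed-down stand-ins for the classes above, and the assumption that every CPU instruction takes exactly 4 cycles is only there to keep the example short. The alarm re-arms itself for 63 cycles each time, adding the (zero or negative) remainder so that overshoot does not accumulate.

```dart
import 'dart:collection';

// Hypothetical, simplified alarm entry for illustration only.
final class DemoAlarm extends LinkedListEntry<DemoAlarm> {
  DemoAlarm(this.callback);
  final void Function(int remainder) callback;
  int targetClock = 0;
}

class DemoAlarms {
  final LinkedList<DemoAlarm> list = LinkedList<DemoAlarm>();
  int cpuCount = 0;

  DemoAlarm add(void Function(int remainder) callback, int ticks) {
    final alarm = DemoAlarm(callback)..targetClock = cpuCount + ticks;
    list.add(alarm);
    return alarm;
  }

  void process(int cpuCycles) {
    cpuCount = cpuCycles;
    // Iterate over a copy so a callback may unlink its own alarm safely.
    for (final alarm in list.toList()) {
      final remaining = alarm.targetClock - cpuCount;
      if (remaining <= 0) {
        alarm.callback(remaining);
      }
    }
  }
}

// Returns the cycle counts at which the alarm fired.
List<int> runDemo() {
  final alarms = DemoAlarms();
  final fired = <int>[];
  late final DemoAlarm lineAlarm;
  lineAlarm = alarms.add((remainder) {
    fired.add(alarms.cpuCount);
    // Re-arm 63 cycles ahead, compensating for the overshoot.
    lineAlarm.targetClock = alarms.cpuCount + 63 + remainder;
  }, 63);
  // Assumed: each "instruction" takes 4 cycles.
  for (var cycles = 0; cycles <= 200; cycles += 4) {
    alarms.process(cycles);
  }
  return fired;
}

void main() {
  print(runDemo()); // [64, 128, 192]
}
```

Note how the targets stay at 63, 126 and 189 even though the alarm can only fire on a multiple of 4 cycles. That is the purpose of passing the remainder to the callback.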

Wiring everything together

With all the building blocks created in the previous section, let's now put them together. In C64Bloc let us do some initialisation:

class C64Bloc extends Bloc<C64Event, C64State> implements KeyInfo {
  final Memory memory = Memory();
  final List<int> matrix = [0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff];
  final FocusNode focusNode = FocusNode();
  late final Cpu _cpu = Cpu(memory: memory);
  late final Alarms alarms = Alarms();
  type_data.ByteData image = type_data.ByteData(200*200*4);
  int dumpNo = 0;
  int frameNo = 0;
  Timer? timer;
...
  C64Bloc() : super(InitialState()) {
    on<InitEmulatorEvent>((event, emit) async {
      final basicData = await rootBundle.load("assets/basic.bin");
      final characterData = await rootBundle.load("assets/characters.bin");
      final kernalData = await rootBundle.load("assets/kernal.bin");
      Cia1 cia1 = Cia1(alarms: alarms);
      cia1.setKeyInfo(this);
      memory.setCia1(cia1);
      memory.populateMem(basicData, characterData, kernalData);
      _cpu.setInterruptCallback(() => cia1.hasInterrupts());
...
  }
...
}
I have added a field for our alarms. I am also now injecting a Cia1 instance into our memory.

In our CPU class we now also use an interrupt callback, which the CPU class will call to see if any interrupts have occurred. Our Cia1 instance will provide this info.

In our main event processing loop, we also make a small change:

    on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(milliseconds: 17), (timer) {
          int start = DateTime.now().millisecondsSinceEpoch;
          int targetCycles = _cpu.getCycles() + 16666;
          do {
            _cpu.step();
            alarms.processAlarms(_cpu.getCycles());
          } while (_cpu.getCycles() < targetCycles);
...
});
    });

After every CPU instruction we process the alarms, passing in the current CPU cycle count.

Expanding the CIA1 class

Earlier we created a skeleton for the CIA1 class. We will now expand this class further.

As usual we start with some initialisation:


class Cia1 {
  int timerAlatchLow = 0xff;
  int timerAlatchHigh = 0xff;
  int timerAvalue = 0xffff;
  Alarms alarms;
  Alarm? timerAalarm;
  bool timerAstarted = false;
  bool timerAoneshot = false;
  int registerE = 0;
  int register0 = 0;
  bool timerAinterruptEnabled = false;
  bool timerAintOccurred = false;
  late final KeyInfo keyInfo;


  Cia1({required this.alarms});

  setKeyInfo(KeyInfo keyInfo) {
    this.keyInfo = keyInfo;
  }
 ...
}
The meaning of these fields will become clear in a bit.

Next, let us implement the following method:

  updateTimerA() {
    if (!timerAstarted) {
       return;
    }
    if (timerAalarm != null) {
      timerAvalue = timerAalarm!.getRemainingTicks();
    }

  }

timerAvalue is the value of countdown timer A in the CIA. To increase locality, we don't update this value with the execution of every CPU instruction. Instead, we wrote this method, which updates the value only when the CPU accesses the CIA registers.

Next we add these methods:

  hasInterrupts() {
    if (timerAintOccurred && timerAinterruptEnabled) {
      return true;
    } else {
      return false;
    }
  }

  processTimerAalarm(int remaining) {
    // Do interrupt
    timerAintOccurred = true;
    if (timerAoneshot) {
      timerAalarm?.unlink();
      timerAstarted = false;
      return;
    }
    timerAalarm!.setTicks((timerAlatchLow | (timerAlatchHigh << 8)) + remaining);
  }

Here we deal with the timer expiring, and we flag the interrupt. We remove the timer from the alarm list if it is in one shot mode. Otherwise we schedule the timer to run again.

Finally, let us add methods for reading and writing to the CIA registers:

  setMem(int address, int value) {
    print("setMem ${address.toRadixString(16)} ${value.toRadixString(16)}");
    value = value & 0xff;
    address = address & 0xf;
    switch (address) {
      case 0x0:
        register0 = value;
      case 0x4:
        timerAlatchLow = value;
      case 0x5:
        timerAlatchHigh = value;
      case 0xD:
        if ((value & 0x80) != 0) {
          timerAinterruptEnabled = ((value & 1) == 1) ? true : timerAinterruptEnabled;
        } else {
          timerAinterruptEnabled = ((value & 1) == 1) ? false : timerAinterruptEnabled;
        }
      case 0xE:
        var startTimerA = ((value & 1) == 1) ? true : false;
        var forceTimerA = ((value & 16) != 0) ? true : false;
        updateTimerA();
        if (forceTimerA) {
          timerAvalue = timerAlatchLow | (timerAlatchHigh << 8);
        }
        var startingTimerA = startTimerA & !timerAstarted;
        var stoppingTimerA = !startTimerA & timerAstarted;
        var alreadyRunningTimerA = startTimerA && timerAstarted;
        if (startingTimerA || (alreadyRunningTimerA && forceTimerA)) {
          // schedule timer on alarm
          timerAalarm ??= alarms.addAlarm( (remaining) => processTimerAalarm(remaining));
          if (timerAalarm!.list == null) {
            alarms.reAddAlarm(timerAalarm!);
          }
          timerAalarm!.setTicks(timerAvalue);
          // set timer as started
        } else if (stoppingTimerA) {
          //unschedule timer A
          timerAalarm!.unlink();
        }
        timerAoneshot = (value & 8) != 0;
        timerAstarted = startTimerA;
        registerE = value;
      default:
        // throw "Not implemented";
    }

  }

  int getMem(int address) {
    print("getMem ${address.toRadixString(16)}");
    updateTimerA();
    address = address & 0xf;
    switch (address) {
      case 0x0:
        return register0;
      case 0x1:
        return keyInfo.getKeyInfo(register0);
      case 0x4:
        return timerAvalue & 0xff;
      case 0x5:
        return timerAvalue >> 8;
      case 0xD:
        if (timerAintOccurred) {
          timerAintOccurred = false;
          return 0x81;
        } else {
          return 0;
        }
      case 0xE:
        var result = registerE & 0x06;
        result = result | (timerAstarted ? 1 : 0);
        result = result | (timerAoneshot ? 8 : 0);
        return result;
    }
    return 255;
  }

You will see that each time we read from the CIA1 we update the timer. In the write function we also adjust the alarms accordingly if we change the state of the timers.

Changes to the CPU class

Finally, there is just a small change we need to make to our CPU. Previously we hardwired an interrupt in our CPU that happened every 1/60th of a second. However, now that we have implemented a CIA class, we need to change how interrupts work.

Here are the changes:


class Cpu {
...
  // Returns true when the CIA has a pending, enabled interrupt.
  late final bool Function() _interruptCallback;
...
  setInterruptCallback(bool Function() callback) {
    _interruptCallback = callback;
  }
...
  step() {
    // Only take the interrupt when the interrupt disable flag is clear.
    if (_interruptCallback() && (_i == 0)) {
      push(pc >> 8);
      push(pc & 0xff);
      push((_n << 7) | (_v << 6) | (2 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
      _i = 1;
      pc = memory.getMem(0xfffe) | (memory.getMem(0xffff) << 8);
    }
...
  }
...
}
Now we call the interrupt callback, which basically ties back to the CIA1 class we created. Also, we invoke an interrupt only when the interrupt disable flag is not set.
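The status byte that gets pushed during an interrupt packs the individual flags into one byte. As a sketch, the same expression can be written as a standalone function; packStatus is a hypothetical name, and the flag parameters are assumed to hold 0 or 1 just like the CPU fields above.

```dart
// Pack individual flag values (each 0 or 1) into a 6502 status byte.
// Bit 5 is always set; bit 4 (break) stays clear for a hardware interrupt.
int packStatus({
  required int n,
  required int v,
  required int d,
  required int i,
  required int z,
  required int c,
}) =>
    (n << 7) | (v << 6) | (2 << 4) | (d << 3) | (i << 2) | (z << 1) | c;

void main() {
  // Negative, zero and carry set; everything else clear.
  final status = packStatus(n: 1, v: 0, d: 0, i: 0, z: 1, c: 1);
  print(status.toRadixString(16)); // a3
}
```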

In Summary

In this post we introduced the CIA as a separate class. We also removed the hardcoded mechanism that triggered an interrupt every 1/60th of a second, and instead let the CIA schedule the interrupts as programmed by the machine code running in the emulator.

In the next post we will start to implement tape loading from a raw tape image.

Until next time!

Thursday, 22 May 2025

A Commodore 64 Emulator in Flutter: Part 13

Foreword

In the previous post we managed to boot the C64 system with a screen showing the contents of screen memory in real time. It booted with the welcome message and a flashing cursor.

In this post we will provide some keyboard interfacing for our C64 emulator. We will approach this in a very experimental fashion, exploring how Flutter itself works with keyboard interfacing in an app. Then we will try to see if we can get keyboard interfacing to work in our app, and finally see if our emulator can work with the keyboard.

Enjoy!

KeyboardListener in Flutter

What we want for our emulator is basically to tell when a key is held down, and when it is released. Flutter provides this for us via a KeyboardListener. From the Flutter documentation it is not so straightforward on how to use this, so I looked around for a worked example on the Internet and found the following:

https://medium.com/@wartelski/how-to-flutter-keyboard-events-keyboard-listener-in-flutter-web-0c36ab9654a9

The following snippet is the core of the example:

With this example we can basically detect when a key is down. Now, all is well in this example, except that we have a final variable for _focusNode. This is, however, something we can only do with a StatefulWidget. In our case we are within a StatelessWidget, where we cannot do such things.

In our case we will place the focusNode in our Bloc. This is probably not the best place if one thinks about separation of concerns, but for now it is the best place if we want to keep a single instance of FocusNode alive. So, we make the following changes:

class C64Bloc extends Bloc<C64Event, C64State> {
  final Memory memory = Memory();
  final FocusNode focusNode = FocusNode();
...
}
And now we go further and wrap our RawImage in a KeyboardListener:

...
           } else if (state is RunningState) {
              return KeyboardListener(
                focusNode: context.read<C64Bloc>().focusNode,
                autofocus: true,
                onKeyEvent: (event) {
                  if (event is KeyDownEvent) {
                    if (event.logicalKey == LogicalKeyboardKey.keyM) {
                      print("The m key is pressed!!");
                    }
                  } else if (event is KeyUpEvent) {
                    if (event.logicalKey == LogicalKeyboardKey.keyM) {
                      print("The m key is released!!");
                    }
                  }
                },
                child: RawImage(
                    image: state.image, scale: 0.5),
              );
            } else {
...
So, here we listen for the "M" key and write out to the console when this key is pressed and released.

Simulating a key press in our emulator

Next, let us see if we can simulate a key press in our emulator. To figure out how, let us dig a bit into how the keyboard is implemented in hardware.

Firstly, a keyboard is arranged as a matrix of rows and columns, and where a row and column meet, there is a key switch. If the switch is pushed, it will short the row to ground. Seeing if a switch is pressed is a two step process. You need to energise each column in turn and see which rows are shorted to ground.

Firstly, to get an idea how the matrix of a C64 is arranged, the following diagram is helpful:

Now, the big question is which memory locations do we need to manipulate and read to see which key was pressed.

The following web link provides us with a memory map which will aid in finding these memory locations:

Scrolling down, we eventually find the place where the keyboard is dealt with:


As you can see, both these ports are used by the joystick ports and the keyboard. The first piece of info that is useful for us is the following at memory location DC00:

  • Bit #x: 0 = Select keyboard matrix column #x.

So, this is actually where we energise one or more columns. In the matrix diagram, this is actually the parts labeled A - H. Each of these are assigned a bit number (0 - 7) in the byte we write to this port.

The next piece of useful info is at memory location DC01:

  • Bit #x: 0 = A key is currently being pressed in keyboard matrix row #x, in the column selected at memory address $DC00.

So, we select one or more columns in location DC00, and within the selected columns we can read via location DC01 which rows have a key pressed.

Let us now see how we can emulate a keypress in our emulator. At this point we are able to catch keys from the keyboard with a KeyboardListener. In our KeyboardListener we can basically trigger events for which we listen for in our Bloc.

First let us define an event class which we will trigger:

class KeyC64Event extends C64Event {
  final bool keyDown;
  KeyC64Event({required this.keyDown});
}
So, we will either trigger an event with keyDown = true, when a key is pressed, or an event with keyDown = false, when a key is released.

With this in mind, let us modify our KeyboardListener:

            } else if (state is RunningState) {
              return KeyboardListener(
                focusNode: context.read<C64Bloc>().focusNode,
                autofocus: true,
                onKeyEvent: (event) {
                  if (event is KeyDownEvent) {
                    if (event.logicalKey == LogicalKeyboardKey.keyM) {
                      context.read<C64Bloc>().add(KeyC64Event(keyDown: true));
                    }
                  } else if (event is KeyUpEvent) {
                    if (event.logicalKey == LogicalKeyboardKey.keyM) {
                      context.read<C64Bloc>().add(KeyC64Event(keyDown: false));
                    }
                  }
                },
                child: RawImage(
                    image: state.image, scale: 0.5),
              );
            } else {
Next, let us listen for these events in our Bloc:

class C64Bloc extends Bloc<C64Event, C64State> {
...
  bool keyDown = false;
...
  C64Bloc() : super(InitialState()) {
...
    on<KeyC64Event>((event, emit) {
      keyDown = event.keyDown;
    });
...
  }
...
}
So, within our Bloc, keyDown is a variable keeping track of whether the key is up or down, which in this case is the state of the M key on our keyboard. We will make use of this variable to simulate a key stroke in our emulator.

Now, the actual simulation of a key press should happen in our Memory class. When a read is done from address DC01, we should consider which column is enabled via address DC00, check whether one of the keys in the enabled column is indeed held down, and send back a value that reflects this.

So, we have a situation here where Memory wants some info from the Bloc class in which it lives, but we don't want to provide Memory with all the state of the Bloc class. To achieve this we need to create an interface with methods returning just the info that Memory needs.

Here is the interface:

abstract class KeyInfo {
  int getKeyInfo(int column);
}
And now let us implement the interface in our Bloc:

class C64Bloc extends Bloc<C64Event, C64State> implements KeyInfo {
...
  @override
  int getKeyInfo(int column) {
    return 0xff; // Placeholder: no key pressed. We fill this in below.
  }
...
}
So, given the columns that are energised, we return the rows. Now, as an exercise, let's say that if we press the M key on the keyboard, which we currently check for in our KeyboardListener, we want our C64 emulator to also show an M.

So, let us look at the keyboard matrix diagram again to see where the M key is located. The M key is located at column E and row 4. With the bit counting starting at zero for column A, the bit number of column E is 4. So we are interested in column bit 4 and row bit 4.

With this in mind, let us give getKeyInfo() some meat:

  @override
  int getKeyInfo(int column ) {
    if (!keyDown) {
      return 0xff;
    }
    if ((column & 0x10) == 0) {
      return 0xef;
    } else {
      return 0xff;
    }
  }
One thing to remember here is that when working with the keyboard matrix, we don't work with the default assumption that one means active, but the other way around. So a zero means in the column byte that a certain column is energised, and a zero in the row byte means that the switch for that bit position is held down.

With all this written, let us make our Memory class make use of it:

class Memory {
...
  late final KeyInfo keyInfo;
...
  setKeyInfo(KeyInfo keyInfo) {
    this.keyInfo = keyInfo;
  }
...
}
So, we can pass our keyInfo object to our Memory class. We assign the keyInfo when our Bloc class is instantiated:
class C64Bloc extends Bloc<C64Event, C64State> implements KeyInfo {
...
  C64Bloc() : super(InitialState()) {
    memory.setKeyInfo(this);
...
  }
...
}
Finally, let us use keyInfo in our Memory class:
...
  int getMem(int address) {
    _readCount++;
    if (address >= 0xA000 && address <= 0xBFFF) {
      return _basic.getUint8(address & 0x1fff);
    } else if (address >= 0xE000 && address <= 0xFFFF) {
      return _kernal.getUint8(address & 0x1fff);
    } else if (address == 0xD012) {
      return (_readCount & 1024) == 0 ? 1 : 0;
    } else if (address == 0xDC01) {
      return keyInfo.getKeyInfo(_ram.getUint8(0xDC00));
    } else {
      return _ram.getUint8(address);
    }
  }
...
So, when address DC01 is read from our Memory, we invoke getKeyInfo and pass it the contents of memory location DC00. At the moment we fetch location DC00 from RAM.

Now, when we build and run, and press the M key a couple of times, the screen looks as follows:

We managed to implement a simple key press!

Implementing the full keyboard

Let us now look at implementing a full keyboard, or at least sufficient keys, like the alphabet, digits and some symbols, just to type a simple basic program within our emulator.

Up to now we only kept track of whether a single key is down, via keyDown, but now we need to keep track of whether several keys are held down. So, we need a kind of boolean matrix, or to put it more plainly, an array of eight bytes. Each column is a byte:

  final List<int> matrix = [0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff];
I mentioned earlier that in a real C64 a zero means the key is pressed. So, this array filled with 0xff values means no key is held down at the moment.

Previously in our Main class, we just looked for the M key being pressed and released, and then passed this event to our Bloc class. Obviously we will now need to remove this explicit check for the M key and pass all key events to our Bloc class. This requires us to modify our KeyC64Event class to say which key was pressed, not just whether a key was pressed:

class KeyC64Event extends C64Event {  
  final bool keyDown;
  final LogicalKeyboardKey key;
  KeyC64Event({required this.keyDown, required this.key});
}
With this in place our Bloc class will indeed receive a key code, but one that only makes sense in the Flutter world. We need a kind of lookup table or map to convert a Flutter key code to a C64 keyboard scan code. So for this purpose we create the following map, preferably in a separate file:

Map<LogicalKeyboardKey, int> keyMap = Map.unmodifiable({
  LogicalKeyboardKey.keyA : 0x0A,
  LogicalKeyboardKey.keyB : 0x1C,
  LogicalKeyboardKey.keyC : 0x14,
  LogicalKeyboardKey.keyD : 0x12,
...
  LogicalKeyboardKey.digit0 : 0x23,
  LogicalKeyboardKey.digit1 : 0x38,
  LogicalKeyboardKey.digit2 : 0x3B,
  LogicalKeyboardKey.digit3 : 0x08,
  LogicalKeyboardKey.digit4 : 0x0B,
  LogicalKeyboardKey.digit5 : 0x10,
  LogicalKeyboardKey.digit6 : 0x13,
  LogicalKeyboardKey.digit7 : 0x18,
  LogicalKeyboardKey.digit8 : 0x1B,
  LogicalKeyboardKey.digit9 : 0x20,
...
  LogicalKeyboardKey.space : 0x3c,
  LogicalKeyboardKey.shiftLeft : 0x0F,
  LogicalKeyboardKey.enter : 0x01,
...
});
With this map created, we can now modify our listener a bit for the event KeyC64Event:

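    on<KeyC64Event>((event, emit) {
      final c64KeyCode = keyMap[event.key];
      if (c64KeyCode == null) {
        // Ignore keys without a mapping; defaulting to 0 would press
        // the key at column 0, row 0 instead.
        return;
      }
      int col = c64KeyCode >> 3; // bits 5-3: column
      int row = 1 << (c64KeyCode & 7); // bits 2-0: row, as a bit mask
      if (!event.keyDown) {
        matrix[col] |= row; // released: set the bit back to one
      } else {
        matrix[col] &= ~row; // pressed: clear the bit to zero
      }
    });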
We start off by looking up the C64 scan code, given the Flutter key code. Bits 5-3 of the scan code are the column and bits 2-0 are the row.

In the if statement, if the key is released, we OR the bit position back in, which sets the bit. If the key is pressed, we mask off the bit position, which clears the bit.
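As a small worked example of the bit arithmetic above, here is a standalone sketch. The helper names columnOf, rowMaskOf and applyKey are hypothetical and only used for illustration; the scan code 0x0A for the A key comes from the keyMap above.

```dart
// Hypothetical helpers; the bit layout follows the listener above.
int columnOf(int c64KeyCode) => c64KeyCode >> 3; // bits 5-3
int rowMaskOf(int c64KeyCode) => 1 << (c64KeyCode & 7); // bits 2-0 as a mask

// Returns a copy of the matrix with one key pressed or released.
List<int> applyKey(List<int> matrix, int c64KeyCode, bool keyDown) {
  final updated = List<int>.of(matrix);
  final col = columnOf(c64KeyCode);
  final row = rowMaskOf(c64KeyCode);
  if (keyDown) {
    updated[col] &= ~row & 0xff; // pressed: clear the bit (active low)
  } else {
    updated[col] |= row; // released: set the bit again
  }
  return updated;
}

void main() {
  final idle = List<int>.filled(8, 0xff);
  final pressed = applyKey(idle, 0x0A, true); // 0x0A is the A key
  print(columnOf(0x0A)); // 1
  print(rowMaskOf(0x0A)); // 4
  print(pressed[1].toRadixString(16)); // fb
  final released = applyKey(pressed, 0x0A, false);
  print(released[1].toRadixString(16)); // ff
}
```

So pressing A clears bit 2 in column 1, and releasing it restores the column to 0xff.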

Now we need to modify the method getKeyInfo, which is the method our Memory class calls when reading address DC01. When calling this method, we tell it which columns need to be considered. Potentially two or more columns can be selected, in which case we need to do a kind of OR operation on the pressed keys, to reduce the selected columns to one value. Because a pressed key is represented by a zero, this is done with a bitwise AND.

We can express this reducing in a simple for loop:

  @override
  int getKeyInfo(int column ) {
    int result = 0xff; // Accumulator: selected columns are ANDed in, since pressed keys are zeros

    for (var row in matrix) {
      if ((column & 1) == 0) {
        result &= row; 
      }

      column = column >> 1;
    }

    return result;
  }
We shift the column value right on every iteration, each time checking whether the lowest bit is zero. If it is zero, we know the column is selected. We AND all the selected columns together. If, for any row position in a selected column, there is a zero, then the final value for that bit position will be zero. A zero means one or more keys were pressed in that bit position for the selected columns.
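To see the reduction in action, here is a minimal standalone sketch. The function scanColumns is a hypothetical free-standing version of getKeyInfo that takes the matrix as a parameter; the matrix values assume the A key (0x0A) and the space bar (0x3c) are both held down.

```dart
// Hypothetical free-standing version of getKeyInfo, for illustration only.
int scanColumns(List<int> matrix, int columnSelect) {
  var result = 0xff;
  for (final rowBits in matrix) {
    if ((columnSelect & 1) == 0) {
      result &= rowBits; // a zero select bit means this column is scanned
    }
    columnSelect >>= 1;
  }
  return result;
}

void main() {
  // A (column 1, bit 2) and space (column 7, bit 4) held down.
  final matrix = [0xff, 0xfb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xef];
  print(scanColumns(matrix, 0xfd).toRadixString(16)); // fb (column 1 only)
  print(scanColumns(matrix, 0x00).toRadixString(16)); // eb (all columns)
  print(scanColumns(matrix, 0xfe).toRadixString(16)); // ff (column 0 only)
}
```

With all columns selected, the zeros from both keys end up in the same result byte.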

Now, let us see if we can write a simple program, with the keyboard input enabled:

Next, let us run the program:
We have a working program!

In Summary

In this post we implemented keyboard input and wrote a small test program.

In the next post we will start implementing tape loading, from a tape image.

Until next time!



Monday, 28 April 2025

A Commodore 64 Emulator in Flutter: Part 12

Foreword

In the previous post we successfully ran the Klaus Dormann Test Suite.

In this post we will be trying to boot the C64 system with its ROM's.

Enjoy!

Inserting the ROMS

Inserting the ROM's... Now that sounds like plugging and unplugging game cartridges 😂. In our case, this means loading the C64 ROM images from files into memory, and making sure our emulated CPU can access the contents.

We start by dumping the ROM images into the asset folder:


Usually, the C64 ROMs you can download from the internet have version numbers in their file names. In my case, I just gave them simple names. Also, notice that I have removed the file program.bin we used in previous posts.

  C64Bloc() : super(InitialState()) {
    on<InitEmulatorEvent>((event, emit) async {
      final basicData = await rootBundle.load("assets/basic.bin");
      final characterData = await rootBundle.load("assets/characters.bin");
      final kernalData = await rootBundle.load("assets/kernal.bin");
      memory.populateMem(basicData, characterData, kernalData);
...
So, we load the different ROM's, waiting for the loading of each file to complete, and then going to the next file for loading.

Now, you might notice that, compared to previous posts, we now pass more ROMs to memory.populateMem. So let us delve a bit deeper into our Memory class to see what changes are required:

...
  late type_data.ByteData _basic;
  late type_data.ByteData _character;
  late type_data.ByteData _kernal;
...
  final type_data.ByteData _ram = type_data.ByteData(64*1024);
...
  populateMem(type_data.ByteData basicData, type_data.ByteData characterData,
      type_data.ByteData kernalData) {
    _basic = basicData;
    _character = characterData;
    _kernal = kernalData;
  }
...
Fairly straightforward. Each ROM that is passed through, we store in a variable.

Something else we do is define a 64KB array that will act as our RAM, the defining characteristic of the C64.

So, next, let us add some address mapping:

  setMem(int value, int address ) {
    _ram.setUint8(address, value); // values are unsigned bytes (0-255)
  }

  int getMem(int address) {
    if (address >= 0xA000 && address <= 0xBFFF) {
      return _basic.getUint8(address & 0x1fff);
    } else if (address >= 0xE000 && address <= 0xFFFF) {
      return _kernal.getUint8(address & 0x1fff);
    } else {
      return _ram.getUint8(address);
    }
  }

For memory writes, we write straight to the RAM array. For reads, we follow the usual C64 memory layout:

  • Addresses A000-BFFF: We read from basic ROM
  • Addresses E000-FFFF: We read from Kernal ROM
  • All other addresses we read from RAM
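Both ROMs are 8KB in size, which is why the same mask of 0x1fff turns a CPU address into an offset within either ROM. The following sketch shows the mapping on its own; regionOf and romOffset are hypothetical helper names used purely for illustration.

```dart
// Hypothetical helpers illustrating the address decoding in getMem.
String regionOf(int address) {
  if (address >= 0xA000 && address <= 0xBFFF) return 'basic';
  if (address >= 0xE000 && address <= 0xFFFF) return 'kernal';
  return 'ram';
}

// Offset within an 8KB ROM image.
int romOffset(int address) => address & 0x1fff;

void main() {
  print('${regionOf(0xA000)} ${romOffset(0xA000).toRadixString(16)}'); // basic 0
  print('${regionOf(0xBFFF)} ${romOffset(0xBFFF).toRadixString(16)}'); // basic 1fff
  print('${regionOf(0xFFFC)} ${romOffset(0xFFFC).toRadixString(16)}'); // kernal 1ffc
  print(regionOf(0x0400)); // ram
}
```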

Booting the C64 System

We are now close to booting the C64 system with all its ROM's.

First things first. Our periodic timer currently runs once every second, executing 1 000 000 cycles worth of CPU instructions. However, we want to reduce the interval to a 60th of a second, so that later on we can draw a frame every time our timer executes, yielding 60 frames a second, which is the frame rate of a native C64:

    on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(milliseconds: 17), (timer) {
          int targetCycles = _cpu.getCycles() + 16666;
          do {
            _cpu.step();
          } while (_cpu.getCycles() < targetCycles);
      });
    });

On every timer tick we execute 16666 cycles, which is the number of CPU cycles in 1/60th of a second (1 000 000 / 60).

Booting the C64 ROMs is actually fairly straightforward. You basically set the program counter to the value of the reset vector. For this we just create the following method:
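The arithmetic behind these numbers can be checked with a few lines. This is only a back-of-the-envelope sketch, assuming a CPU clock of exactly 1 000 000 cycles per second as used earlier in this series, and a timer that fires exactly every 17 ms; the constant and function names are made up for the example.

```dart
const cpuHz = 1000000; // cycles per second, as assumed earlier in the series
const framesPerSecond = 60;
const timerMillis = 17; // Timer.periodic only accepts whole milliseconds here

int cyclesPerFrame() => cpuHz ~/ framesPerSecond;

// Cycles actually executed per wall-clock second with a 17 ms timer.
int effectiveCyclesPerSecond() => cyclesPerFrame() * 1000 ~/ timerMillis;

void main() {
  print(cyclesPerFrame()); // 16666
  print(effectiveCyclesPerSecond()); // 980352
}
```

So, under these assumptions, rounding the interval up to 17 ms makes the emulated machine run about 2% slower than real time, which is close enough for now.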

  reset() {
    pc = memory.getMem(0xfffc) | (memory.getMem(0xfffd) << 8);
  }
So, here we populate the program counter with the reset vector at addresses FFFC and FFFD.
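The 6502 stores vectors low byte first, so the byte at FFFC is the low byte and the byte at FFFD is the high byte. Here is a minimal sketch of that, using a plain map as stand-in memory. The helper readVector and the example bytes are assumptions made for the illustration, not part of the emulator.

```dart
// Hypothetical helper: reads a 16-bit little-endian vector.
int readVector(int Function(int address) getMem, int address) =>
    getMem(address) | (getMem(address + 1) << 8);

void main() {
  // Stand-in memory with example bytes at the reset vector.
  final fakeMem = {0xfffc: 0xe2, 0xfffd: 0xfc};
  final pc = readVector((address) => fakeMem[address] ?? 0, 0xfffc);
  print(pc.toRadixString(16)); // fce2
}
```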

We still need to call this method. We do this just after we have loaded all the ROM's:

  C64Bloc() : super(InitialState()) {
    on<InitEmulatorEvent>((event, emit) async {
      final basicData = await rootBundle.load("assets/basic.bin");
      final characterData = await rootBundle.load("assets/characters.bin");
      final kernalData = await rootBundle.load("assets/kernal.bin");
      memory.populateMem(basicData, characterData, kernalData);
      _cpu.reset();
...
Now, we can finally boot the C64 System. We wait for a minute, and then hit stop to view the registers:


We see the program stabilise at address FF61. Let us have a look at the Kernal disassembly listing to see what is going on at this address:
 
As seen here, we get stuck in a loop, with memory address D012 not changing. We can expect this to happen at the moment because, with our current emulator setup, reads and writes to that address go to plain RAM, and thus nothing will ever change.

In reality D012 maps to the VIC-II display registers and provides info on which raster line of the screen we are currently at. It is quite an undertaking to implement such a raster counter, so for now, let us see if we can quickly hack something together, so that the value at address D012 changes every now and then, just to get past that loop. Here is my quick hack:

  int getMem(int address) {
    _readCount++;
    if (address >= 0xA000 && address <= 0xBFFF) {
      return _basic.getUint8(address & 0x1fff);
    } else if (address >= 0xE000 && address <= 0xFFFF) {
      return _kernal.getUint8(address & 0x1fff);
    } else if (address == 0xD012) {
      return (_readCount & 1024) == 0 ? 1 : 0;
    } else {
      return _ram.getUint8(address);
    }
  }

So, the hack is simply a counter that keeps count of the number of reads, and we look at bit 10 of the counter (the mask 1024). If it is clear, we return a 1, otherwise a zero. In effect we will have a 1 for about a thousand reads, and then a zero for another thousand reads.

Let us now see where our program counter lands. This time it lands at E5D4. Let us look again at the disassembly listing for this address:
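To make the toggling concrete, here is the expression from the hack isolated in a tiny function. The name fakeRaster is hypothetical and only used for this illustration.

```dart
// Hypothetical stand-alone version of the D012 hack.
int fakeRaster(int readCount) => (readCount & 1024) == 0 ? 1 : 0;

void main() {
  print(fakeRaster(0)); // 1
  print(fakeRaster(1023)); // 1
  print(fakeRaster(1024)); // 0
  print(fakeRaster(2047)); // 0
  print(fakeRaster(2048)); // 1
}
```

The returned value flips every 1024 reads, which is enough for the Kernal loop to see a change.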


Here it seems we are in a waiting loop, waiting for the enter key to be pressed on the keyboard. I think this is a pretty decent place for our emulator to be and probably means that all initialisation has been completed, and we should have the welcome message in screen memory.

We want to check if the welcome message is in screen memory, but our debug dump currently just shows the first two pages of memory. We could, however, inspect the RAM array in debug mode in IntelliJ.

So, start up the emulator in debug mode and press the play button to let the C64 system run at full speed. Wait for about a minute and then put a breakpoint on the first line of the getMem method in our Memory class. With the system running at full speed, that breakpoint will be hit almost instantaneously.

Open up an evaluate window and enter the following:


Here we inspect address 1024 of screen memory, which is the first byte of it. In this case the value is 32, which is a space. Inspecting addresses further in screen memory will reveal the welcome message.

In the next section we will render the contents of the screen in real time.

Rendering screen memory

We will now try and render screen memory in real time, showing a display similar to the C64 in text mode.

We ultimately need a mechanism that allows us to work efficiently with image data at pixel level. Flutter provides this via the RawImage widget, together with ui.Image, with which you can work with an array of RGBA values.

Let us unpack this a bit. Let us start working with the raw array of RGBA values, where we will produce the frame for display, based on the screen memory and the character ROM.

Both the character ROM and screen memory are present in our Memory class, so for now we will do the frame rendering in that class.

Firstly, let us define the byte buffer we are going to use over and over again:

class Memory {
...
    final type_data.ByteData image = type_data.ByteData(320*200*4);
...
}
So, as we can see, we have a resolution of 320x200, which is the resolution of a real C64 screen. We multiply the end result by 4, because each pixel takes four bytes in our buffer, one byte each for red, green, blue and the alpha channel.

Next, let us write a method for rendering a screen to the byte array:

  type_data.ByteData getDisplayImage() {
    const rowSpan = 320 * 4;
    for (int i = 0; i < 1000; i++ ) {
      var charCode = _ram.getUint8(i + 1024);
      var charAddress = charCode << 3;
      var charBitmapRow = (i ~/ 40) << 3;
      var charBitmapCol = (i % 40) << 3;
      int rawPixelPos = charBitmapRow * rowSpan + charBitmapCol * 4;
      for (int row = 0; row < 8; row++) {
        int bitmapRow = _character.getUint8(row + charAddress);
        int currentRowAddress = rawPixelPos + row * rowSpan;
        for (int pixel = 0; pixel < 8; pixel++) {
          if ((bitmapRow & 0x80) != 0) {
              image.setUint32(currentRowAddress + (pixel << 2), 0x000000ff);
          } else {
              image.setUint32(currentRowAddress + (pixel << 2), 0xffffffff);
          }
          bitmapRow = bitmapRow << 1;
        }
      }

    }
    return image;
  }

So, here we loop through all thousand character codes in screen memory, rendering each one. Each character code is actually an index into the character ROM, where every character has its own 8x8 pixel bitmap.

Now, this method is invoked every time our periodic timer runs:
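The least obvious part of the method is probably how a position in screen memory becomes a byte offset in the image buffer. The sketch below isolates that calculation; firstPixelOffset is a hypothetical helper name that mirrors the arithmetic used for rawPixelPos.

```dart
// Hypothetical helper mirroring the rawPixelPos calculation.
int firstPixelOffset(int screenIndex) {
  const rowSpan = 320 * 4; // bytes per pixel row (4 bytes per pixel)
  final pixelRow = (screenIndex ~/ 40) << 3; // character row * 8
  final pixelCol = (screenIndex % 40) << 3; // character column * 8
  return pixelRow * rowSpan + pixelCol * 4;
}

void main() {
  print(firstPixelOffset(0)); // 0
  print(firstPixelOffset(1)); // 32
  print(firstPixelOffset(40)); // 10240
  print(firstPixelOffset(999)); // 247008
}
```

For the last character, the bottom-right pixel starts at 247008 + 7 * 1280 + 7 * 4 = 255996, so its four bytes end exactly at the last byte of the 320*200*4 buffer.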

...
import 'dart:ui' as ui;
...
   on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(milliseconds: 17), (timer) {
          int start = DateTime.now().millisecondsSinceEpoch;
          int targetCycles = _cpu.getCycles() + 16666;
          do {
            _cpu.step();
          } while (_cpu.getCycles() < targetCycles);
          ui.decodeImageFromPixels(memory.getDisplayImage().buffer.asUint8List(), 
             320, 200, ui.PixelFormat.bgra8888, setImg);
      });
    });
ui.decodeImageFromPixels is a method within the dart:ui library of Flutter. It will create an Image object from a pixel buffer, which in this case is the rendered screen buffer.

We also pass ui.PixelFormat.bgra8888 as a parameter, indicating that our buffer holds one byte each for blue, green, red and alpha, in that order.

We also pass a callback method, setImg in this case, which will be called once we have the generated Image object.

So, let us implement this callback method:

    void setImg(ui.Image data) {
      emit(RunningState(image: data, frameNo: frameNo++));
    }
Here you can see we are emitting the image in a state object, so our BlocBuilder can pick up the change and render the image. You will also notice that we have a frameNo property that we modify with each new image, so our BlocBuilder can easily pick up the change.

You will recall that we defined RunningState in a previous post. We have now made some changes to it. Here is the revised version:

class RunningState extends C64State {
  RunningState({required this.image,
    required this.frameNo});

  final int frameNo;
  final ui.Image image;
  @override
  List<Object> get props => [frameNo];
}
Finally, let us modify our BlocBuilder:

...
        body: BlocBuilder<C64Bloc, C64State>(
          builder: (BuildContext context, state) {
            if (state is InitialState) {
              return const CircularProgressIndicator();
            } else if (state is DataShowState) {
              return Column(
                children: [
                  Text(getRegisterDump(state.a, state.x, state.y, state.n,
                      state.z, state.c, state.i, state.d, state.v, state.pc)),
                  Text(
                    getMemDump(state.memorySnippet),
                    style: const TextStyle(
                      fontFamily: 'RobotoMono', // Use the monospace font
                    ),
                  ),
                ],
              );
            } else if (state is RunningState) {
              return RawImage(
                  image: state.image, scale: 0.5);
            } else {
              return const CircularProgressIndicator();
            }
          },
        ),
...
So, if the state is RunningState, we return a RawImage widget, which will be displayed on the screen. We pass the image in the state to the RawImage widget. We also use a scale of 0.5, with which we basically double the displayed size. The native 320x200 resolution of a C64 frame displays very small on a modern display, so with the scale it at least appears bigger.

With everything coded we can now give it a test run. The startup sequence appears to take more or less the same time as on a real C64, and eventually the welcome screen appears:

We are making progress, but still, there is no flashing cursor.

Getting the cursor to flash

Let us see if we can get the cursor to flash. 

If you go down into the bowels of the C64 system, you will find that at the core of a standard C64 system that has just started up, there is a timer interrupt every 60th of a second. This interrupt does a couple of things, like checking if any key was pressed or released and updating the status of the cursor.

So, let us see if we can put a hack together, where off the bat we just force an interrupt every 60th of a second, without worrying for now about emulating the full CIA chip with a timer.

The easiest way is in the step() method of our CPU class:

  step() {
    if ((_cycles > 1000000) &&((_cycles % 16666) < 30) && (_i == 0)) {
      push(pc >> 8);
      push(pc & 0xff);
      push((_n << 7) | (_v << 6) | (2 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
      _i = 1;
      pc = memory.getMem(0xfffe) | (memory.getMem(0xffff) << 8);
    }
...
  }
So, we wait for a second of emulated time before triggering interrupts at 1/60 second intervals. With this change, the cursor actually flashes:

In Summary

In this post we managed to boot the C64 system with all its ROMs and managed to render screen memory in real time, showing the welcome message and the flashing cursor. 

The source code for this post is available in the following Github tag: https://github.com/ovalcode/c64_flutter/tree/c64_flutter_part12

In the next post we will add some keyboard interaction with our emulator.

Until next time!


Wednesday, 9 April 2025

A Commodore 64 Emulator in Flutter: Part 11

Foreword

In the previous post we ran the Klaus Dormann Test Suite on our emulator. In the process we found a couple of issues with our emulator. We fixed some of them, but found a couple more that we still need to fix.

In this post we will look at the remaining issues. Solving these remaining issues wasn't much of a deal at all, so this post will be shorter than normal.

The remaining fixes

One of the major issues I found while running the Klaus Dormann Test Suite on my emulator was incorrect values in some of the CPU data tables. This included some instructions having the incorrect address mode and incorrect instruction lengths.

The other issue I experienced, was failed test cases because decimal mode wasn't implemented. Implementing Decimal mode is fairly straightforward. We start with implementing the following methods:

  int adcDecimal(int operand) {
     int l = 0;
     int h = 0;
     int result = 0;
     l = (_a & 0x0f) + (operand & 0x0f) + _c;
     if ((l & 0xff) > 9) l += 6;
     h = (_a >> 4) + (operand >> 4) + (l > 15 ? 1 : 0);
     if ((h & 0xff) > 9) h += 6;
     result = (l & 0x0f) | (h << 4);
     result &= 0xff;
     _c = (h > 15) ? 1 : 0;
     _z = (result == 0) ? 1 : 0;
     _n = 0;
     _v = 0;
     return result;
   }
 
   int sbcDecimal(int operand) {
     int l = 0;
     int h = 0;
     int result = 0;
     l = (_a & 0x0f) - (operand & 0x0f) - (1 - _c);
     if ((l & 0x10) != 0) l -= 6;
     h = (_a >> 4) - (operand >> 4) - ((l & 0x10) != 0 ? 1 : 0);
     if ((h & 0x10) != 0) h -= 6;
     result = (l & 0x0f) | (h << 4);
     _c = ((h & 0xff) < 15) ? 1 : 0;
     _z = (result == 0) ? 1 : 0;
     _n = 0;
     _v = 0;
     return (result & 0xff);
   }
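To get a feel for the nibble corrections, here is a standalone sketch of the addition only. The function bcdAdd is a hypothetical, side-effect-free variant of adcDecimal that takes the accumulator and carry as parameters and leaves out the Z, N and V flags.

```dart
// Hypothetical pure version of the decimal-mode addition above.
({int result, int carry}) bcdAdd(int a, int operand, int carryIn) {
  var l = (a & 0x0f) + (operand & 0x0f) + carryIn;
  if (l > 9) l += 6; // low digit overflowed past 9: skip A-F
  var h = (a >> 4) + (operand >> 4) + (l > 15 ? 1 : 0);
  if (h > 9) h += 6; // same correction for the high digit
  final result = ((l & 0x0f) | (h << 4)) & 0xff;
  return (result: result, carry: h > 15 ? 1 : 0);
}

void main() {
  final sum = bcdAdd(0x19, 0x28, 0);
  print(sum.result.toRadixString(16)); // 47
  print(sum.carry); // 0
  final wrap = bcdAdd(0x99, 0x01, 0);
  print(wrap.result.toRadixString(16)); // 0
  print(wrap.carry); // 1
}
```

So 19 + 28 gives 47, and 99 + 1 wraps around to 00 with the carry set, just as decimal arithmetic on paper would.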

We modify the applicable instruction selectors:

       case 0x69:
         if (_d == 1) {
           _a = adcDecimal(arg0);
         } else {
           adc(arg0);
         }
       case 0x65:
       case 0x75:
       case 0x6D:
       case 0x7D:
       case 0x79:
       case 0x61:
       case 0x71:
         if (_d == 1) {
           _a = adcDecimal(memory.getMem(resolvedAddress));
         } else {
           adc(memory.getMem(resolvedAddress));
         }
         
       case 0xE9:
         if (_d == 1) {
           _a = sbcDecimal(arg0);
         } else {
           sbc(arg0);
         }
       case 0xE5:
       case 0xF5:
       case 0xED:
       case 0xFD:
       case 0xF9:
       case 0xE1:
       case 0xF1:
         if (_d == 1) {
           _a = sbcDecimal(memory.getMem(resolvedAddress));
         } else {
           sbc(memory.getMem(resolvedAddress));
         }

Test Results

With everything fixed, we can see if all the tests passed.

The Test Suite runs for about two minutes on my emulator. After the two minutes, when hitting stop, the register window will look as follows:


From this point the program counter remains at 3469. Let us have a look at the assembly listing to see what is at this address:

So, this is confirmation that our emulator passed all the tests!

In Summary

In this post we confirmed that we implemented all the CPU instructions correctly in our emulator, using Klaus Dormann's Test Suite.

Here is a link to the tag of this post's source code: https://github.com/ovalcode/c64_flutter/tree/c64_flutter_part11

In the next post we will start writing some more code to boot the C64 ROM's.

Until next time!

Saturday, 1 March 2025

A Commodore 64 Emulator in Flutter: Part 10

Foreword

In the previous post we implemented the last couple of 6502 instructions in our C64 Flutter emulator.

In this post we will be running the Klaus Dormann Test Suite on our emulator to ensure we have implemented all the instructions correctly.

Starting up the Klaus Dormann Test Suite

Let us see if we can start up the Klaus Dormann Test Suite on our emulator, although only in a single stepping fashion at the moment.

To get started, we need two files from Klaus' Github repository:

The first file is the actual binary which will execute in our emulator. This is a 64KB binary which will fill the whole address space accessible by the 6502.

The second file is a listing file, containing the actual disassembled version of the binary we are running. The listing file is useful if you want to follow along to see what the program is actually doing at a certain point in time.

Firstly we dump the binary in the assets folder of our Flutter project and rename it to program.bin. This is the default binary our emulator looks for when it starts up.

Now, usually when a 6502 system starts up, it looks at the reset vector at addresses 0xFFFC and 0xFFFD for the address at which it should start executing code, something which we haven't implemented yet.

In the Klaus Test Suite there is also a reset vector defined, but within the context of the Test Suite its function is to detect whether an accidental reset was triggered. So, in actual fact, this Test Suite doesn't use the reset vector to start everything. Rather, when using the test suite, you should just set the PC register to 0x400 and start execution. This makes our life easier, and for the moment we don't need to worry about implementing the reset vector stuff.

So, in cpu.dart, the following change needs to be made, namely the initial value of pc:

 
...
  int _n = 0, _z = 0, _c = 0, _i = 0, _d = 0, _v = 0;
  int _sp = 0xff;
  int pc = 0x400;
...
With this we can start up our emulator and single step through the code of the Test Suite.

Unattended running

Single stepping through the Klaus Dormann Test Suite in our emulator would be a daunting task. You would probably need to click the step button thousands of times.

It would make our lives easier if we could just let the Test Suite run unattended, with us just pausing the execution once in a while, to see how far we have progressed through the tests.

We do this by adding a button right next to the title. As part of the process we need to wrap both the title and the button in a row in order for everything to align properly. All this is happening in the main.dart file:

        appBar: AppBar(
          title:  Row(
              mainAxisSize: MainAxisSize.min,
              children: [
                const Text("Emulator C64"),
                BlocBuilder<C64Bloc, C64State>(
                  builder: (BuildContext context, state) {
                    return Row(
                        mainAxisAlignment: MainAxisAlignment.end,
                        children: [
                          _getRunStopButton(state, context)
                        ]);
                  })
              ]),
        ),
We want our run button to behave like a toggle switch, toggling between a play and a pause button. To do all this fancy stuff, we need to inject some state, which we achieve by wrapping everything with a BlocBuilder. We did discuss the workings of BlocBuilder in a previous post.

Now, the method _getRunStopButton() returns for us three possible buttons, depending on the state, which could be a play button, a stop button, and a disabled play button if everything hasn't initialised yet:

  Widget _getRunStopButton(C64State state, BuildContext context) {
    if(state is DataShowState) {
      return IconButton(
        icon: Icon(Icons.play_arrow),
        onPressed:  () {
          context.read<C64Bloc>().add(RunEvent());
        }
      );
    } else if (state is RunningState) {
      return IconButton(
          icon: Icon(Icons.stop_circle),
          onPressed:  () {
            context.read<C64Bloc>().add(StopEvent());
          }
      );
    } else {
      return const IconButton(
          icon: Icon(Icons.play_arrow),
          onPressed:  null
      );
    }
  }

Here we test for different states. Firstly we show an enabled play button if we are in DataShowState. As you might remember from previous posts, with DataShowState, we display a dump of memory and registers, and we can single step from that point. This is the perfect scenario to provide a play button that will run the emulator at full speed.

Pressing the play button emits a RunEvent, which we still need to implement a listener for. We will do that in a bit.

Secondly if our emulator is in the RunningState, we display the stop button. We should still implement the RunningState State, which in actual fact is a very simple implementation:

class RunningState extends C64State {}
No values or properties we need to convey to here, just conveying the mere fact when we are in the running state.

Finally, for any other state we just want to show a play button that is disabled. This will only happen when our application is starting up and loading the memory image, which in our case is the Test Suite.

Now, we have defined a number of events that we need to listen for in c64_bloc.dart.

Firstly, let us define the listener for RunEvent. This will be the core of our unattended running. Here we want to schedule a timer that runs every second, and on every run we execute a second's worth of CPU instructions (aka 1 000 000 CPU cycles). We need to emit a RunningState state so our front end can update accordingly.

Let us start with an outline:

...
  Timer? timer;
...
    on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(seconds: 1), (timer) {
...
      });
      emit(RunningState());
    });
...
We define the timer variable as a field of our C64Bloc class, since we want to be able to cancel the timer in another event handler.

Now, to determine when our CPU has executed 1 million cycles worth of instructions, our CPU needs to keep record of the cycles for each of the instructions it executes. This is obviously in the step method:

...
  int _cycles = 0;
...
  int getCycles() {
    return _cycles;
  }
...
  step() {
...
    _cycles = _cycles + CpuTables.instructionCycles[opCode];
    var resolvedAddress =
        calculateEffectiveAddress(CpuTables.addressModes[opCode], arg0, arg1);
    switch (opCode) {
...
    }
  }
We have defined the instructionCycles array in a previous post, which specifies the number of cycles for every opcode. So, with every step we can just add the number of cycles for the opcode being executed to the _cycles variable.

With this implemented, we can add some meat to our timer callback function:

...
    on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(seconds: 1), (timer) {
          int targetCycles = _cpu.getCycles() + 1000000;
          do {
            _cpu.step();
          } while (_cpu.getCycles() < targetCycles);
      });
      emit(RunningState());
    });
...
So, we just add one million to our current Cpu cycle count and that will be the target at which we will stop the loop.
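One thing worth noting about this loop is that it only checks the cycle count after a whole instruction has executed, so each time slice can overshoot its target by a few cycles. The following sketch demonstrates this with a stand-in CPU. FakeCpu and runSlice are hypothetical names, and the fixed cost of 7 cycles per step is just an assumption for the example.

```dart
// Stand-in CPU where every instruction costs the same number of cycles.
class FakeCpu {
  FakeCpu(this.cyclesPerStep);

  final int cyclesPerStep;
  int _cycles = 0;

  int getCycles() => _cycles;

  void step() {
    _cycles += cyclesPerStep;
  }
}

// Same structure as the timer callback: run until the budget is used up.
int runSlice(FakeCpu cpu, int budget) {
  final targetCycles = cpu.getCycles() + budget;
  do {
    cpu.step();
  } while (cpu.getCycles() < targetCycles);
  return cpu.getCycles();
}

void main() {
  final cpu = FakeCpu(7);
  print(runSlice(cpu, 20)); // 21
  print(runSlice(cpu, 20)); // 42
}
```

Because the next target is based on the actual cycle count, the overshoot does not get corrected in later slices. With a budget of a million cycles, a few cycles either way makes no practical difference.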

Finally, we need to implement the stop event:

    on<StopEvent>((event, emit) {
      timer?.cancel();
      emit(DataShowState(
          dumpNo: dumpNo++,
          memorySnippet: ByteData.sublistView(memory.getDebugSnippet(), 0, 512),
          a: _cpu.getAcc(),
          x: _cpu.getX(),
          y: _cpu.getY(),
          n: _cpu.getN() == 1,
          z: _cpu.getZ() == 1,
          c: _cpu.getC() == 1,
          i: _cpu.getI() == 1,
          d: _cpu.getD() == 1,
          v: _cpu.getV() == 1,

          pc: _cpu.pc));
    });

Here we just cancel the timer and emit a DataShowState, because after we have stopped the running, we want to display the current state of memory and the registers.

When the emulator runs unattended, we also want to hide the state display to avoid confusion and just show "running". To keep the discussion focused, I will not be going into this detail.

Running the Test Suite

Finally we are at a point where we can run Klaus Dormann's Test Suite. On startup, the screen looks like this:

As discussed, the play button to start the emulator in unattended mode is next to the title.

When clicking play, the screen changes like this:

One weird thing you might notice is that if you click run and quickly stop again, the program counter is still at 0x400, the starting address of the test suite, as if nothing executed. The reason for this is very subtle. Our timer callback will only execute once the timer interval has elapsed. So, in our case we need to wait at least 1 second before clicking the stop button to see some results.

So, if we let it run a bit longer, our result will look like this:


So, when stopped our Program counter was at 0x9D7. Funny thing is, you can let it run for long as you want to, but the Program counter remains stuck at 0x9D7.

What is going on here?

To find the answer we need to look at the source listing of the Test Suite and search for that address:


Here it is clear that if something went wrong with the test, it will loop endlessly at the address 09D7. So, obviously, our emulator failed a test, but which one? Looking back a couple of lines, we see the comment: The IRQ vector was never executed

Aha! We never implemented IRQs (Interrupt Requests) in our emulator. Having said that, it briefly caught me in a mystical moment, almost like when, as a kid playing on a Commodore 64, I wondered for the first time what was going on underneath the hood.

In this case I wondered where the IRQ came from. This Test Suite doesn't implement any magical peripherals, does it? After a moment I realised that this was probably caused by me not implementing the BRK instruction, and looking further back in the listing did confirm this.

This was actually a very interesting experience for me. It was the first time I encountered a problem and my first instinct was a moment of nostalgia 😂

In the following section we will implement the BRK instruction and then run the emulator again.

Implementing the BRK and RTI instructions

So, let us quickly implement the BRK and RTI instructions. There is one caveat with the BRK instruction. It is a one-byte instruction, but in actual fact it behaves like a two-byte instruction. The BRK triggers an IRQ, and when it returns it doesn't return to the address directly after the BRK instruction, but one address further on.

To account for this quirk of the BRK instruction, we can adjust the instruction length in the instructionLen table for the BRK instruction to 2.

With the table adjusted, we implement the BRK and RTI instruction as follows:

      /*BRK*/
      case 0x00:
        push(pc >> 8);
        push(pc & 0xff);
        push((_n << 7) | (_v << 6) | (3 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
        _i = 1;
        pc = (memory.getMem(0xffff) << 8) | memory.getMem(0xfffe);

      /*RTI*/
      case 0x40:
        int temp = pull();
        _c = temp & 1;
        _z = (temp >> 1) & 1;
        _i = (temp >> 2) & 1;
        _d = (temp >> 3) & 1;
        _v = (temp >> 6) & 1;
        _n = (temp >> 7) & 1;
        pc = pull() | (pull() << 8);
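BRK packs the flags into a single status byte and RTI unpacks them again, so the two must agree on the bit positions. Here is a small round-trip sketch of that packing. The functions packStatus and unpackStatus are hypothetical helpers, not methods of our CPU class.

```dart
// Hypothetical helper: builds the status byte the way BRK pushes it,
// with bits 5 and 4 both set (the 3 << 4 term).
int packStatus(int n, int v, int d, int i, int z, int c) =>
    (n << 7) | (v << 6) | (3 << 4) | (d << 3) | (i << 2) | (z << 1) | c;

// Hypothetical helper: extracts the flags the way RTI does.
Map<String, int> unpackStatus(int status) => {
      'c': status & 1,
      'z': (status >> 1) & 1,
      'i': (status >> 2) & 1,
      'd': (status >> 3) & 1,
      'v': (status >> 6) & 1,
      'n': (status >> 7) & 1,
    };

void main() {
  final status = packStatus(1, 0, 0, 1, 1, 1);
  print(status.toRadixString(16)); // b7
  final flags = unpackStatus(status);
  print(flags['n']); // 1
  print(flags['v']); // 0
  print(flags['i']); // 1
}
```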
Now, when we run the test suite again, we get past this failed test. However, we end up in another endless loop at address 0xdeb, which indicates another failed test.

We will investigate this failed case, as well as other potential failed cases in the next post.

In Summary

In this post we ran the Klaus Dormann Test Suite on our Emulator in unattended mode. The first failed test case we encountered was the BRK/RTI instruction that wasn't implemented.

With the BRK/RTI instruction implemented we encountered another failed test case which we will investigate in the next post, as well as other potential failed test cases which will pop up.

You can find all the source code for this project as well as the binary image containing the Klaus Dormann test suite, here.

Until next time!

Sunday, 23 February 2025

A Commodore 64 Emulator in Flutter: Part 9

Foreword

In the previous post we implemented all stack operations for our Flutter C64 emulator. This included pushing and popping the Accumulator and the status register. Also, we implemented the JSR/RTS instructions, which also operate on the stack.

In this post we will be implementing the remaining instructions of the 6502, which includes the following:

  • BIT
  • JMP (Jump)
  • NOP
  • Register operations

With these instructions implemented, we can start running the Klaus Dormann Test Suite on our emulator in the next post, to see if we have implemented all the 6502 instructions correctly.

Enjoy!

The Jump instruction

Implementing the jump instruction is a straightforward operation of loading the program counter with a new value. Let us add the selectors for it:

        /*
JMP (JuMP)
Affects Flags: none

MODE           SYNTAX       HEX LEN TIM
Absolute      JMP $5597     $4C  3   3
Indirect      JMP ($5597)   $6C  3   5
         */
      case 0x4C:
      case 0x6C:
        pc = resolvedAddress;

Now, there are two address modes for this instruction: Absolute and Indirect. We have already implemented the Absolute address mode in the calculateEffectiveAddress() method, but not the Indirect address mode. So, within the calculateEffectiveAddress() method, let us add the following selector:

      case AddressMode.indirect:
        var lookupAddress = (operand2 << 8) | operand1;
        return memory.getMem(lookupAddress) | (memory.getMem(lookupAddress + 1) << 8);
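
One detail worth knowing: on the original NMOS 6502 (and so on the 6510 in the C64), the indirect JMP does not carry into the high byte when it fetches the second pointer byte. A JMP ($10FF) therefore reads the high byte of the target from $1000 instead of $1100. The selector above does not reproduce this. Here is a minimal sketch of how it could be done, assuming a plain Uint8List stands in for our memory class; readIndirectJmpTarget is a hypothetical helper name.

```dart
import 'dart:typed_data';

/// Resolves the target of JMP (pointer) the way an NMOS 6502 does:
/// the high byte of the target is fetched from the same page as the
/// low byte, even when the pointer sits on the last byte of a page.
int readIndirectJmpTarget(Uint8List mem, int pointer) {
  final lowAddress = pointer & 0xffff;
  // Only the low byte of the pointer is incremented; the page is kept.
  final highAddress = (lowAddress & 0xff00) | ((lowAddress + 1) & 0x00ff);
  return mem[lowAddress] | (mem[highAddress] << 8);
}
```

A side effect of wrapping within the page is that the lookup can never run past the end of the 64 KB address space.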

The BIT instruction

Next, let us implement the BIT instruction. From the specs, the BIT instruction is defined as follows:

BIT (test BITs)
Affects Flags: N V Z

MODE           SYNTAX       HEX LEN TIM
Zero Page     BIT $44       $24  2   3
Absolute      BIT $4400     $2C  3   4

BIT sets the Z flag as though the value in the address tested were ANDed with the 
accumulator. The N and V flags are set to match bits 7 and 6 respectively in the 
value stored at the tested address.
We implement this as follows:

      case 0x24:
      case 0x2C:
        int memByte = memory.getMem(resolvedAddress);
        _z = ((memByte & _a) == 0) ? 1 : 0;
        _n = ((memByte & 0x80) != 0) ? 1 : 0;
        _v = ((memByte & 0x40) != 0) ? 1 : 0;
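
To see the flag behaviour in isolation, here is a small standalone sketch of the same logic as a pure function. The BitResult class and the bitTest function are hypothetical names used only for this illustration.

```dart
/// Flags produced by BIT; a hypothetical holder used only for this sketch.
class BitResult {
  final int z, n, v;
  const BitResult(this.z, this.n, this.v);
}

/// Computes the flags BIT would set for accumulator [a] and the byte
/// [memByte] read from memory. The accumulator itself is not changed.
BitResult bitTest(int a, int memByte) {
  final z = ((memByte & a) == 0) ? 1 : 0; // result of the AND is zero?
  final n = (memByte >> 7) & 1; // copy of bit 7 of the memory byte
  final v = (memByte >> 6) & 1; // copy of bit 6 of the memory byte
  return BitResult(z, n, v);
}
```

With the accumulator at 0x0F and a memory byte of 0xC0, the AND yields zero, so Z is set, while N and V are both set because bits 7 and 6 of the memory byte are one.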

The NOP instruction

The NOP instruction is short for No Operation. It literally does nothing except consume CPU cycles. One of the major uses of this instruction is to reserve some slots in memory where you might want to add more instructions in future.

Strictly speaking, you don't need to implement a case selector for this instruction in our big switch statement decoding the different opcodes. The surrounding mechanism should just skip to the next instruction.

However, by not implementing a selector for NOP, the default selector will be invoked in the switch statement. The default selector is nice to warn us if we forgot to implement some instructions or we encountered some undocumented instructions in the code. By not giving NOP a selector we will get many false positives by hitting the default selector.

So, the selector for NOP will look as follows:

        /*
NOP (No OPeration)
Affects Flags: none

MODE           SYNTAX       HEX LEN TIM
Implied       NOP           $EA  1   2
         */
      case 0xEA:
        break;

In Dart we generally don't need the break. However, when you have an empty case like this, you need to add it; otherwise it will fall through to the next case that has code, which is not what we want.

Register operations

Finally, let us implement the register operations. As per the specs, these are the following instructions:

Register Instructions
Affect Flags: N Z

These instructions are implied mode, have a length of one byte and require two machine cycles.

MNEMONIC                 HEX
TAX (Transfer A to X)    $AA
TXA (Transfer X to A)    $8A
DEX (DEcrement X)        $CA
INX (INcrement X)        $E8
TAY (Transfer A to Y)    $A8
TYA (Transfer Y to A)    $98
DEY (DEcrement Y)        $88
INY (INcrement Y)        $C8

In previous posts we did implement some of these. Taking inventory, I found that the following still need to be implemented:

  • TAX
  • TXA
  • INX
  • TAY
  • TYA
  • INY
Here is the implementation:

      case 0xAA:
        _x = _a;
        _n = ((_x & 0x80) != 0) ? 1 : 0;
        _z = (_x == 0) ? 1 : 0;

      case 0x8A:
        _a = _x;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

      case 0xE8:
        _x++;
        _x = _x & 0xff;
        _n = ((_x & 0x80) != 0) ? 1 : 0;
        _z = (_x == 0) ? 1 : 0;

      case 0xA8:
        _y = _a;
        _n = ((_y & 0x80) != 0) ? 1 : 0;
        _z = (_y == 0) ? 1 : 0;

      case 0x98:
        _a = _y;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

      case 0xC8:
        _y++;
        _y = _y & 0xff;
        _n = ((_y & 0x80) != 0) ? 1 : 0;
        _z = (_y == 0) ? 1 : 0;
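
Since every selector above ends with the same two flag updates, the pattern could be pulled into a helper. The following is only a sketch of that idea, not how the emulator is structured; NzFlags, flagsFor and increment8 are hypothetical names.

```dart
/// Hypothetical holder for the two flags the register operations affect.
class NzFlags {
  final int n, z;
  const NzFlags(this.n, this.z);
}

/// Derives N (bit 7) and Z (value is zero) from an 8-bit result.
NzFlags flagsFor(int value) {
  final byte = value & 0xff;
  return NzFlags((byte >> 7) & 1, byte == 0 ? 1 : 0);
}

/// Increments an 8-bit register value, wrapping from 0xff to 0x00.
int increment8(int value) => (value + 1) & 0xff;
```

For example, incrementing 0xff wraps to 0x00, which sets Z and clears N, while incrementing 0x7f gives 0x80, which sets N and clears Z.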

This covers the instructions we wanted to implement in this post. I am not going to write a test program to test them here, since in the next post we will start to run the Klaus Dormann Test Suite, which will surface any defects anyway.

In Summary

In this post we implemented the remaining instructions for our emulator.

In the next post we will run the Klaus Dormann Test Suite on our emulator to see if we have any defects in our implementation. This will probably span multiple posts, depending on how many issues we detect.

Until next time!

Thursday, 20 February 2025

A Commodore 64 Emulator in Flutter: Part 8

Foreword

In the previous post we implemented the compare and branch instructions for our Flutter C64 Emulator.

In this post we will implement the 6502 stack and related operations like pushing/popping, Jump to Subroutine and Return from Subroutine.

Enjoy!

The stack concept

The stack is a Last in First out (LIFO) data structure. To visualise a stack in real life, one can look at a receipt stack:


Clearly, one can see that the receipt that is most accessible is the last receipt you have placed on top of the pile.

In a CPU the stack has many uses. For instance, when you call subroutines in a nested way, you want to return to the caller of each subroutine. A stack is perfect for this, because you always want access to the most recent return address.

On the 6502, the stack is 256 bytes in size and lives in page 1 of the memory space, that is, the address range $100 - $1ff. The stack grows downwards, starting at $1ff and growing towards $100. Obviously, as you pop values off the stack, the stack pointer moves back towards $1ff.

On the 6502 the stack has many uses, of which we already mentioned jumping to and returning from subroutines. You can also push and pop registers. The 6502 also uses the stack when serving interrupts. Before an interrupt routine is called, the CPU stores the program counter and status register on the stack, so that when the service routine finishes, the CPU is restored to the state it was in before the interrupt, and the program continues as if nothing has happened.

Creating the stack mechanism

Let us start by writing some code for implementing the stack mechanism. We start by defining a stack pointer:

int _sp = 0xff;

We start with the initial value of 0x1ff, which is the starting position of the stack. We omit the high byte value of 1, and just prepend it whenever we need to do a lookup in memory.

Now let us create the push and pull methods.

  push(int value) {
    memory.setMem(value, _sp | 0x100);
    _sp--;
    _sp = _sp & 0xff;
  }

  int pull() {
    _sp++;
    _sp = _sp & 0xff;
    return memory.getMem(_sp | 0x100);
  }

With the understanding that the stack pointer points to the location where the next push will happen, we can use the stack pointer address as is when storing the pushed value, and decrement the pointer thereafter.

However, since the pointer points to the next push location, you cannot use the location as is when doing a pull. You first need to increment the pointer and use that value for the read address.

Before ending this section, let us implement the basic stack instructions Push Accumulator (PHA) and Pull Accumulator (PLA), to see if our stack implementation behaves as expected.
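
The mechanism described above can be tried out in isolation with the following sketch. It assumes a flat 64 KB Uint8List in place of our memory class, and the class name Stack6502 is hypothetical.

```dart
import 'dart:typed_data';

/// A standalone model of the 6502 stack, assuming a flat 64 KB byte array
/// in place of the emulator's memory class.
class Stack6502 {
  final Uint8List mem = Uint8List(0x10000);
  int sp = 0xff; // low byte only; the stack always lives in page 1

  /// Stores [value] at the current stack location, then moves down.
  void push(int value) {
    mem[0x100 | sp] = value & 0xff;
    sp = (sp - 1) & 0xff; // wraps from 0x00 back to 0xff
  }

  /// Moves up one location first, then reads the value stored there.
  int pull() {
    sp = (sp + 1) & 0xff; // wraps from 0xff back to 0x00
    return mem[0x100 | sp];
  }
}
```

Pushing 0x11 and then 0x22 stores them at $1ff and $1fe and leaves the stack pointer at 0xfd. Pulling twice returns 0x22 first and 0x11 second, which is the last in, first out behaviour we are after.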

    /*
    PHA (PusH Accumulator)          $48  3
    */
      case 0x48:
        push(_a);
    /*
    PLA (PuLl Accumulator)          $68  4
    */
      case 0x68:
        _a = pull();
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

Implementing JSR and RTS

Let us now implement the JSR (Jump to Subroutine) and RTS (Return from Subroutine) instructions.

So, in principle, when the JSR executes, it pushes the address of the next instruction on the stack as the return address before jumping to the subroutine. When the subroutine finishes executing and invokes RTS, it pulls this address off the stack again and jumps to it.

However, there is a small caveat with this sequence of events. The return address pushed onto the stack is not exactly the address of the next instruction, but the address of the next instruction minus 1.

The designers of the 6502 implemented JSR this way as a kind of optimisation. When reading an instruction from memory, the program counter is incremented by 1 for each byte, and by the time the CPU needs to push the return address, the PC is still pointing to the last byte of the JSR instruction.

Now, if you were to implement JSR/RTS in your emulator with the assumption that the value pushed on the stack is simply the address of the next instruction, without worrying about the -1 convention, your emulator would probably work fine 99% of the time. That being said, I have encountered some magic 6502 code in the past that interrogates the contents of the stack to implement things like copy protection or auto-starting code. Such code might not work correctly if your emulated JSR instruction doesn't push addresses on the stack following the -1 convention.

So, it is important to adhere to this convention when implementing the JSR/RTS instructions.

Here is the implementation of these two instructions:

/*
MODE           SYNTAX       HEX LEN TIM
Absolute      JSR $5597     $20  3   6
 */
      case 0x20:
        int temp = (pc - 1) & 0xffff;
        push(temp >> 8);
        push(temp & 0xff);
        pc = resolvedAddress;
/*
MODE           SYNTAX       HEX LEN TIM
Implied       RTS           $60  1   6
 */
      case 0x60:
        pc = pull();
        pc = pc | (pull() << 8);
        pc++;
        pc = pc & 0xffff;
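
The arithmetic behind the -1 convention can be written down as two tiny functions. This is only a sketch to make the numbers concrete; jsrPushedAddress and rtsResumeAddress are hypothetical names, and the first one takes the address of the JSR opcode itself rather than an already advanced program counter.

```dart
/// Address that JSR pushes, given the address of the JSR opcode itself.
/// JSR is 3 bytes long, so the last byte of the instruction is at +2.
int jsrPushedAddress(int jsrAddress) => (jsrAddress + 2) & 0xffff;

/// Address where execution resumes after RTS pulls [pulledAddress].
int rtsResumeAddress(int pulledAddress) => (pulledAddress + 1) & 0xffff;
```

For a JSR located at $0011, the pushed address is $0013, and the RTS resumes execution at $0014, the instruction right after the JSR.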

Implementing the other stack operations 

Let us now implement the rest of the stack operations.

The simplest of these operations are the transfers between the Stack Pointer register and the X register, which are TSX and TXS. So let us quickly implement them:

/*
        TXS (Transfer X to Stack ptr)   $9A  2
 */
      case 0x9a:
        _sp = _x;
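/*
        TSX (Transfer Stack ptr to X)   $BA  2
 */
      case 0xba:
        _x = _sp;
        // Unlike TXS, TSX updates the N and Z flags.
        _n = ((_x & 0x80) != 0) ? 1 : 0;
        _z = (_x == 0) ? 1 : 0;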

What remains to be implemented is pushing and pulling the status register, that is, the register that contains all the flags, like the Zero flag, Negative flag, Overflow flag and so on.

At this point the question arises in which order the flags are stored in the status byte that gets pushed onto the stack. One possibility is deciding on the order of the flags yourself, and emulation will probably work correctly 99% of the time.

However, as I mentioned in the previous section where we implemented the JSR/RTS instructions, you often find 6502 machine language programs that inspect the contents of the stack, so if you decide the order of the flags in the status byte yourself, such code might not work correctly.

The question is: how do we find the correct order of the flags in the status register? In general, the web sites that give you info on the 6502 instructions don't provide this info on the status register.

After digging a bit on the internet, I found the information via the following link:


They provide a nice diagram for the status register:

Some extra information about the status register is that bits 4 and 5 should both be one when the register is pushed on the stack. Similarly, when pulling this value back into the status register, we ignore bits 4 and 5. With all this said, let us implement the PHP and PLP instructions:

/*
        PHP (PusH Processor status)     $08  3
 */
      case 0x08:
        push((_n << 7) | (_v << 6) | (3 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
/*
        PLP (PuLl Processor status)     $28  4
 */
      case 0x28:
        int temp = pull();
        _c = temp & 1;
        _z = (temp >> 1) & 1;
        _i = (temp >> 2) & 1;
        _d = (temp >> 3) & 1;
        _v = (temp >> 6) & 1;
        _n = (temp >> 7) & 1;
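
The bit layout used by PHP and PLP can also be expressed as a pair of small functions, which makes it easy to check individual status bytes by hand. This is a sketch only; packStatus and flagAt are hypothetical helper names and not part of the emulator.

```dart
/// Packs the individual flag values (each 0 or 1) into the byte that
/// PHP pushes. Bits 4 and 5 are always written as 1.
int packStatus({
  required int n,
  required int v,
  required int d,
  required int i,
  required int z,
  required int c,
}) =>
    (n << 7) | (v << 6) | (3 << 4) | (d << 3) | (i << 2) | (z << 1) | c;

/// Extracts one flag from a pulled status byte. Bit positions:
/// C=0, Z=1, I=2, D=3, V=6, N=7.
int flagAt(int status, int bit) => (status >> bit) & 1;
```

With only the Negative and Overflow flags set, packStatus yields 0xF0. With all flags clear it yields 0x30, because bits 4 and 5 are always set in the pushed byte.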

We have implemented all instructions for this post. In the next section we will write a test program for all the instructions we have added.

The Test Program

We will use the following for our test program:

0000 A9 0A         LDA #$0a
0002 48            PHA
0003 48            PHA
0004 48            PHA
0005 48            PHA
0006 A2 50         LDX #$50
0008 9A            TXS
0009 48            PHA
000a 48            PHA
000b 48            PHA
000c 48            PHA
000d A9 7F         LDA #$7f
000f 69 01         ADC #$01
0011 20 19 00      JSR TEST
0014 68            PLA
0015 68            PLA
0016 68            PLA
0017 68            PLA
0018 68            PLA
0019 08       TEST PHP
001a B8            CLV
001b A9 00         LDA #$00
001d 28            PLP
001e 60            RTS

Here we test a couple of stack operations: pushing and pulling elements, changing the stack pointer, doing a JSR/RTS, and pushing and pulling the status register.

Currently within our emulator, we only have a view of the first page of memory (i.e. bytes 0 to 255). However, when executing the above program it would be nice to extend the view so we can see what is happening on the stack as well. I have made the change and it looks like this:


I am not going to cover the changes required to adjust the view like this, but it is available in a git tag I have created here. This tag also contains the test program for this post as binary which will execute as you click the step button.

Let's see how the stack changes as we execute the program. We start by pushing the Accumulator to the stack a number of times. We can see our values towards the end of page 1:

We then change the stack pointer to 0x50 and push the Accumulator a couple of times again. We can see that the pushed contents are now in a different area of memory:

Next, we force the Overflow flag to be set by doing an addition that causes an overflow. With this operation we manage to set as many flags as possible. We then jump to a subroutine, which starts by pushing the status register onto the stack.

At this point, our memory dump will look like this:

The return address pushed is 0013. As mentioned in a previous section the return address pushed is always one less than the actual address, because of the design of the 6502.

The value pushed for the status register is F0 (i.e. the upper 4 bits set). As mentioned previously, bits 4 and 5 are always set, and because of the operations we did, the Overflow flag is set as well as the Negative flag.

We then clear the Negative and Overflow flags on purpose, to see if the PLP instruction at the end of the subroutine restores them for us.

We then correctly return from the subroutine, continuing execution at address 0014. We then do a number of pulls into our accumulator to see if we get back the same values that we originally pushed. On purpose I have added an extra PLA afterwards to see what it does. As expected, we get a 00, because that location lies just past the last value we pushed.

This concludes what we wanted to achieve in this post.

In Summary

In this post we implemented all stack operations, including pushing and pulling the accumulator and the status register. We also implemented the JSR/RTS instructions, which also rely on the stack.

We are just about finished with implementing all instructions for the 6502. What remains are the following:

  • BIT
  • JMP (Jump)
  • NOP
  • Implied register operations

So, in the next post I will be implementing these.

With the above implemented, we can move onto more interesting things, like running the Klaus Dormann Test Suite on our emulator to see if it behaves like a real 6502. This is very important, because it will help us to emulate a game as accurately as possible.

Until next time!