Foreword

In the previous post we managed to interface the keyboard to our C64 Flutter emulator. With that implemented, we were able to enter a simple Basic program into our emulator and run it.

Now, my ultimate goal in writing this emulator is to be able to run the game Dan Dare in it, loading it from a tape image.

So, to achieve this end goal, the next goal is for our emulator to be able to load a tape image. On a C64, loading from tape relies heavily on the features of a CIA (Complex Interface Adapter). The features that tape loading relies on are access to the read head of the tape, timers and interrupts.

Up to now we have been mimicking some of the features of a CIA chip. The address range of the CIA chip is within DC00-DCFF. It immediately comes to mind that in the previous post we implemented two of the registers of the CIA, DC00 and DC01 for keyboard access.

We also implicitly implemented a timer and interrupts in our emulator, interrupting the CPU every 1/60 of a second, so that the cursor can flash and keyboard entry can work. However, we blindly forced these interrupts as a quick hack, just to get the cursor and keyboard to work. We didn't even consider the values set in the CIA for setting the timer.

However, to implement tape loading we would not be able to get away with a quick hack 😀 We will need to emulate the CIA properly for this purpose.

So, in this post we will implement CIA emulation bit by bit. This will include revisiting our current keyboard and timer interrupt implementation (i.e. the 1/60 second interrupt), and implementing it properly as part of the CIA emulation.

We will probably only get to tape emulation in the next post.

Enjoy!

Creating the CIA skeleton

Let's begin our journey by creating a skeleton CIA class. This class will evolve over time to contain all the functionality of a CIA that we need:

class Cia1 {
  setMem(int address, int value) {
    print("setMem ${address.toRadixString(16)} ${value.toRadixString(16)}");
  }

  int getMem(int address) {
    print("getMem ${address.toRadixString(16)}");
    return 0;
  }
}

Here we do something interesting: we log every write to and read from the CIA address range. With this we can see which functionality is used, so that we can implement just the bare minimum functionality of the CIA chip.

With this exercise we also want to disable the hard-coded interrupts happening every 1/60 second, to avoid any potential side effects during our CIA journey:

  step() {
/*
    if ((_cycles > 1000000) &&((_cycles % 16666) < 30) && (_i == 0)) {
      push(pc >> 8);
      push(pc & 0xff);
      push((_n << 7) | (_v << 6) | (2 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
      _i = 1;
      pc = memory.getMem(0xfffe) | (memory.getMem(0xffff) << 8);
    }
*/
    var opCode = memory.getMem(pc);
    pc++;
    var insLen = CpuTables.instructionLen[opCode];
    ...
  }

Next we need to make an instance of this class and inject it into our Memory class:

  C64Bloc() : super(InitialState()) {
    memory.setKeyInfo(this);
    on<InitEmulatorEvent>((event, emit) async {
      final basicData = await rootBundle.load("assets/basic.bin");
      final characterData = await rootBundle.load("assets/characters.bin");
      final kernalData = await rootBundle.load("assets/kernal.bin");
      Cia1 cia1 = Cia1();
      memory.setCia1(cia1);
      ...
    }
    ...
  }

We modify the actual Memory class like this:

class Memory {
...
  late final Cia1 cia1;
...
  setCia1(Cia1 cia1) {
    this.cia1 = cia1;
  }
...
  setMem(int value, int address ) {
    if ((address >> 8) == 0xDC) {
      cia1.setMem(address, value);
    } else {
      _ram.setInt8(address, value);
    }
  }

  int getMem(int address) {
    _readCount++;
    if (address >= 0xA000 && address <= 0xBFFF) {
      return _basic.getUint8(address & 0x1fff);
    } else if (address >= 0xE000 && address <= 0xFFFF) {
      return _kernal.getUint8(address & 0x1fff);
    } else if (address == 0xD012) {
      return (_readCount & 1024) == 0 ? 1 : 0;
    } else if ((address >> 8) == 0xDC ) {
      return cia1.getMem(address);

    /*else if (address == 0xDC01) {
      return keyInfo.getKeyInfo(_ram.getUint8(0xDC00));*/
    } else {
      return _ram.getUint8(address);
    }
  }
}

So, every time an address starts with DC we send the access to the CIA instance. You will also see that I have commented out the explicit access to the DC01 register, which we added in the previous post for keyboard access. We will implement this functionality in our CIA class at a later stage.

Now, let us start the emulator and watch the log output:

setMem dc0d 7f
setMem dc00 7f
setMem dc0e 8
setMem dc0f 8
setMem dc03 0
setMem dc02 ff
setMem dc04 95
setMem dc05 42
setMem dc0d 81
getMem dc0e
setMem dc0e 11
setMem dc04 25
setMem dc05 40
setMem dc0d 81
getMem dc0e
setMem dc0e 11

So, let us quickly see what is going on here. With the write to DC0D, we disable all interrupts going to the CPU.

The write to address DC00 is for the keyboard stuff, which we don't worry about at the moment.

Next we see the value 8 written to registers DC0E and DC0F. This puts timers A and B in one-shot mode.

Let's skip a couple of memory writes and get to the writes to locations DC04 and DC05. These are the registers for setting the duration of timer A. Each count is one cycle of the roughly 1MHz clock that also drives the 6510. DC04 is the low byte of the value and DC05 the high byte. So, this is 4295 hexadecimal, which translates to 17045 decimal, close to the hard-coded count we used previously for triggering an interrupt every 1/60 second.
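The arithmetic above can be checked with a few lines of Dart. This is only a sketch: the function name latchValue is made up for illustration, and the clock constant is an assumption (1022727 Hz is the commonly quoted NTSC C64 clock; a PAL machine runs slightly slower).

```dart
// Combine the low and high latch bytes into a 16-bit timer value.
int latchValue(int lo, int hi) => (lo & 0xff) | ((hi & 0xff) << 8);

void main() {
  // Values taken from the log: DC04 = 0x95, DC05 = 0x42.
  final cycles = latchValue(0x95, 0x42);
  print(cycles); // 17045

  // Assumed NTSC clock frequency in Hz (not taken from the log).
  const clockHz = 1022727;
  final periodMs = cycles * 1000 / clockHz;
  print(periodMs.toStringAsFixed(2)); // 16.67, about 1/60 of a second
}
```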

Next, we do an assignment to location DC0D. This is the interrupt control register. We see in the value assigned that the least significant bit is set. This is the bit that controls interrupts from timer A. We also see that the most significant bit is set. When this bit is a 1, it means: enable every interrupt whose bit is a one in the written value. So, in this case we have enabled interrupts from timer A.
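To make the set/clear behaviour of DC0D a bit more concrete, here is a minimal sketch of how a write could be applied to an interrupt mask. The function name applyIcrWrite is hypothetical and not part of our emulator; the two values used are the ones from the log.

```dart
// Apply a write to the interrupt control register (DC0D) to the current
// interrupt mask and return the new mask.
int applyIcrWrite(int mask, int value) {
  final bits = value & 0x1f; // the low five bits select interrupt sources
  if ((value & 0x80) != 0) {
    return mask | bits; // bit 7 set: enable the selected sources
  }
  return mask & ~bits; // bit 7 clear: disable the selected sources
}

void main() {
  var mask = 0x1f;
  mask = applyIcrWrite(mask, 0x7f); // first write in the log
  print(mask.toRadixString(16)); // 0
  mask = applyIcrWrite(mask, 0x81); // later write in the log
  print(mask.toRadixString(16)); // 1
}
```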

Finally, we see a value assigned to register DC0E. With this assignment, two things happen. Firstly, bit 4 is set, which means force load the timer from the latch, which in our case would be the hex value 4295. Secondly, bit 0 is set, which finally starts timer A.

Something else is also happening subtly. Previously I mentioned that we set bit 3 to 1, meaning the timer was in one-shot mode. Now, however, we are setting this bit to zero, which means that timer A will operate in continuous mode. This means that after the timer has lapsed it will automatically restart, so we will get periodic interrupts every 1/60 second.
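As a quick illustration of the three control bits discussed above, the following sketch decodes the two values we saw written to DC0E. The helper describeControlA is a made-up name, purely for illustration.

```dart
// Decode the three control-register bits discussed above.
String describeControlA(int value) {
  final started = (value & 0x01) != 0; // bit 0: timer running
  final oneShot = (value & 0x08) != 0; // bit 3: one-shot vs continuous
  final forceLoad = (value & 0x10) != 0; // bit 4: load timer from latch
  return 'started=$started oneShot=$oneShot forceLoad=$forceLoad';
}

void main() {
  print(describeControlA(0x08)); // started=false oneShot=true forceLoad=false
  print(describeControlA(0x11)); // started=true oneShot=false forceLoad=true
}
```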

Implementing the Alarm System

With the skeleton implemented for the CIA chip, we should start implementing some meat for it. We will start with timer A.

Now, timer A is very reliant on the number of cycles the CPU has executed. There are other operations that are also dependent on the number of CPU cycles executed, like tape loading, drawing pixels at the right moment on the screen and SID sound generation.

I have written a number of C64 emulators in other programming languages. I must admit that in all these emulators I would perform all the operations that depend on CPU cycles after every CPU instruction executed. In the beginning, when I had just added timers or tape interrupts, I didn't really see issues.

However, as I added more of these operations dependent on CPU cycles executed, I saw performance gradually worsening, especially when I added more of the VIC-II operations.

Now, what I experienced isn't really something new. There is actually a computer science term for a technique that tries to solve this issue, which is loop fission. The following Wikipedia article explains a bit more about loop fission:

https://en.wikipedia.org/wiki/Loop_fission_and_fusion

Basically, when you do a lot of things in a single loop iteration, one issue that pops up is that you get more cache misses, and your CPU needs to fetch data from slower RAM more often. By splitting the loop into several separate loops, cache misses should be reduced, which improves performance.

I have dug a bit into the source code of the VICE emulator, and overall they also try to break things into separate loops. They have the whole concept of alarms. For instance, every VIC-II scan line is 63 cycles. So, instead of rendering a bit of a line after every CPU instruction, they set an alarm that will trigger 63 cycles into the future. After every CPU instruction executed, a check is done whether the 63 cycles have passed. Only when the 63 cycles have passed is an alarm handler executed that renders the full line.

Of course, during the course of the 63 cycles something might change, like the border color, in which case the line will show more than one border color. In such cases, when writing to such a register, one should keep a record of when the color changed.
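To give an idea of what keeping such a record could look like, here is a rough sketch. Everything in it (the ColorChange class, the renderBorderLine function and the one-entry-per-cycle output) is an assumption made for illustration; it is not how Vice does it, and not something our emulator implements yet.

```dart
// Hypothetical record of one border color write within a raster line.
class ColorChange {
  final int cycle; // cycle within the line (0..62) at which the write happened
  final int color; // new border color
  const ColorChange(this.cycle, this.color);
}

// Build the border color for each cycle of one line from the color at the
// start of the line plus the recorded writes (assumed sorted by cycle).
List<int> renderBorderLine(int startColor, List<ColorChange> changes,
    {int cyclesPerLine = 63}) {
  final line = List<int>.filled(cyclesPerLine, startColor);
  var color = startColor;
  var next = 0;
  for (var cycle = 0; cycle < cyclesPerLine; cycle++) {
    // Apply every recorded write that happened at or before this cycle.
    while (next < changes.length && changes[next].cycle <= cycle) {
      color = changes[next].color;
      next++;
    }
    line[cycle] = color;
  }
  return line;
}

void main() {
  final line = renderBorderLine(14, const [ColorChange(30, 2)]);
  print(line[29]); // 14
  print(line[30]); // 2
}
```

The alarm handler for the line would then render the whole line in one go, using the recorded changes instead of sampling the register after every instruction.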

Let's start creating an alarm subsystem for our emulator. We start with a brief outline:

class Alarms {
  final LinkedList<Alarm> _alarmList = LinkedList<Alarm>();

  Alarms();

  Alarm addAlarm(Function(int remainder) callback) {
    var alarm = Alarm._(this, callback);
    _alarmList.add(alarm);
    return alarm;
  }

}

So, here we have a class containing all our alarms. Internally all the alarms are stored in a linked list, which is a built-in data structure in Dart. We will get to this in a while.

There is also a method for adding an alarm with a callback, so that when the alarm has expired we can call the callback to do some work. The remainder parameter indicates by how many cycles we have gone over the alarm threshold when a CPU instruction has executed.

Let's now focus a bit on the LinkedList story. So, we have the declaration LinkedList<Alarm>(). LinkedList is one of Dart's built-in classes (from dart:collection) and it is a generic, so you need to supply a type when you make an instance. In this case we are saying we will have a LinkedList containing instances of Alarm.

Now usually with generics, you can define the element type, Alarm in this case, in any way you want. However, with a LinkedList, things are a bit more tricky, because every node needs to point to the next and previous node. This is just how a LinkedList is implemented.

Luckily you don't need to worry about implementing all this yourself. You can just let our Alarm class extend LinkedListEntry, and then all this will happen automatically:

final class Alarm extends LinkedListEntry<Alarm> {
  late final Alarms _alarms;
  late final Function(int remainder) _callback;

  Alarm._(Alarms alarms, Function(int remainder) callback ) {
    _alarms = alarms;
    _callback = callback;
  }

}
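If you have not worked with LinkedListEntry before, the following small stand-alone sketch shows the behaviour we will rely on later: an entry knows its neighbours, it can unlink itself, and afterwards its list property is null until it is added again. The Job class is a hypothetical stand-in for our Alarm class.

```dart
import 'dart:collection';

// A hypothetical entry type, standing in for Alarm.
final class Job extends LinkedListEntry<Job> {
  final String name;
  Job(this.name);
}

void main() {
  final jobs = LinkedList<Job>();
  final a = Job('a');
  final b = Job('b');
  jobs.add(a);
  jobs.add(b);
  print(a.next?.name); // b

  a.unlink(); // remove the entry without searching the list
  print(a.list == null); // true
  print(jobs.length); // 1

  jobs.add(a); // an unlinked entry can be added again
  print(jobs.map((j) => j.name).join(',')); // b,a
}
```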

Let us now add some more meat to our alarm class:

final class Alarm extends LinkedListEntry<Alarm> {
  var _targetClock = 0;
...
  setTicks(int ticks) {
    _targetClock = _alarms.getCurrentCpuCount() + ticks;
  }

  getRemainingTicks() {
    return _targetClock - _alarms.getCurrentCpuCount();
  }

  getTargetClock() {
    return _targetClock;
  }

  processAlarm(int remainder) {
    _callback(remainder);
  }
}

Basically I have added some methods for keeping track of how far we are from triggering an alarm. The processAlarm method will be invoked when the alarm is triggered.

Now, let us add some meat to our Alarms class:

class Alarms {
  final LinkedList<Alarm> _alarmList = LinkedList<Alarm>();
  int _cpuCount = 0;

  Alarms();

  Alarm addAlarm(Function(int remainder) callback) {
    var alarm = Alarm._(this, callback);
    _alarmList.add(alarm);
    return alarm;
  }

  reAddAlarm(Alarm alarm) {
    _alarmList.add(alarm);
  }

  int getCurrentCpuCount() {
    return _cpuCount;
  }

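  processAlarms(int cpuCycles) {
    _cpuCount = cpuCycles;
    // Walk the list by hand and fetch the next entry before invoking the
    // callback. A callback may unlink its own alarm (a one-shot timer does),
    // and changing a LinkedList inside a for-in loop over it throws.
    Alarm? item = _alarmList.isEmpty ? null : _alarmList.first;
    while (item != null) {
      final next = item.next;
      // The remaining ticks are zero or negative once an alarm has expired.
      if (item.getRemainingTicks() <= 0) {
        item.processAlarm(item.getRemainingTicks());
      }
      item = next;
    }
  }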
}

The key method added here is processAlarms(). This method loops through the alarms, checking which have expired and then calling their callbacks.

Another interesting method is reAddAlarm(). It will often happen that we stop a timer, at which point we remove it from the alarm list so that it isn't triggered again. However, there might be a case where we want to start the timer again, in which case we use reAddAlarm() to add it back to the list, so that it is evaluated for expiry again.
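To see the alarm idea in action outside the emulator, here is a condensed, stand-alone sketch. The two classes are cut-down stand-ins for the Alarms and Alarm classes above (with a public cpuCount field and a snapshot-based loop chosen for brevity), and the 4 cycles per instruction in main() is just an assumed number for the demonstration.

```dart
import 'dart:collection';

// Condensed stand-ins for Alarm and Alarms, so the sketch runs on its own.
final class Alarm extends LinkedListEntry<Alarm> {
  final Alarms _alarms;
  final void Function(int remainder) _callback;
  int _targetClock = 0;
  Alarm(this._alarms, this._callback);

  void setTicks(int ticks) => _targetClock = _alarms.cpuCount + ticks;
  int getRemainingTicks() => _targetClock - _alarms.cpuCount;
}

class Alarms {
  final LinkedList<Alarm> _list = LinkedList<Alarm>();
  int cpuCount = 0;

  Alarm addAlarm(void Function(int remainder) callback) {
    final alarm = Alarm(this, callback);
    _list.add(alarm);
    return alarm;
  }

  void processAlarms(int cpuCycles) {
    cpuCount = cpuCycles;
    // Iterate over a snapshot so a callback may unlink its own alarm.
    // Simple, though it allocates a list on every call.
    for (final alarm in _list.toList()) {
      final remaining = alarm.getRemainingTicks();
      if (remaining <= 0) alarm._callback(remaining);
    }
  }
}

void main() {
  final alarms = Alarms();
  final fired = <String>[];
  late final Alarm alarm;
  alarm = alarms.addAlarm((remainder) {
    fired.add('clock=${alarms.cpuCount} remainder=$remainder');
    alarm.unlink(); // behave like a one-shot timer
  });
  alarm.setTicks(10);

  // Pretend every instruction takes 4 cycles.
  for (var clock = 4; clock <= 20; clock += 4) {
    alarms.processAlarms(clock);
  }
  print(fired); // [clock=12 remainder=-2]
}
```

Note that the alarm was due at clock 10, but only fires at clock 12 when the instruction in progress has completed, so the callback receives a remainder of -2.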

Wiring everything together

With all the building blocks created in the previous section, let's now put them together. In C64Bloc let us do some initialisation:

class C64Bloc extends Bloc<C64Event, C64State> implements KeyInfo {
  final Memory memory = Memory();
  final List<int> matrix = [0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff];
  final FocusNode focusNode = FocusNode();
  late final Cpu _cpu = Cpu(memory: memory);
  late final Alarms alarms = Alarms();
  type_data.ByteData image = type_data.ByteData(200*200*4);
  int dumpNo = 0;
  int frameNo = 0;
  Timer? timer;
...
  C64Bloc() : super(InitialState()) {
    on<InitEmulatorEvent>((event, emit) async {
      final basicData = await rootBundle.load("assets/basic.bin");
      final characterData = await rootBundle.load("assets/characters.bin");
      final kernalData = await rootBundle.load("assets/kernal.bin");
      Cia1 cia1 = Cia1(alarms: alarms);
      cia1.setKeyInfo(this);
      memory.setCia1(cia1);
      memory.populateMem(basicData, characterData, kernalData);
      _cpu.setInterruptCallback(() => cia1.hasInterrupts());
...
  }
...
}

I have added a field for our alarms. I am also now injecting a Cia1 instance into our memory.

In our CPU class we also now use an interrupt callback, which the CPU will call to see if any interrupts have occurred. Our Cia1 instance will provide this info.

In our main event processing loop, we also make a small change:

    on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(milliseconds: 17), (timer) {
          int start = DateTime.now().millisecondsSinceEpoch;
          int targetCycles = _cpu.getCycles() + 16666;
          do {
            _cpu.step();
            alarms.processAlarms(_cpu.getCycles());
          } while (_cpu.getCycles() < targetCycles);
...
});
    });

After every CPU instruction we process the alarms with the current CPU cycle count.

Expanding the CIA1 class

Earlier we created a skeleton for the CIA1 class. We will now expand this class further.

As usual we start with some initialisation:

class Cia1 {
  int timerAlatchLow = 0xff;
  int timerAlatchHigh = 0xff;
  int timerAvalue = 0xffff;
  Alarms alarms;
  Alarm? timerAalarm;
  bool timerAstarted = false;
  bool timerAoneshot = false;
  int registerE = 0;
  int register0 = 0;
  bool timerAinterruptEnabled = false;
  bool timerAintOccurred = false;
  late final KeyInfo keyInfo;


  Cia1({required this.alarms});

  setKeyInfo(KeyInfo keyInfo) {
    this.keyInfo = keyInfo;
  }
 ...
}

The meaning of these fields will become clear in a bit.

Next, let us implement the following method:

  updateTimerA() {
    if (!timerAstarted) {
       return;
    }
    if (timerAalarm != null) {
      timerAvalue = timerAalarm!.getRemainingTicks();
    }

  }

timerAvalue is the value of countdown timer A in the CIA. To increase locality, we don't update this value with the execution of every CPU instruction. Instead, we wrote this method, which updates the value when the CPU reads the value of this register.
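The idea of deriving the counter on demand can be sketched as follows. The function timerValue and the clock values used are made up for the example; in our class the same subtraction is done by getRemainingTicks() on the alarm.

```dart
// Derive the counter on demand from the clock at which the timer expires,
// instead of decrementing it after every CPU instruction.
int timerValue(int targetClock, int currentClock) => targetClock - currentClock;

void main() {
  // Timer started at clock 0 with 17045 ticks; the CPU reads it at clock 1000.
  final value = timerValue(17045, 1000);
  print(value); // 16045
  print((value & 0xff).toRadixString(16)); // ad, what a read of DC04 returns
  print((value >> 8).toRadixString(16)); // 3e, what a read of DC05 returns
}
```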

Next we add these methods:

  hasInterrupts() {
    if (timerAintOccurred && timerAinterruptEnabled) {
      return true;
    } else {
      return false;
    }
  }

  processTimerAalarm(int remaining) {
    // Do interrupt
    timerAintOccurred = true;
    if (timerAoneshot) {
      timerAalarm?.unlink();
      timerAstarted = false;
      return;
    }
    timerAalarm!.setTicks((timerAlatchLow | (timerAlatchHigh << 8)) + remaining);
  }

Here we deal with the timer expiring, at which point we flag an interrupt. We remove the timer from the alarm list if it is in one-shot mode. Otherwise we schedule the timer to run again.
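It is worth looking at why the remaining value is added when rescheduling. Here is a small worked example, with a hypothetical helper nextTarget that mirrors what setTicks() ends up computing. The clock values are invented for the illustration.

```dart
// Next target clock for a continuous timer. The remainder is zero or
// negative: it is the remaining tick count when the alarm was processed.
int nextTarget(int now, int latch, int remainder) => now + latch + remainder;

void main() {
  const latch = 17045;
  // The alarm was due at clock 17045, but was only processed at clock 17048.
  final next = nextTarget(17048, latch, 17045 - 17048);
  print(next); // 34090, exactly two latch periods
}
```

So, by taking the overshoot into account, the periodic interrupt does not slowly drift later with every expiry.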

Finally, let us add methods for reading and writing to the CIA registers:

  setMem(int address, int value) {
    print("setMem ${address.toRadixString(16)} ${value.toRadixString(16)}");
    value = value & 0xff;
    address = address & 0xf;
    switch (address) {
      case 0x0:
        register0 = value;
      case 0x4:
        timerAlatchLow = value;
      case 0x5:
        timerAlatchHigh = value;
      case 0xD:
        if ((value & 0x80) != 0) {
          timerAinterruptEnabled = ((value & 1) == 1) ? true : timerAinterruptEnabled;
        } else {
          timerAinterruptEnabled = ((value & 1) == 1) ? false : timerAinterruptEnabled;
        }
      case 0xE:
        var startTimerA = ((value & 1) == 1) ? true : false;
        var forceTimerA = ((value & 16) != 0) ? true : false;
        updateTimerA();
        if (forceTimerA) {
          timerAvalue = timerAlatchLow | (timerAlatchHigh << 8);
        }
        var startingTimerA = startTimerA & !timerAstarted;
        var stoppingTimerA = !startTimerA & timerAstarted;
        var alreadyRunningTimerA = startTimerA && timerAstarted;
        if (startingTimerA || (alreadyRunningTimerA && forceTimerA)) {
          // schedule timer on alarm
          timerAalarm ??= alarms.addAlarm( (remaining) => processTimerAalarm(remaining));
          if (timerAalarm!.list == null) {
            alarms.reAddAlarm(timerAalarm!);
          }
          timerAalarm!.setTicks(timerAvalue);
          // set timer as started
        } else if (stoppingTimerA) {
          //unschedule timer A
          timerAalarm!.unlink();
        }
        timerAoneshot = (value & 8) != 0;
        timerAstarted = startTimerA;
        registerE = value;
      default:
        // throw "Not implemented";
    }

  }

  int getMem(int address) {
    print("getMem ${address.toRadixString(16)}");
    updateTimerA();
    address = address & 0xf;
    switch (address) {
      case 0x0:
        return register0;
      case 0x1:
        return keyInfo.getKeyInfo(register0);
      case 0x4:
        return timerAvalue & 0xff;
      case 0x5:
        return timerAvalue >> 8;
      case 0xD:
        if (timerAintOccurred) {
          timerAintOccurred = false;
          return 0x81;
        } else {
          return 0;
        }
      case 0xE:
        var result = registerE & 0x06;
        result = result | (timerAstarted ? 1 : 0);
        result = result | (timerAoneshot ? 8 : 0);
        return result;
    }
    return 255;
  }

You will see that each time we read from the CIA1 we update the timer. In the write function we also adjust the alarms accordingly if we change the state of the timers.

Changes to the CPU class

Finally, there is just a small change we need to make to our CPU. Previously we hardwired an interrupt in our CPU that happened every 1/60th of a second. However, now that we have implemented a CIA class, we need to change how interrupts work.

Here are the changes:

class Cpu {
...
  late final Function() _interruptCallback;
...
  setInterruptCallback(Function() callback) {
    _interruptCallback = callback;
  }
...
  step() {
    if (_interruptCallback() & (_i == 0)) {
      push(pc >> 8);
      push(pc & 0xff);
      push((_n << 7) | (_v << 6) | (2 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
      _i = 1;
      pc = memory.getMem(0xfffe) | (memory.getMem(0xffff) << 8);
    }
...
  }
...
}

Now, we call the interrupt callback, which basically ties back to the CIA1 class we created. Also, we only invoke an interrupt when the Interrupt disable flag is not set.
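For reference, this is what ends up on the stack when the interrupt is taken. The sketch below uses a hypothetical packStatus helper that packs the flags in the same way as the push in step(); the program counter value is arbitrary and only there for the example.

```dart
// Pack the flags the same way as the push in step(): bit 5 is always set
// and the break flag (bit 4) is left clear.
int packStatus(
    {required int n,
    required int v,
    required int d,
    required int i,
    required int z,
    required int c}) {
  return (n << 7) | (v << 6) | (2 << 4) | (d << 3) | (i << 2) | (z << 1) | c;
}

void main() {
  final status = packStatus(n: 0, v: 0, d: 0, i: 0, z: 1, c: 1);
  print(status.toRadixString(16)); // 23

  // The return address is pushed high byte first, then low byte.
  const pc = 0xe5cd; // arbitrary example address
  print((pc >> 8).toRadixString(16)); // e5
  print((pc & 0xff).toRadixString(16)); // cd
}
```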

In Summary

In this post we introduced the CIA as a separate class. We also removed the hard-coded mechanism which triggered an interrupt every 1/60th of a second, and rather let the CIA schedule the interrupts as programmed by the machine language code.

In the next post we will start to implement tape loading from a raw tape image.

Until next time!

C64 on an FPGA

Wednesday, 3 December 2025

A Commodore 64 Emulator in Flutter: Part 14