Thursday, 22 May 2025

A Commodore 64 Emulator in Flutter: Part 13

Foreword

In the previous post we managed to boot the C64 system with a screen showing the contents of screen memory in real time. It booted with the welcome message and a flashing cursor.

In this post we will add keyboard interfacing to our C64 emulator. We will approach this in a very experimental fashion, first exploring how Flutter itself works with the keyboard in an app. Then we will see if we can get keyboard interfacing to work in our app, and finally see if our emulator can work with the keyboard.

Enjoy!

KeyboardListener in Flutter

What we want for our emulator is basically to tell when a key is held down, and when it is released. Flutter provides this for us via a KeyboardListener. The Flutter documentation is not very clear on how to use it, so I looked around for a worked example on the Internet and found the following:

https://medium.com/@wartelski/how-to-flutter-keyboard-events-keyboard-listener-in-flutter-web-0c36ab9654a9

The following snippet is the core of the example:

With this example we can catch the moment a key goes down. All is well in this example, except that it holds _focusNode in a final variable. That only works well in a StatefulWidget, where the State object keeps the node alive across rebuilds. In our case, however, we are within a StatelessWidget, where we have no such place to keep it.

In our case we will place the focusNode in our Bloc. This is probably not the best place if one thinks about separation of concerns, but for now it is the best place if we want to keep a single instance of FocusNode alive. So, we make the following changes:

class C64Bloc extends Bloc<C64Event, C64State> {
  final Memory memory = Memory();
  final FocusNode focusNode = FocusNode();
...
}

And now we go further and wrap our RawImage in a KeyboardListener:

...
           } else if (state is RunningState) {
              return KeyboardListener(
                focusNode: context.read<C64Bloc>().focusNode,
                autofocus: true,
                onKeyEvent: (event) {
                  if (event is KeyDownEvent) {
                    if (event.logicalKey == LogicalKeyboardKey.keyM) {
                      print("The m key is pressed!!");
                    }
                  } else if (event is KeyUpEvent) {
                    if (event.logicalKey == LogicalKeyboardKey.keyM) {
                      print("The m key is released!!");
                    }
                  }
                },
                child: RawImage(
                    image: state.image, scale: 0.5),
              );
            } else {
...

So, here we listen for the "M" key and write out to the console when this key is pressed and released.

Simulating a key press in our emulator

Next, let us see if we can simulate a key press in our emulator. To figure out how, let us dig a bit into how the keyboard is implemented in hardware.

Firstly, a keyboard is arranged as a matrix of rows and columns, and where a row and column meet, there is a key switch. If the switch is pushed while its column is energised, it will short the row to ground. Seeing whether a switch is pressed is a two-step process: you need to energise each column in turn and see which rows are shorted to ground.

To get an idea of how the matrix of a C64 is arranged, the following diagram is helpful:

Now, the big question is which memory locations we need to manipulate and read to see which key was pressed.

The following web link provides us with a memory map which will aid in finding these memory locations:

https://sta.c64.org/cbm64mem.html

Scrolling down, we eventually find the section that deals with the keyboard:

As you can see, both these ports are used by the joystick ports and the keyboard. The first piece of info that is useful for us is the following at memory location DC00:

Bit #x: 0 = Select keyboard matrix column #x.

So, this is actually where we energise one or more columns. In the matrix diagram, this is actually the parts labeled A - H. Each of these are assigned a bit number (0 - 7) in the byte we write to this port.

The next piece of useful info is at memory location DC01:

Bit #x: 0 = A key is currently being pressed in keyboard matrix row #x, in the column selected at memory address $DC00.

So, we select one or more columns in location DC00, and within the selected columns we can read via location DC01 which rows have a key pressed.
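To make the two-step process a bit more concrete, here is a minimal sketch of how a program could scan the matrix. The names matrix, readRows and scanMatrix are made up for this illustration and are not part of our emulator; readRows simply stands in for "write a value to DC00, then read DC01".

```dart
// Hypothetical stand-in for the keyboard hardware: matrix[c] holds the row
// bits of column c, where a 0 bit means the key at that row is pressed.
final List<int> matrix = List<int>.filled(8, 0xff);

// Emulates a read of $DC01 after `columnSelect` was written to $DC00.
int readRows(int columnSelect) {
  var rows = 0xff; // all ones: nothing pressed
  for (var c = 0; c < 8; c++) {
    // A 0 bit in the select byte means column c is selected.
    if ((columnSelect & (1 << c)) == 0) rows &= matrix[c];
  }
  return rows;
}

// Selects one column at a time and returns every pressed [column, row] pair.
List<List<int>> scanMatrix() {
  final pressed = <List<int>>[];
  for (var c = 0; c < 8; c++) {
    final rows = readRows(0xff & ~(1 << c)); // only bit c is 0
    for (var r = 0; r < 8; r++) {
      if ((rows & (1 << r)) == 0) pressed.add([c, r]);
    }
  }
  return pressed;
}

void main() {
  matrix[4] = 0xef; // bit 4 cleared: key at column 4, row 4 held down
  print(scanMatrix()); // [[4, 4]]
}
```

Selecting all columns at once, with readRows(0x00), is a quick way to ask whether any key at all is down before doing the column-by-column scan.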

Let us now see how we can emulate a keypress in our emulator. At this point we are able to catch keys from the keyboard with a KeyboardListener. In our KeyboardListener we can trigger events which we then listen for in our Bloc.

First let us define an event class which we will trigger:

class KeyC64Event extends C64Event {
  final bool keyDown;
  KeyC64Event({required this.keyDown});
}

So, we will either trigger an event with keyDown = true, when a key is pressed, or an event with keyDown = false, when a key is released.

With this in mind, let us modify our KeyboardListener:

            } else if (state is RunningState) {
              return KeyboardListener(
                focusNode: context.read<C64Bloc>().focusNode,
                autofocus: true,
                onKeyEvent: (event) {
                  if (event is KeyDownEvent) {
                    if (event.logicalKey == LogicalKeyboardKey.keyM) {
                      context.read<C64Bloc>().add(KeyC64Event(keyDown: true));
                    }
                  } else if (event is KeyUpEvent) {
                    if (event.logicalKey == LogicalKeyboardKey.keyM) {
                      context.read<C64Bloc>().add(KeyC64Event(keyDown: false));
                    }
                  }
                },
                child: RawImage(
                    image: state.image, scale: 0.5),
              );
            } else {

Next, let us listen for these events in our Bloc:

class C64Bloc extends Bloc<C64Event, C64State> {
...
  bool keyDown = false;
...
  C64Bloc() : super(InitialState()) {
...
    on<KeyC64Event>((event, emit) {
      keyDown = event.keyDown;
    });
...
  }
...
}

So, within our Bloc, keyDown is a variable keeping track of whether the key is up or down, which in this case is the state of the M key on our keyboard. We will make use of this variable to simulate a key stroke in our emulator.

Now, the actual simulation of a key press should happen in our Memory class. When a read is done from address DC01, we should consider which columns are enabled via address DC00, check whether any key in the enabled columns is held down, and send back a value that reflects this.

So, we have a situation here where Memory wants some info from the Bloc class in which it lives, but we don't want to expose the whole state of the Bloc class to Memory. To achieve this we create an interface with a method returning just the info that Memory needs.

Here is the interface:

abstract class KeyInfo {
  int getKeyInfo(int column);
}

And now let us implement the interface in our Bloc:

class C64Bloc extends Bloc<C64Event, C64State> implements KeyInfo {
...
  @override
  int getKeyInfo(int column) {
    return 0xff; // placeholder until we fill this in: no key pressed
  }
...
}

So, given the byte of energised columns, we return the rows. Now, as an exercise, let's say that if we press the M key on the keyboard, which we currently check for in our KeyboardListener, we want our C64 emulator to also show an M.

So, let us look at the keyboard matrix diagram again to see where the M key is located. The M key is located at column E and row 4. With the bit counting starting at zero for column A, the bit number of column E is 4. So we are interested in column bit 4 and row bit 4.

With this in mind, let us give getKeyInfo() some meat:

  @override
  int getKeyInfo(int column ) {
    if (!keyDown) {
      return 0xff;
    }
    if ((column & 0x10) == 0) {
      return 0xef;
    } else {
      return 0xff;
    }
  }

One thing to remember here is that when working with the keyboard matrix, we don't work with the default assumption that one means active, but the other way around. So a zero in the column byte means that a certain column is energised, and a zero in the row byte means that the switch for that bit position is held down.
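The two magic numbers in getKeyInfo(), 0x10 and 0xef, both come from bit 4. A small sketch of the arithmetic follows; bitMask and activeLow are helper names invented for this example only.

```dart
// Bit mask for a single bit position, e.g. bit 4 -> 0x10.
int bitMask(int bit) => 1 << bit;

// Active-low byte with only `bit` cleared, e.g. bit 4 -> 0xef.
int activeLow(int bit) => 0xff & ~bitMask(bit);

void main() {
  print(bitMask(4).toRadixString(16)); // 10
  print(activeLow(4).toRadixString(16)); // ef
}
```

So the test (column & 0x10) == 0 asks whether column bit 4 is selected, and returning 0xef reports row bit 4 as pressed.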

With all this written, let us make our Memory class make use of it:

class Memory {
...
  late final KeyInfo keyInfo;
...
  setKeyInfo(KeyInfo keyInfo) {
    this.keyInfo = keyInfo;
  }
...
}

So, we can pass our keyInfo object to our Memory class. We assign the keyInfo when our Bloc class is instantiated:

class C64Bloc extends Bloc<C64Event, C64State> implements KeyInfo {
...
  C64Bloc() : super(InitialState()) {
    memory.setKeyInfo(this);
...
  }
...
}

Finally, let us use keyInfo in our Memory class:

...

  int getMem(int address) {
    _readCount++;
    if (address >= 0xA000 && address <= 0xBFFF) {
      return _basic.getUint8(address & 0x1fff);
    } else if (address >= 0xE000 && address <= 0xFFFF) {
      return _kernal.getUint8(address & 0x1fff);
    } else if (address == 0xD012) {
      return (_readCount & 1024) == 0 ? 1 : 0;
    } else if (address == 0xDC01) {
      return keyInfo.getKeyInfo(_ram.getUint8(0xDC00));
    } else {
      return _ram.getUint8(address);
    }
  }
...

So, when address DC01 is read from our Memory, we invoke getKeyInfo and pass it the contents of memory location DC00. For the moment we fetch location DC00 from RAM.

Now, when we build and run, and press the M key a couple of times, the screen looks as follows:

We managed to implement a simple key press!

Implementing the full keyboard

Let us now look at implementing a full keyboard, or at least sufficient keys, like the alphabet, digits and some symbols, just to type a simple basic program within our emulator.

Up to now we only kept track of whether a single key is down, via keyDown, but now we need to keep track of whether several keys are held down. So, we need a kind of boolean matrix, or to put it more plainly, an array of eight bytes. Each column is a byte:

  final List<int> matrix = [0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff];

I mentioned earlier that in a real C64, a zero means the key is selected. So, this array filled with 0xff values means no key is held down at the moment.

Previously in our Main class, we just looked for the M key being pressed and released, and then passed this event to our Bloc class. Obviously we now need to remove this explicit check for the M key and pass all key events to our Bloc class. This requires us to modify our KeyC64Event class to say which key was pressed, not just whether a key was pressed:

class KeyC64Event extends C64Event {  
  final bool keyDown;
  final LogicalKeyboardKey key;
  KeyC64Event({required this.keyDown, required this.key});
}

With this in place our Bloc class will indeed receive a key code, but one that only makes sense in the Flutter world. We need a lookup table or map to convert a Flutter key code to a C64 keyboard scan code. For this purpose we create the following map, preferably in a separate file:

Map<LogicalKeyboardKey, int> keyMap = Map.unmodifiable({
  LogicalKeyboardKey.keyA : 0x0A,
  LogicalKeyboardKey.keyB : 0x1C,
  LogicalKeyboardKey.keyC : 0x14,
  LogicalKeyboardKey.keyD : 0x12,
...
  LogicalKeyboardKey.digit0 : 0x23,
  LogicalKeyboardKey.digit1 : 0x38,
  LogicalKeyboardKey.digit2 : 0x3B,
  LogicalKeyboardKey.digit3 : 0x08,
  LogicalKeyboardKey.digit4 : 0x0B,
  LogicalKeyboardKey.digit5 : 0x10,
  LogicalKeyboardKey.digit6 : 0x13,
  LogicalKeyboardKey.digit7 : 0x18,
  LogicalKeyboardKey.digit8 : 0x1B,
  LogicalKeyboardKey.digit9 : 0x20,
...
  LogicalKeyboardKey.space : 0x3c,
  LogicalKeyboardKey.shiftLeft : 0x0F,
  LogicalKeyboardKey.enter : 0x01,
...
});

With this map created, we can now modify our listener a bit for the event KeyC64Event:

    on<KeyC64Event>((event, emit) {
      final c64KeyCode = keyMap[event.key];
      if (c64KeyCode == null) return; // ignore keys with no C64 mapping
      int col = c64KeyCode >> 3;
      int row = 1 << (c64KeyCode & 7);
      if (!event.keyDown) {
        matrix[col] |= row;
      } else {
        matrix[col] &= ~row;
      }
    });

We start off by looking up the C64 scan code, given the Flutter key code. Bits 5-3 of the scan code are the column and bits 2-0 are the row.

In the if statement, if the key is released, we OR the bit position in, setting it back to one. If it is pressed, we mask off the bit position, clearing it to zero.
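As a worked example of this arithmetic, take the scan code 0x0A that our map gives for the A key. The sketch below uses three helper names (columnOf, rowMaskOf and applyKey) that are invented for this illustration; the handler above does the same thing inline.

```dart
// Splits a C64 scan code into its column index (bits 5-3) and a row
// bit mask (from bits 2-0), the same arithmetic as in the handler above.
int columnOf(int scanCode) => scanCode >> 3;
int rowMaskOf(int scanCode) => 1 << (scanCode & 7);

// Returns the new column byte after a key transition (0 bit = pressed).
int applyKey(int columnByte, int rowMask, {required bool keyDown}) =>
    keyDown ? columnByte & ~rowMask & 0xff : columnByte | rowMask;

void main() {
  const scanCode = 0x0A; // the value keyMap gives for the A key
  print(columnOf(scanCode)); // 1
  print(rowMaskOf(scanCode)); // 4
  final down = applyKey(0xff, rowMaskOf(scanCode), keyDown: true);
  print(down.toRadixString(16)); // fb
  final up = applyKey(down, rowMaskOf(scanCode), keyDown: false);
  print(up.toRadixString(16)); // ff
}
```

So pressing A clears bit 2 in the byte for column 1, and releasing it puts the byte back to 0xff.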

Now we need to modify the method getKeyInfo, which is the method our Memory class calls when reading address DC01. When calling this method, we tell it which columns need to be considered. Potentially two or more columns can be selected, in which case we need to reduce the selected columns to a single byte. Since a zero means pressed, this reduction is a bitwise AND.

We can express this reducing in a simple for loop:

  @override
  int getKeyInfo(int column ) {
    int result = 0xff; // Accumulator: AND of the selected columns (0 = pressed)

    for (var row in matrix) {
      if ((column & 1) == 0) {
        result &= row; 
      }

      column = column >> 1;
    }

    return result;
  }

We shift the column byte right on every iteration, checking each time whether the lowest bit is zero. If it is zero, we know the column is selected. We AND all the selected columns together. If, for any row position in a selected column, there is a zero, then the final value for that bit position will be zero. A zero means one or more keys are pressed in that bit position for the selected columns.

Now, let us see if we can write a simple program, with the keyboard input enabled:

Next, let us run the program:

We have a working program!

In Summary

In this post we implemented keyboard input and wrote a small test program.

In the next post we will start implementing tape loading, from a tape image.

Until next time!

Monday, 28 April 2025

A Commodore 64 Emulator in Flutter: Part 12

Foreword

In the previous post we successfully ran the Klaus Dormann Test Suite.

In this post we will be trying to boot the C64 system with its ROM's.

Enjoy!

Inserting the ROMS

Inserting the ROM's... Now that sounds like plugging and unplugging game cartridges 😂. In our case, this means loading the C64 ROM images from files into memory, and making sure our emulated CPU can access the contents.

We start by dumping the ROM images into the asset folder:

The C64 ROMs you can download on the internet usually have version numbers in their file names. In my case, I just gave them simple names. Also, notice that I have removed the file program.bin we used in previous posts.

  C64Bloc() : super(InitialState()) {
    on<InitEmulatorEvent>((event, emit) async {
      final basicData = await rootBundle.load("assets/basic.bin");
      final characterData = await rootBundle.load("assets/characters.bin");
      final kernalData = await rootBundle.load("assets/kernal.bin");
      memory.populateMem(basicData, characterData, kernalData);
...

So, we load the different ROM's, waiting for the loading of each file to complete, and then going to the next file for loading.

Now, you might notice that, compared to previous posts, we now pass more ROMs to memory.populateMem. So let us delve a bit deeper into our Memory class to see what changes are required:

...
  late type_data.ByteData _basic;
  late type_data.ByteData _character;
  late type_data.ByteData _kernal;
...
  final type_data.ByteData _ram = type_data.ByteData(64*1024);
...
  populateMem(type_data.ByteData basicData, type_data.ByteData characterData,
      type_data.ByteData kernalData) {
    _basic = basicData;
    _character = characterData;
    _kernal = kernalData;
  }
...

Fairly straightforward. Each ROM that is passed through, we store in a variable.

Something else we do is to define a 64KB array that will act as our RAM, the defining characteristic of the C64.

So, next, let us add some address mapping:

  void setMem(int value, int address) {
    // Bytes are unsigned (0-255), so store them with setUint8.
    _ram.setUint8(address, value);
  }

  int getMem(int address) {
    if (address >= 0xA000 && address <= 0xBFFF) {
      return _basic.getUint8(address & 0x1fff);
    } else if (address >= 0xE000 && address <= 0xFFFF) {
      return _kernal.getUint8(address & 0x1fff);
    } else {
      return _ram.getUint8(address);
    }
  }

For memory writes, we write straight to the RAM array. For reads, we follow the usual C64 setup:

Addresses A000-BFFF: We read from BASIC ROM
Addresses E000-FFFF: We read from Kernal ROM
All other addresses we read from RAM
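The & 0x1fff in getMem may look a bit cryptic, so here is a small sketch of what it does. The function names romOffset and regionFor are made up for this example; they just mirror the mapping listed above.

```dart
// Offset inside an 8 KB ROM image for a CPU address. The mask 0x1fff keeps
// the low 13 bits, which works because both ROMs start on an 8 KB boundary.
int romOffset(int address) => address & 0x1fff;

// Names the region a read would hit under the mapping described above.
String regionFor(int address) {
  if (address >= 0xA000 && address <= 0xBFFF) return 'basic';
  if (address >= 0xE000 && address <= 0xFFFF) return 'kernal';
  return 'ram';
}

void main() {
  print(regionFor(0xA000)); // basic
  print(romOffset(0xA000)); // 0
  print(regionFor(0xFFFC)); // kernal
  print(romOffset(0xFFFC).toRadixString(16)); // 1ffc
  print(regionFor(0x0400)); // ram
}
```

In other words, the first byte of each ROM file lines up with the first address of its window, and the last bytes of the Kernal file hold the vectors at the top of memory.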

Booting the C64 System

We are now close to booting the C64 system with all its ROM's.

First things first. Our periodic timer currently runs once every second, executing 1 000 000 cycles worth of CPU instructions. However, we want to reduce the interval to a 60th of a second, so that later on we can draw a frame every time our timer executes, yielding 60 frames a second, which is the frame rate of a native C64:

    on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(milliseconds: 17), (timer) {
          int targetCycles = _cpu.getCycles() + 16666;
          do {
            _cpu.step();
          } while (_cpu.getCycles() < targetCycles);
      });
    });

Each time the timer fires we execute 16666 cycles, which is the number of CPU cycles in 1/60th of a second.

Booting the C64 ROM's is actually fairly straightforward. You basically set the program counter to the value of the reset vector. For this we just create the following method:

  reset() {
    pc = memory.getMem(0xfffc) | (memory.getMem(0xfffd) << 8);
  }

So, here we populate the program counter with the reset vector at addresses FFFC and FFFD.
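The 6502 stores vectors low byte first, which is why the byte at FFFD gets shifted up by eight bits. Here is a minimal standalone sketch of that read. The helper readVector and the two byte values are only for illustration, and it works on a plain ByteData rather than on our Memory class.

```dart
import 'dart:typed_data';

// Reads a 16-bit little-endian vector: low byte first, then high byte.
int readVector(ByteData mem, int address) =>
    mem.getUint8(address) | (mem.getUint8(address + 1) << 8);

void main() {
  final mem = ByteData(64 * 1024);
  mem.setUint8(0xfffc, 0xe2); // low byte (example value)
  mem.setUint8(0xfffd, 0xfc); // high byte (example value)
  print(readVector(mem, 0xfffc).toRadixString(16)); // fce2
}
```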

We still need to call this method. We do this just after we have loaded all the ROM's:

  C64Bloc() : super(InitialState()) {
    on<InitEmulatorEvent>((event, emit) async {
      final basicData = await rootBundle.load("assets/basic.bin");
      final characterData = await rootBundle.load("assets/characters.bin");
      final kernalData = await rootBundle.load("assets/kernal.bin");
      memory.populateMem(basicData, characterData, kernalData);
      _cpu.reset();
...

Now, we can finally boot the C64 System. We wait for a minute, and then hit stop to view the registers:

We see the program stabilise at address FF61. Let us have a look at the Kernal disassembly listing what is going on at this address:

As seen here, we get stuck in a loop with the memory address D012 not changing. We can expect that such a thing can happen at the moment, because with our current emulator setup that address will write and read to raw RAM, and thus nothing will happen.

In reality D012 maps to the VIC-II display registers and provides info on which raster line of the screen we are currently at. It is quite an undertaking to implement such a raster counter, so for now, let us see if we can quickly hack something together so that address D012 changes every now and then, just to get past that loop. Here is my quick hack:

  int getMem(int address) {
    _readCount++;
    if (address >= 0xA000 && address <= 0xBFFF) {
      return _basic.getUint8(address & 0x1fff);
    } else if (address >= 0xE000 && address <= 0xFFFF) {
      return _kernal.getUint8(address & 0x1fff);
    } else if (address == 0xD012) {
      return (_readCount & 1024) == 0 ? 1 : 0;
    } else {
      return _ram.getUint8(address);
    }
  }

So, the hack is simply a counter that keeps count of the number of reads, and we look at bit 10 of the counter (the mask 1024). If the bit is clear, we return a 1, otherwise a zero. In effect we will have a 1 for about a thousand reads, and then a zero for another thousand reads.

Let us now see where our program counter lands. This time it lands at E5D4. Let's look again at the disassembly listing for this address:
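To see the toggling in isolation, here is the same expression pulled out into a function. The name fakeRaster is made up for this sketch. Note that the mask 1024 is 1 << 10, so the value flips every 1024 reads.

```dart
// Fake raster value as in the hack above: it depends only on bit 10 (1024)
// of the read counter, so it flips every 1024 reads.
int fakeRaster(int readCount) => (readCount & 1024) == 0 ? 1 : 0;

void main() {
  print(fakeRaster(0)); // 1
  print(fakeRaster(1023)); // 1
  print(fakeRaster(1024)); // 0
  print(fakeRaster(2048)); // 1
}
```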

Here it seems we are in a waiting loop, waiting for the enter key to be pressed on the keyboard. I think this is a pretty decent place for our emulator to be and probably means that all initialisation has been completed, and we should have the welcome message in screen memory.

We want to check if the welcome message is in screen memory, but our debug dump currently just shows the first two pages of memory. We could, however, inspect the RAM array in debug mode in IntelliJ.

So, start up the emulator in debug mode and press the play button to let the C64 system run at full speed. Wait for about a minute and then put a breakpoint on the first line of the getMem method in our Memory class. With the system running at full speed, that breakpoint will be hit almost instantaneously.

Open up an evaluate window and enter the following:

Here we inspect address 1024 of screen memory, which is the first byte of it. In this case the value is 32, which is a space. Inspecting addresses further in screen memory will reveal the welcome message.

In the next section we will render the contents of the screen in real time.

Rendering screen memory

We will now try and render screen memory in real time, showing a display similar to the C64 in text mode.

We ultimately need a mechanism that allows us to work efficiently with image data on a pixel level. Flutter provides this via the RawImage widget, together with ui.Image, with which you can work with an array of RGBA values.

Let us unpack this a bit. Let us start working with the raw array of RGBA values, where we will produce the frame for display, based on the screen memory and the character ROM.

Both the character ROM and screen memory are present in our Memory class, so for now we will do the frame rendering in that class.

Firstly, let us define the byte buffer we are going to use over and over again:

class Memory {
...
    final type_data.ByteData image = type_data.ByteData(320*200*4);
...
}

So, as we can see, we have a resolution of 320x200, which is the resolution of a real C64 screen. We multiply the end result by 4, because each pixel is four bytes in our buffer, one byte each for red, green, blue and the alpha channel.

Next, let us write a method for rendering a screen to the byte array:

  type_data.ByteData getDisplayImage() {
    const rowSpan = 320 * 4;
    for (int i = 0; i < 1000; i++ ) {
      var charCode = _ram.getUint8(i + 1024);
      var charAddress = charCode << 3;
      var charBitmapRow = (i ~/ 40) << 3;
      var charBitmapCol = (i % 40) << 3;
      int rawPixelPos = charBitmapRow * rowSpan + charBitmapCol * 4;
      for (int row = 0; row < 8; row++) {
        int bitmapRow = _character.getUint8(row + charAddress);
        int currentRowAddress = rawPixelPos + row * rowSpan;
        for (int pixel = 0; pixel < 8; pixel++) {
          if ((bitmapRow & 0x80) != 0) {
              image.setUint32(currentRowAddress + (pixel << 2), 0x000000ff);
          } else {
              image.setUint32(currentRowAddress + (pixel << 2), 0xffffffff);
          }
          bitmapRow = bitmapRow << 1;
        }
      }

    }
    return image;
  }

So, here we loop through all thousand character codes in screen memory, rendering each one. Each character code is actually an index into character ROM, where every character has its own 8x8 pixel bitmap.

Now, this method is invoked every time our periodic timer runs:
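The index arithmetic in getDisplayImage() is easier to follow with a few numbers plugged in. The sketch below pulls the two calculations out into functions; cellOffset and glyphOffset are names invented for this example.

```dart
const rowSpan = 320 * 4; // bytes per pixel row in the frame buffer

// Byte offset in the frame buffer of the top-left pixel of text cell `i`.
int cellOffset(int i) {
  final pixelRow = (i ~/ 40) << 3; // 8 pixel rows per text row
  final pixelCol = (i % 40) << 3; // 8 pixel columns per text column
  return pixelRow * rowSpan + pixelCol * 4;
}

// Offset of a character's 8-byte bitmap in character ROM.
int glyphOffset(int charCode) => charCode << 3;

void main() {
  print(cellOffset(0)); // 0
  print(cellOffset(1)); // 32
  print(cellOffset(40)); // 10240
  print(cellOffset(999)); // 247008
  print(glyphOffset(1)); // 8
}
```

The last cell starts at byte 247008, and its final pixel ends exactly at byte 256000, which is the size of our 320x200x4 buffer.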

...
import 'dart:ui' as ui;
...
   on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(milliseconds: 17), (timer) {
          int start = DateTime.now().millisecondsSinceEpoch;
          int targetCycles = _cpu.getCycles() + 16666;
          do {
            _cpu.step();
          } while (_cpu.getCycles() < targetCycles);
          ui.decodeImageFromPixels(memory.getDisplayImage().buffer.asUint8List(), 
             320, 200, ui.PixelFormat.bgra8888, setImg);
      });
    });

ui.decodeImageFromPixels is a function within the dart:ui library of Flutter. It will create an Image object from a pixel buffer, which in this case is the rendered screen buffer.

We also pass ui.PixelFormat.bgra8888 as a parameter, indicating that our buffer holds four bytes per pixel, one each for blue, green, red and alpha.

We also pass a callback method, setImg in this case, which will be called once we have the generated Image object.

So, let us implement this callback method:

    void setImg(ui.Image data) {
      emit(RunningState(image: data, frameNo: frameNo++));
    }

Here you can see we are emitting the image in a state object, so our BlocBuilder can pick up the change and render the image. You will also notice that we have a frameNo property that we modify with each new image, so our BlocBuilder can easily pick up the change.

You will recall from previous posts that we already defined RunningState, which we have now changed. Here is the revised version:

class RunningState extends C64State {
  RunningState({required this.image,
    required this.frameNo});

  final int frameNo;
  final ui.Image image;
  @override
  List<Object> get props => [frameNo];
}

Finally, let us modify our BlocBuilder:

...
        body: BlocBuilder<C64Bloc, C64State>(
          builder: (BuildContext context, state) {
            if (state is InitialState) {
              return const CircularProgressIndicator();
            } else if (state is DataShowState) {
              return Column(
                children: [
                  Text(getRegisterDump(state.a, state.x, state.y, state.n,
                      state.z, state.c, state.i, state.d, state.v, state.pc)),
                  Text(
                    getMemDump(state.memorySnippet),
                    style: const TextStyle(
                      fontFamily: 'RobotoMono', // Use the monospace font
                    ),
                  ),
                ],
              );
            } else if (state is RunningState) {
              return RawImage(
                  image: state.image, scale: 0.5);
            } else {
              return const CircularProgressIndicator();
            }
          },
        ),
...

So, if the state is RunningState, we return a RawImage widget, which will be displayed on the screen. We pass the image in the state to the RawImage widget. We also use a scale of 0.5, which basically doubles the displayed size. The native 320x200 resolution of a C64 frame displays very small on a modern display, so with the scale it at least appears bigger.

With everything coded we can now give it a test run. The startup sequence appears to take more or less the same time as on a real C64, and eventually the welcome screen appears:

We are making progress, but still, there is no flashing cursor.

Getting the cursor to flash

Let us see if we can get the cursor to flash.

If you go down into the bowels of the C64 system, you will find that at the core of a standard C64 system that has just started up, there is a timer interrupt every 60th of a second. This interrupt does a couple of things, like checking if any key was pressed or released and updating the status of the cursor.

So, let us see if we can put a hack together that simply forces an interrupt every 60th of a second, without worrying for now about emulating the full CIA chip with a timer.

The easiest way is in the step() method of our CPU class:

  step() {
    if ((_cycles > 1000000) && ((_cycles % 16666) < 30) && (_i == 0)) {
      push(pc >> 8);
      push(pc & 0xff);
      push((_n << 7) | (_v << 6) | (2 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
      _i = 1;
      pc = memory.getMem(0xfffe) | (memory.getMem(0xffff) << 8);
    }
...
  }

So, we wait for a second before triggering interrupts in 1/60 second intervals. With the change, the cursor actually flashes:
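Two parts of this hack are worth looking at on their own: the condition that decides when an interrupt fires, and the status byte that gets pushed. The sketch below restates both as standalone functions; packStatus and irqDue are names invented for this example, and the flag values are passed in rather than read from our CPU class.

```dart
// Packs the flags into the status byte pushed by the hack above:
// N = bit 7, V = bit 6, bit 5 always set, bit 4 (break) clear,
// D = bit 3, I = bit 2, Z = bit 1, C = bit 0.
int packStatus(
        {int n = 0, int v = 0, int d = 0, int i = 0, int z = 0, int c = 0}) =>
    (n << 7) | (v << 6) | (2 << 4) | (d << 3) | (i << 2) | (z << 1) | c;

// True when the hack would trigger an interrupt at this cycle count.
bool irqDue(int cycles, int iFlag) =>
    cycles > 1000000 && (cycles % 16666) < 30 && iFlag == 0;

void main() {
  print(packStatus().toRadixString(16)); // 20
  print(packStatus(n: 1, c: 1).toRadixString(16)); // a1
  print(irqDue(1000000, 0)); // false
  print(irqDue(16666 * 61, 0)); // true
  print(irqDue(16666 * 61, 1)); // false
}
```

The 30-cycle window is there because step() is only called between instructions, so the cycle counter will rarely land exactly on a multiple of 16666. Setting _i to 1 in the hack keeps the interrupt from firing again within that same window.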

In Summary

In this post we managed to boot the C64 system with all its ROMs and managed to render screen memory in real time, showing the welcome message and the flashing cursor.

The source code for this post is available in the following Github tag: https://github.com/ovalcode/c64_flutter/tree/c64_flutter_part12

In the next post we will add some keyboard interaction with our emulator.

Until next time!

Wednesday, 9 April 2025

A Commodore 64 Emulator in Flutter: Part 11

Foreword

In the previous post we ran the Klaus Dormann Test Suite on our emulator. In the process we found a couple of issues with our emulator. We fixed some of them, but found a few more that we still need to fix.

In this post we will look at the remaining issues. Solving these remaining issues wasn't much of a deal at all, so this post will be shorter than normal.

The remaining fixes

One of the major issues I found while running the Klaus Dormann Test Suite on my emulator was some incorrect values in some of the CPU data tables. This includes some instructions having the incorrect address mode and incorrect instruction lengths.

The other issue I experienced, was failed test cases because decimal mode wasn't implemented. Implementing Decimal mode is fairly straightforward. We start with implementing the following methods:

  int adcDecimal(int operand) {
     int l = 0;
     int h = 0;
     int result = 0;
     l = (_a & 0x0f) + (operand & 0x0f) + _c;
     if ((l & 0xff) > 9) l += 6;
     h = (_a >> 4) + (operand >> 4) + (l > 15 ? 1 : 0);
     if ((h & 0xff) > 9) h += 6;
     result = (l & 0x0f) | (h << 4);
     result &= 0xff;
     _c = (h > 15) ? 1 : 0;
     _z = (result == 0) ? 1 : 0;
     _n = 0;
     _v = 0;
     return result;
   }
 
   int sbcDecimal(int operand) {
     int l = 0;
     int h = 0;
     int result = 0;
     l = (_a & 0x0f) - (operand & 0x0f) - (1 - _c);
     if ((l & 0x10) != 0) l -= 6;
     h = (_a >> 4) - (operand >> 4) - ((l & 0x10) != 0 ? 1 : 0);
     if ((h & 0x10) != 0) h -= 6;
     result = (l & 0x0f) | (h << 4);
     _c = ((h & 0xff) < 15) ? 1 : 0;
     _z = (result == 0) ? 1 : 0;
     _n = 0;
     _v = 0;
     return (result & 0xff);
   }
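To convince ourselves that the addition really works in decimal, here is a standalone sketch of the same steps as adcDecimal, with the accumulator and carry passed in as parameters instead of being read from the CPU state. The function name bcdAdd and its return shape are made up for this example.

```dart
// Standalone version of the BCD addition above.
// Returns [result, carryOut].
List<int> bcdAdd(int a, int operand, int carryIn) {
  var l = (a & 0x0f) + (operand & 0x0f) + carryIn;
  if (l > 9) l += 6; // decimal-adjust the low digit
  var h = (a >> 4) + (operand >> 4) + (l > 15 ? 1 : 0);
  if (h > 9) h += 6; // decimal-adjust the high digit
  return [((l & 0x0f) | (h << 4)) & 0xff, h > 15 ? 1 : 0];
}

void main() {
  // 19 + 28 = 47, no carry out.
  print(bcdAdd(0x19, 0x28, 0).map((v) => v.toRadixString(16)).toList());
  // 58 + 46 + 1 = 105: result 05 with carry out.
  print(bcdAdd(0x58, 0x46, 1).map((v) => v.toRadixString(16)).toList());
}
```

Adding 6 to a digit that went past 9 skips the six unused nibble values A-F, which is what pushes the overflow into the next digit.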

We modify the applicable instruction selectors:

       case 0x69:
         if (_d == 1) {
           _a = adcDecimal(arg0);
         } else {
           adc(arg0);
         }
       case 0x65:
       case 0x75:
       case 0x6D:
       case 0x7D:
       case 0x79:
       case 0x61:
       case 0x71:
         if (_d == 1) {
           _a = adcDecimal(memory.getMem(resolvedAddress));
         } else {
           adc(memory.getMem(resolvedAddress));
         }
         
       case 0xE9:
         if (_d == 1) {
           _a = sbcDecimal(arg0);
         } else {
           sbc(arg0);
         }
       case 0xE5:
       case 0xF5:
       case 0xED:
       case 0xFD:
       case 0xF9:
       case 0xE1:
       case 0xF1:
         if (_d == 1) {
           _a = sbcDecimal(memory.getMem(resolvedAddress));
         } else {
           sbc(memory.getMem(resolvedAddress));
         }

Test Results

With everything fixed, we can see if all the tests passed.

The Test Suite runs for about two minutes on my emulator. After the two minutes, when hitting stop, the register window will look as follows:

From this point the program counter remains at 3469. Let's have a look at the assembly listing to see what is at this address:

So, this is confirmation that our emulator passed all the tests!

In Summary

In this post we confirm that we implemented all the CPU instructions correctly in our emulator, using Klaus Dormann's Test Suite.

Here is a link to the tag of this post's source code: https://github.com/ovalcode/c64_flutter/tree/c64_flutter_part11

In the next post we will start writing some more code to boot the C64 ROM's.

Until next time!

Saturday, 1 March 2025

A Commodore 64 Emulator in Flutter: Part 10

Foreword

In the previous post we implemented the last couple of 6502 instructions in our C64 Flutter emulator.

In this post we will be running the Klaus Dormann Test Suite on our emulator to ensure we have implemented all the instructions correctly.

Starting up the Klaus Dormann Test Suite

Let us see if we can start up the Klaus Dormann Test Suite on our emulator, although only in a single stepping fashion at the moment.

To get started, we need two files from Klaus' Github repository:

The first link is the actually binary which will execute in our emulator. This is a 64KB binary which will fill the whole address space accessible by the 6502.

The second file is a listing file, containing the actual disassembled version of the binary we are running. The listing file is useful if you want to follow along to see what the program is actually doing in a certain point in time.

Firstly we dump the binary in the assets folder of our Flutter project and rename it to program.bin. This is the default binary our emulator looks for when it starts up.

Now, usually if a 6502 system starts up, it looks at the reset vector at address 0xFFFC and 0xFFFD for the starting address for which it should start executing code, something which we didn't implemented yet.

In the Klaus Test suite there is also a reset vector defined, but within the context of the Test Suite it has the function to detect if an accidental reset was triggered. So, in actual fact this Test Suite doesn't use the reset vector to everything. Rather, when using the test suite, you should just set the PC register to 0x400 and start execution. This makes our life easier, and for the moment we don't need to worry about implementing the Reset vector stuff.
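
For reference, here is a minimal sketch of what reading the reset vector would involve once we do implement it. The function names and the use of a plain Uint8List for memory are assumptions made for illustration only; they are not part of the emulator's code.

```dart
import 'dart:typed_data';

/// Reads the 16-bit reset vector, stored low byte first at 0xFFFC/0xFFFD.
int readResetVector(Uint8List memory) {
  final low = memory[0xFFFC];
  final high = memory[0xFFFD];
  return (high << 8) | low;
}

/// Hypothetical helper: builds a 64KB image whose reset vector is [target].
Uint8List imageWithResetVector(int target) {
  final memory = Uint8List(0x10000);
  memory[0xFFFC] = target & 0xFF; // low byte
  memory[0xFFFD] = (target >> 8) & 0xFF; // high byte
  return memory;
}
```

With such a helper, a reset would simply load pc with readResetVector(memory) instead of the hard-coded 0x400 we use for the Test Suite.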

So, in cpu.dart, the following change needs to be made, namely the initial value of pc:

 
...
  int _n = 0, _z = 0, _c = 0, _i = 0, _d = 0, _v = 0;
  int _sp = 0xff;
  int pc = 0x400;
...

With this we can start up our emulator and single step through the code of the Test Suite.

Unattended running

Single stepping through the Klaus Dormann Test Suite in our emulator would be a daunting task. You would probably need to click the step button thousands of times.

It would make our lives easier if we could just let the Test Suite run unattended, with us just pausing the execution once in a while to see how far we have progressed through the tests.

We do this by adding a button right next to the title. As part of the process we need to wrap both the title and the button in a row in order for everything to align properly. All this is happening in the main.dart file:

        appBar: AppBar(
          title:  Row(
              mainAxisSize: MainAxisSize.min,
              children: [
                const Text("Emulator C64"),
                BlocBuilder<C64Bloc, C64State>(
                  builder: (BuildContext context, state) {
                    return Row(
                        mainAxisAlignment: MainAxisAlignment.end,
                        children: [
                          _getRunStopButton(state, context)
                        ]);
                  })
              ]),
        ),

We want our run button to behave like a toggle switch, toggling between a play and a stop button. To do all this fancy stuff, we need to inject some state, which we achieve by wrapping everything with a BlocBuilder. We did discuss the workings of BlocBuilder in a previous post.

Now, the method _getRunStopButton() returns one of three possible buttons, depending on the state: a play button, a stop button, or a disabled play button if everything hasn't initialised yet:

  Widget _getRunStopButton(C64State state, BuildContext context) {
    if(state is DataShowState) {
      return IconButton(
        icon: Icon(Icons.play_arrow),
        onPressed:  () {
          context.read<C64Bloc>().add(RunEvent());
        }
      );
    } else if (state is RunningState) {
      return IconButton(
          icon: Icon(Icons.stop_circle),
          onPressed:  () {
            context.read<C64Bloc>().add(StopEvent());
          }
      );
    } else {
      return const IconButton(
          icon: Icon(Icons.play_arrow),
          onPressed:  null
      );
    }
  }

Here we test for different states. Firstly we show an enabled play button if we are in DataShowState. As you might remember from previous posts, with DataShowState, we display a dump of memory and registers, and we can single step from that point. This is the perfect scenario to provide a play button that will run the emulator at full speed.

Pressing the play button emits a RunEvent, which we still need to implement a listener for. We will do that in a bit.

Secondly if our emulator is in the RunningState, we display the stop button. We should still implement the RunningState State, which in actual fact is a very simple implementation:

class RunningState extends C64State {}

There are no values or properties we need to convey here; the state merely signals that we are in the running state.

Finally, for any other state we just want to show a play button that is disabled. This will only happen when our application is starting up and loading the memory image, which in our case is the Test Suite.

Now, we have defined a number of events that we need to listen for in c64_bloc.dart.

Firstly, let us define the listener for RunEvent. This will be the core of our unattended running. Here we want to schedule a timer that fires every second, and on each tick we execute a second's worth of CPU instructions (i.e. 1 000 000 CPU cycles). We also need to emit a RunningState state so our front end can update accordingly.

Let us start with an outline:

...
  Timer? timer;
...
    on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(seconds: 1), (timer) {
...
      });
      emit(RunningState());
    });
...

We define the timer variable as a field of our C64Bloc class rather than as a local variable, since we want to be able to cancel the timer in another event handler.

Now, to determine when our CPU has executed 1 million cycles' worth of instructions, our CPU needs to keep a record of the cycles for each of the instructions it executes. The obvious place for this is the step method:

...
  int _cycles = 0;
...
  int getCycles() {
    return _cycles;
  }
...
  step() {
...
    _cycles = _cycles + CpuTables.instructionCycles[opCode];
    var resolvedAddress =
        calculateEffectiveAddress(CpuTables.addressModes[opCode], arg0, arg1);
    switch (opCode) {
...
    }
  }

We defined the instructionCycles array in a previous post; it specifies the number of cycles for every opcode. So, with every step we can just add the number of cycles for the opcode being executed to a _cycles variable.

With this implemented, we can add some meat to our timer callback function:

...
    on<RunEvent>((event, emit) {
      timer = Timer.periodic(const Duration(seconds: 1), (timer) {
          int targetCycles = _cpu.getCycles() + 1000000;
          do {
            _cpu.step();
          } while (_cpu.getCycles() < targetCycles);
      });
      emit(RunningState());
    });
...

So, we just add one million to our current CPU cycle count, and that will be the target at which we stop the loop.
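
To see how this loop behaves in isolation, here is a small sketch using a stand-in CPU where every instruction costs a fixed number of cycles. FakeCpu and runBatch are hypothetical names for illustration only; the real CPU looks up the cycle count per opcode.

```dart
/// Stand-in for the emulator's CPU: every step costs a fixed number of cycles.
class FakeCpu {
  FakeCpu(this.cyclesPerStep);

  final int cyclesPerStep;
  int _cycles = 0;
  int steps = 0;

  int getCycles() => _cycles;

  void step() {
    _cycles += cyclesPerStep;
    steps++;
  }
}

/// Executes whole instructions until at least [budget] cycles have passed.
/// Returns by how many cycles the batch overshot the budget.
int runBatch(FakeCpu cpu, int budget) {
  final targetCycles = cpu.getCycles() + budget;
  do {
    cpu.step();
  } while (cpu.getCycles() < targetCycles);
  return cpu.getCycles() - targetCycles;
}
```

Because we can only stop between instructions, a batch may overshoot its target by a few cycles. With 7-cycle instructions and a budget of 1 000 000, the loop runs 142 858 steps and overshoots by 6 cycles. For our purposes this small drift does not matter.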

Finally, we need to implement the stop event:

    on<StopEvent>((event, emit) {
      timer?.cancel();
      emit(DataShowState(
          dumpNo: dumpNo++,
          memorySnippet: ByteData.sublistView(memory.getDebugSnippet(), 0, 512),
          a: _cpu.getAcc(),
          x: _cpu.getX(),
          y: _cpu.getY(),
          n: _cpu.getN() == 1,
          z: _cpu.getZ() == 1,
          c: _cpu.getC() == 1,
          i: _cpu.getI() == 1,
          d: _cpu.getD() == 1,
          v: _cpu.getV() == 1,

          pc: _cpu.pc));
    });

Here we just cancel the timer and emit a DataShowState, because after we have stopped the running we want to display the current state of memory and the registers.

When the emulator runs unattended, we also want to hide the state display to avoid confusion and just show "running". To keep the discussion focused, I will not be going into this detail.

Running the Test Suite

Finally we are at a point where we can run Klaus Dormann's Test Suite. On startup, the screen looks like this:

As discussed, the play button to start the emulator in unattended mode is next to the title.

When clicking play, the screen changes like this:

One weird thing you might notice is that if you click run and quickly stop again, the program counter is still at 0x400, the starting address of the test suite, as if nothing executed. The reason for this is very subtle. Our timer callback only executes once the timer period has elapsed. So, in our case we need to wait at least 1 second before clicking the stop button if we expect to see some results.
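
If this delay bothers you, one possible variation is to run the first batch immediately and only then start the periodic timer. The following is just a sketch of that idea; startRunning and runBatch are hypothetical names, and this is not what our emulator currently does.

```dart
import 'dart:async';

/// Runs [runBatch] once straight away, then once every [period].
/// The returned Timer can be cancelled to stop the periodic runs.
Timer startRunning(
  void Function() runBatch, {
  Duration period = const Duration(seconds: 1),
}) {
  runBatch(); // first batch without waiting for the timer to elapse
  return Timer.periodic(period, (_) => runBatch());
}
```

The trade-off is that the first batch now runs synchronously inside the event handler, so the handler takes a bit longer to return.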

So, if we let it run a bit longer, our result will look like this:

So, when stopped, our program counter was at 0x9D7. Funny thing is, you can let it run for as long as you want to, but the program counter remains stuck at 0x9D7.

What is going on here?

To find the answer we need to look at the source listing of the Test Suite and search for that address:

Here it is clear: if something went wrong with the test, it will do an endless loop at the address 09D7. So, obviously, our emulator failed a test, but which one? Looking back a couple of lines, we see the comment: The IRQ vector was never executed.

Aha! We never implemented IRQs (Interrupt Requests) in our emulator. Having said that, it briefly caught me in a mystical moment, almost like when I was a kid playing on a Commodore 64 and wondered for the first time what was going on underneath the hood.

In this case I wondered where the IRQ came from. This Test Suite doesn't implement any magical peripherals, does it? After a moment I realised that this was probably caused by me not implementing the BRK instruction, and looking further back in the listing did confirm this.

This was actually a very interesting experience for me. It was the first time I encountered a problem and my first instinct was a moment of nostalgia 😂

In the following section we will implement the BRK instruction and then run the emulator again.

Implementing the BRK and RTI instructions

So, let us quickly implement the BRK and RTI instructions. There is one caveat with the BRK instruction. It is a one byte instruction, but in actual fact it behaves like a 2 byte instruction. The BRK triggers an IRQ (it jumps via the IRQ vector at 0xFFFE/0xFFFF), and when it returns it doesn't return to the address directly after the BRK instruction, but one address further on.

To account for this quirk of the BRK instruction, we can adjust the instruction length in the instructionLen table for the BRK instruction to 2.

With the table adjusted, we implement the BRK and RTI instructions as follows:

      /*BRK*/
      case 0x00:
        push(pc >> 8);
        push(pc & 0xff);
        push((_n << 7) | (_v << 6) | (3 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
        _i = 1;
        pc = (memory.getMem(0xffff) << 8) | memory.getMem(0xfffe);

      /*RTI*/
      case 0x40:
        int temp = pull();
        _c = temp & 1;
        _z = (temp >> 1) & 1;
        _i = (temp >> 2) & 1;
        _d = (temp >> 3) & 1;
        _v = (temp >> 6) & 1;
        _n = (temp >> 7) & 1;
        pc = pull() | (pull() << 8);
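
To make the return address arithmetic concrete, here is a self-contained sketch of the same BRK/RTI round trip. MiniCpu is a hypothetical, stripped-down model for illustration only: it keeps the flags packed in a single status byte instead of the separate flag variables our emulator uses.

```dart
import 'dart:typed_data';

/// Stripped-down model with only what BRK and RTI need.
class MiniCpu {
  final Uint8List memory = Uint8List(0x10000);
  int pc = 0;
  int sp = 0xff;
  int status = 0; // packed as N V - B D I Z C

  void push(int value) {
    memory[0x100 | sp] = value & 0xff;
    sp = (sp - 1) & 0xff;
  }

  int pull() {
    sp = (sp + 1) & 0xff;
    return memory[0x100 | sp];
  }

  /// Executes a BRK located at [brkAddress].
  void brk(int brkAddress) {
    // BRK behaves like a 2 byte instruction: skip the opcode plus one byte.
    final returnAddress = (brkAddress + 2) & 0xffff;
    push(returnAddress >> 8);
    push(returnAddress & 0xff);
    push(status | 0x30); // bits 4 and 5 are set in the pushed copy
    status |= 0x04; // set the I flag
    pc = memory[0xfffe] | (memory[0xffff] << 8);
  }

  void rti() {
    status = pull() & 0xcf; // bits 4 and 5 are ignored when pulling
    final low = pull();
    final high = pull();
    pc = low | (high << 8);
  }
}
```

With the IRQ vector pointing at 0x9000, a BRK at 0x0400 jumps to 0x9000 and leaves 04, 02 and the status byte on the stack. The RTI then resumes at 0x0402, one byte past the address directly after the BRK.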

Now, when we run the test suite again, we get past this failed test. However, we end up in another endless loop at address 0xdeb, which indicates another failed test.

We will investigate this failed case, as well as other potential failed cases in the next post.

In Summary

In this post we ran the Klaus Dormann Test Suite on our emulator in unattended mode. The first failed test case we encountered was due to the BRK/RTI instructions that weren't implemented.

With the BRK/RTI instructions implemented, we encountered another failed test case, which we will investigate in the next post, together with any other failed test cases that pop up.

You can find all the source code for this project as well as the binary image containing the Klaus Dormann test suite, here.

Until next time!

Sunday, 23 February 2025

A Commodore 64 Emulator in Flutter: Part 9

Foreword

In the previous post we implemented all stack operations for our Flutter C64 emulator. This included pushing and popping the Accumulator and the status register. Also, we implemented the JSR/RTS instructions, which also operate on the stack.

In this post we will be implementing the remaining instructions of the 6502, which includes the following:

BIT
JMP (Jump)
NOP
Register operations

With these instructions implemented, we can start in the next post to run the Klaus Dormann Test suite on our emulator to see if we have implemented all the 6502 instructions correctly.

Enjoy!

The Jump instruction

Implementing the jump instruction is just a straightforward operation of loading the program counter with a new value. Let us add the selectors for these:

        /*
JMP (JuMP)
Affects Flags: none

MODE           SYNTAX       HEX LEN TIM
Absolute      JMP $5597     $4C  3   3
Indirect      JMP ($5597)   $6C  3   5
         */
      case 0x4C:
      case 0x6C:
        pc = resolvedAddress;

Now, there are two address modes for this instruction: Absolute and Indirect. The Absolute address mode we have already implemented in the calculateEffectiveAddress() method, but not the Indirect address mode. So, within the calculateEffectiveAddress() method, let us add the following selector:

      case AddressMode.indirect:
        var lookupAddress = (operand2 << 8) | operand1;
        return memory.getMem(lookupAddress) | (memory.getMem(lookupAddress + 1) << 8);
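
One detail worth knowing about: on the original NMOS 6502 (and the 6510 in the C64), the indirect JMP does not carry into the high byte of the pointer when it fetches the second byte. So JMP ($02FF) reads the low byte from $02FF and the high byte from $0200, not $0300. The selector above does not model this. The sketch below shows how it could be modelled; resolveIndirect and the plain Uint8List memory are assumptions for illustration only.

```dart
import 'dart:typed_data';

/// Resolves the target of JMP (pointer) the way an NMOS 6502 does:
/// the fetch of the high byte wraps around within the pointer's page.
int resolveIndirect(Uint8List memory, int pointer) {
  final lowAddress = pointer & 0xffff;
  // Only the low byte of the pointer is incremented; the page stays the same.
  final highAddress = (lowAddress & 0xff00) | ((lowAddress + 1) & 0x00ff);
  return memory[lowAddress] | (memory[highAddress] << 8);
}
```

For pointers that do not sit on the last byte of a page, this gives exactly the same result as the selector above.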

The BIT instruction

Next, let us implement the BIT instruction. From the specs, the BIT instruction is defined as follows:

BIT (test BITs)
Affects Flags: N V Z

MODE           SYNTAX       HEX LEN TIM
Zero Page     BIT $44       $24  2   3
Absolute      BIT $4400     $2C  3   4

BIT sets the Z flag as though the value in the address tested were ANDed with the 
accumulator. The N and V flags are set to match bits 7 and 6 respectively in the 
value stored at the tested address.

We implement this as follows:

      case 0x24:
      case 0x2C:
        int memByte = memory.getMem(resolvedAddress);
        _z = ((memByte & _a) == 0) ? 1 : 0;
        _n = ((memByte & 0x80) != 0) ? 1 :0;
        _v = ((memByte & 0x40) != 0) ? 1 :0;
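
As a quick worked example of these rules, here is a stand-alone sketch that computes the three flags as a pure function. bitFlags is a hypothetical helper for illustration only, and it uses a Dart 3 record for the result.

```dart
/// Flags produced by BIT for the given accumulator and memory byte.
({int n, int v, int z}) bitFlags(int accumulator, int memByte) {
  return (
    n: (memByte >> 7) & 1, // bit 7 of the memory byte
    v: (memByte >> 6) & 1, // bit 6 of the memory byte
    z: ((memByte & accumulator) & 0xff) == 0 ? 1 : 0,
  );
}
```

With the accumulator at 0x0f and a memory byte of 0xc0, both N and V are set, and Z is set as well since the two values have no bits in common.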

The NOP instruction

The NOP instruction is short for No Operation. It literally does nothing except consume CPU cycles. One of the major uses of this instruction is to reserve some slots in memory where you might want to add some more instructions in future.

Strictly speaking, you don't need to implement a case selector for this instruction in our big switch statement decoding the different opcodes. The surrounding mechanism would just skip to the next instruction.

However, by not implementing a selector for NOP, the default selector will be invoked in the switch statement. The default selector is a nice way to warn us if we forgot to implement some instructions or if we encountered some undocumented instructions in the code. By not giving NOP a selector, we would get many false positives from hitting the default selector.

So, the selector for NOP will look as follows:

        /*
NOP (No OPeration)
Affects Flags: none

MODE           SYNTAX       HEX LEN TIM
Implied       NOP           $EA  1   2
         */
      case 0xEA:
        break;

With the Dart language we don't need the break in general, since a case with code in it doesn't fall through. However, when you have an empty case like this you need to add it, otherwise it will fall through to the next case statement with code, which is not what we want.
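
The following small sketch illustrates this behaviour, assuming Dart 3 switch semantics. The classify function and its return strings are made up for the illustration.

```dart
/// Shows which switch cases fall through in Dart 3 and which do not.
String classify(int opCode) {
  var result = 'nothing';
  switch (opCode) {
    case 0xEA:
      break; // without this, 0xEA would share the body of the next case
    case 0x4C: // empty case: falls through to the body below
    case 0x6C:
      result = 'jump'; // non-empty case: exits the switch, no break needed
    case 0x24:
      result = 'bit';
    default:
      result = 'unhandled';
  }
  return result;
}
```

Here classify(0xEA) returns 'nothing', while 0x4C and 0x6C both return 'jump' without ever reaching the 'bit' case.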

Register operations

Finally, let us implement the register operations. As per the specs, these are the following Instructions:

Register Instructions
Affect Flags: N Z

These instructions are implied mode, have a length of one byte and require two machine cycles.

MNEMONIC                 HEX
TAX (Transfer A to X)    $AA
TXA (Transfer X to A)    $8A
DEX (DEcrement X)        $CA
INX (INcrement X)        $E8
TAY (Transfer A to Y)    $A8
TYA (Transfer Y to A)    $98
DEY (DEcrement Y)        $88
INY (INcrement Y)        $C8

In previous posts we did implement some of these. Doing some inventory, I found that the following still need to be implemented: TAX, TXA, INX, TAY, TYA and INY.

Here is the implementation:

      case 0xAA:
        _x = _a;
        _n = ((_x & 0x80) != 0) ? 1 : 0;
        _z = (_x == 0) ? 1 : 0;

      case 0x8A:
        _a = _x;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

      case 0xE8:
        _x++;
        _x = _x & 0xff;
        _n = ((_x & 0x80) != 0) ? 1 : 0;
        _z = (_x == 0) ? 1 : 0;

      case 0xA8:
        _y = _a;
        _n = ((_y & 0x80) != 0) ? 1 : 0;
        _z = (_y == 0) ? 1 : 0;

      case 0x98:
        _a = _y;
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

      case 0xC8:
        _y++;
        _y = _y & 0xff;
        _n = ((_y & 0x80) != 0) ? 1 : 0;
        _z = (_y == 0) ? 1 : 0;
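
You will notice that the same two lines for updating the N and Z flags repeat in every selector. As a possible tidy-up, not something the emulator does at this point, one could move them into a helper. The Registers class and setNZ below are hypothetical names used only for this sketch.

```dart
/// Hypothetical register file with only what the register operations need.
class Registers {
  int a = 0, x = 0, y = 0;
  int n = 0, z = 0;

  /// Masks [value] to 8 bits, updates N and Z from it and returns it.
  int setNZ(int value) {
    final result = value & 0xff;
    n = (result >> 7) & 1;
    z = result == 0 ? 1 : 0;
    return result;
  }

  void tax() => x = setNZ(a);
  void inx() => x = setNZ(x + 1);
  void dey() => y = setNZ(y - 1);
}
```

With X at 0xff, inx() wraps X around to 0 and sets the Z flag, and with Y at 0, dey() wraps Y to 0xff and sets the N flag.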

This covers the instructions we wanted to implement in this post. I am not going to write a test program in this post to test the instructions we have implemented, since in the next post we will start to run the Klaus Dormann Test Suite, which will anyway surface any defects.

In Summary

In this post we implemented the remaining instructions for our emulator.

In the next post we will run the Klaus Dormann Test Suite on our emulator to see if we have some defects in our implementation. This will probably span multiple posts, depending on how many issues we detect.

Until next time!

Thursday, 20 February 2025

A Commodore 64 Emulator in Flutter: Part 8

Foreword

In the previous post we implemented the compare and branch instructions for our Flutter C64 Emulator.

In this post we will implement the 6502 stack and related operations like pushing/popping, Jump to Subroutine and Return from Subroutine.

Enjoy!

The stack concept

The stack is a Last in First out (LIFO) data structure. To visualise a stack in real life, one can look at a receipt stack:

Clearly, one can see that the receipt that is most accessible is the last receipt you have placed on top of the pile.

In a CPU the stack has many uses, like if you were calling subroutines in a nested way, and you want to return to the caller of a subroutine. A stack is perfect for this, because you want access to the last return address.

On the 6502, the stack is 256 bytes in size and lives in page 1 of the memory space, that is, the address range $100 - $1ff. The stack grows downwards, starting at $1ff and growing down towards $100. Obviously, as you pop stuff off the stack, it moves back towards $1ff.

On the 6502 the stack has many uses, of which we already mentioned jumping to and returning from subroutines. You can also push and pop registers. The 6502 also uses the stack when servicing interrupts. Before an interrupt routine is called, the CPU stores its state (the program counter and the status register) on the stack, so when the service routine is finished, it restores the CPU to the state before it was interrupted, and the program continues as if nothing has happened.

Creating the stack mechanism

Let us start by writing some code for implementing the stack mechanism. We start by defining a stack pointer:

int _sp = 0xff;

We start with the initial value of 0x1ff, which is the starting position of the stack. We omit the high byte value of 1, and will just prepend it when we need to do any lookups in memory.

Now let us create the push and pull methods.

  push(int value) {
    memory.setMem(value, _sp | 0x100);
    _sp--;
    _sp = _sp & 0xff;
  }

  int pull() {
    _sp++;
    _sp = _sp & 0xff;
    return memory.getMem(_sp | 0x100);
  }

With the understanding that the stack pointer points to the location where the next push will happen, we can use the stack pointer address as is when storing the value of the push, and then decrement the pointer thereafter.

However, since the pointer points to the next push location, you cannot use the location as is when doing a pull. You first need to increment the pointer and use that value for the read address.
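
The following stand-alone sketch shows this behaviour, including what happens when the 8-bit stack pointer wraps around. Stack6502 is a hypothetical class that uses a plain Uint8List instead of our Memory class.

```dart
import 'dart:typed_data';

/// Stand-alone model of the 6502 stack living in page 1 ($100 - $1ff).
class Stack6502 {
  final Uint8List memory = Uint8List(0x10000);
  int sp = 0xff;

  void push(int value) {
    memory[0x100 | sp] = value & 0xff; // store at the current location
    sp = (sp - 1) & 0xff; // then move down, wrapping within 8 bits
  }

  int pull() {
    sp = (sp + 1) & 0xff; // move up first, wrapping within 8 bits
    return memory[0x100 | sp]; // then read
  }
}
```

Pushing 0x0a and then 0x0b stores them at $1ff and $1fe and leaves the stack pointer at 0xfd. Pulling twice returns 0x0b and then 0x0a. One pull too many wraps the stack pointer from 0xff to 0x00 and reads from $100.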

Before ending this section, let us see if we can implement the basic stack instructions Push accumulator(PHA) and Pull Accumulator(PLA) to see if our stack implementation behaves as expected.

    /*
    PHA (PusH Accumulator)          $48  3
    */
      case 0x48:
        push(_a);
    /*
    PLA (PuLl Accumulator)          $68  4
    */
      case 0x68:
        _a = pull();
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

Implementing JSR and RTS

Let us now implement the JSR (Jump to Subroutine) and RTS (Return from Subroutine) instructions.

So, in principle, when the JSR executes, it pushes the address of the next instruction onto the stack as the return address before jumping to the subroutine. When the subroutine finishes executing and invokes RTS, it pulls this address off the stack again and jumps to it.

However, there is a small caveat with this sequence of events. The return address pushed onto the stack is not exactly the address of the next instruction, but the address of the next instruction minus 1.

The designers of the 6502 implemented JSR this way as a kind of optimisation. When reading instructions from memory, the program counter is incremented by 1 each time, and by the time the CPU needs to push the return address, the PC is still pointing to the last byte of the JSR instruction.

Now, if you were to implement JSR/RTS in your emulator with the assumption that the value pushed on the stack is purely the address of the next instruction, without worrying about the -1 stuff, your emulator would probably work fine 99% of the time. That being said, I did encounter some magic 6502 code in the past that interrogates the contents of the stack for implementing stuff like copy protection or auto-starting code. Such code might not work correctly if your emulated JSR instruction doesn't push addresses on the stack following the -1 convention.

So, it is important to adhere to this convention when implementing the JSR/RTS instructions.

Here is the implementation of these two instructions:

/*
MODE           SYNTAX       HEX LEN TIM
Absolute      JSR $5597     $20  3   6
 */
      case 0x20:
        int temp = (pc - 1) & 0xffff;
        push(temp >> 8);
        push(temp & 0xff);
        pc = resolvedAddress;
/*
MODE           SYNTAX       HEX LEN TIM
Implied       RTS           $60  1   6
 */
      case 0x60:
        pc = pull();
        pc = pc | (pull() << 8);
        pc++;
        pc = pc & 0xffff;
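
The arithmetic of the -1 convention can be captured in two tiny functions. These are hypothetical helpers, only meant as a sketch for checking the numbers by hand.

```dart
/// The value JSR pushes for a 3 byte JSR instruction located at [jsrAddress]:
/// the address of its last byte, i.e. the next instruction's address minus 1.
int jsrPushedAddress(int jsrAddress) => (jsrAddress + 2) & 0xffff;

/// The address at which RTS resumes, given the value pulled from the stack.
int rtsResumeAddress(int pulledAddress) => (pulledAddress + 1) & 0xffff;
```

For a JSR sitting at address 0x0011, the pushed value is 0x0013 and the RTS resumes at 0x0014, which is the instruction directly after the JSR.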

Implementing the other stack operations

Let us now implement the rest of the stack operations.

The simplest of these operations are the transfers between the Stack Pointer register and the X register, which are TSX and TXS. So let us quickly implement them:

/*
        TXS (Transfer X to Stack ptr)   $9A  2
 */
      case 0x9a:
        _sp = _x;
/*
        TSX (Transfer Stack ptr to X)   $BA  2
 */
      case 0xba:
        _x = _sp;

What remains to be implemented is pushing and pulling the status register, that is, the register that contains all the flags, like the Zero flag, Negative flag, Overflow flag and so on.

At this point the question arises in which order the flags are stored in the status byte that gets pushed onto the stack. One possibility is deciding on the order of the flags yourself, and emulation will probably work correctly 99% of the time.

However, as I mentioned in the previous section where we implemented the JSR/RTS instructions, you often find 6502 machine language programs that inspect the contents of the stack, so if you decide the order of the flags in the status byte yourself, this code might not work correctly.

The question is: how do we find the correct order of the flags in the status register? In general, the web sites that give you info on the 6502 instructions don't provide this info on the status register.

After digging a bit on the internet, I found the information via the following link:

https://www.princeton.edu/~mae412/HANDOUTS/Datasheets/6502.pdf

They provide a nice diagram for the status register:

Some extra information about the status register is that bits 4 and 5 should be one when pushed on the stack. Similarly, when pulling this value back into the status register, we ignore bits 4 and 5. With all this said, let us implement the PHP and PLP instructions:

/*
        PHP (PusH Processor status)     $08  3
 */
      case 0x08:
        push((_n << 7) | (_v << 6) | (3 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
/*
        PLP (PuLl Processor status)     $28  4
 */
      case 0x28:
        int temp = pull();
        _c = temp & 1;
        _z = (temp >> 1) & 1;
        _i = (temp >> 2) & 1;
        _d = (temp >> 3) & 1;
        _v = (temp >> 6) & 1;
        _n = (temp >> 7) & 1;
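
The same bit layout can be written as a pair of pure functions, which makes it easy to check the packing by hand. packStatus and unpackStatus are hypothetical helpers for illustration only, and the result of unpackStatus is a Dart 3 record.

```dart
/// Packs the flags into a status byte as pushed by PHP (bits 4 and 5 set).
int packStatus({
  required int n,
  required int v,
  required int d,
  required int i,
  required int z,
  required int c,
}) =>
    (n << 7) | (v << 6) | (3 << 4) | (d << 3) | (i << 2) | (z << 1) | c;

/// Extracts the flags from a status byte, ignoring bits 4 and 5.
({int n, int v, int d, int i, int z, int c}) unpackStatus(int value) => (
      n: (value >> 7) & 1,
      v: (value >> 6) & 1,
      d: (value >> 3) & 1,
      i: (value >> 2) & 1,
      z: (value >> 1) & 1,
      c: value & 1,
    );
```

With only the Negative and Overflow flags set, packStatus gives 0xf0: bits 7 and 6 for the two flags plus bits 5 and 4, which are always set in the pushed byte.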

We have implemented all instructions for this post. In the next section we will write a test program for all the instructions we have added.

The Test Program

We will use the following for our test program:

0000 A9 0A           LDA #$0a
0002 48              PHA
0003 48              PHA
0004 48              PHA
0005 48              PHA
0006 A2 50           LDX #$50
0008 9A              TXS
0009 48              PHA
000a 48              PHA
000b 48              PHA
000c 48              PHA
000d A9 7F           LDA #$7f
000f 69 01           ADC #$01
0011 20 19 00        JSR TEST
0014 68              PLA
0015 68              PLA
0016 68              PLA
0017 68              PLA
0018 68              PLA
0019 08        TEST  PHP
001a B8              CLV
001b A9 00           LDA #$00
001d 28              PLP
001e 60              RTS

Here we test a couple of operations of the stack. Pushing and pulling elements from the stack, changing the stack pointer, doing a JSR/RTS and pushing and pulling the Status register.

Currently within our emulator, we only have a view of the first page of memory (i.e. bytes 0 to 255). However, when executing the above program it would be nice to extend the view so we can see what is happening on the stack as well. I have made the change and it looks like this:

I am not going to cover the changes required to adjust the view like this, but it is available in a git tag I have created here. This tag also contains the test program for this post as a binary, which will execute as you click the step button.

Let's see how the stack changes as we execute the program. We start by pushing the Accumulator a number of times to the stack. We can see our values towards the end of page 1:

We then change the stack pointer to 0x50 and do a couple of pushes of the Accumulator again. We can see that the contents pushed are now in a different area in memory:

Next, we force the Overflow flag to be set by doing an addition that causes an overflow. With the overflow operation we manage to set as many flags as possible. We then jump to a subroutine, which pushes the status register on the stack.

At this point, our memory dump will look like this:

The return address pushed is 0013. As mentioned in a previous section the return address pushed is always one less than the actual address, because of the design of the 6502.

The value pushed for the Status Register is F0 (i.e. the upper 4 bits set). As mentioned previously, bits 4 and 5 are always set, and because of the operations we did, the overflow flag is set as well as the negative flag.

We then clear the negative and overflow flags on purpose to see if the PLP instruction at the end of the subroutine restores them for us.

We then correctly return from the subroutine, continuing execution at address 0014. We then do a number of pulls into our accumulator to see if we get back the same values that we originally pushed. On purpose I have added an extra PLA afterwards to see what it does. And as expected, we get a 00, because that is the byte just after the last value we pushed.

This concludes what we want to achieve in this post.

In Summary

In this post we implemented all stack operations, including pushing and pulling the accumulator and the status register. We also implemented the JSR/RTS instructions, which also rely on the stack.

We are just about finished with implementing all instructions for the 6502. What remains are the following:

BIT
JMP (Jump)
NOP
Implied register operations

So, in the next post I will be implementing these.

With the above implemented, we can move onto more interesting things, like running the Klaus Dormann Test Suite on our emulator to see if it behaves like a real 6502. This is very important, because it will help us to emulate a game as accurately as possible.

Until next time!

Monday, 10 February 2025

A Commodore 64 Emulator in Flutter: Part 7

Foreword

In the previous post we implemented the Logic operator and bit shifting instructions for our Flutter C64 emulator.

In this post we will be implementing the Compare and branching instructions.

The Compare instructions

The comparison instructions remind us of the if statement you get in almost every programming language where you test two numbers to see which one is the biggest or if they are equal.

In machine language, like on the 6502, we mimic the if statement via a compare instruction, which subtracts the two numbers, affecting the Carry, Negative and Zero flags. The then part of the if statement we mimic with a branch instruction, where you can branch to a particular address depending on the state of one of these flags. More on the branch instructions in another section.

One concern when mentioning the fact that the compare instruction does a subtract is whether an overflow is a possibility, as we experience with an SBC (subtract with carry). Checking the documentation and seeing that no Overflow flag is set by the compare instruction is indeed confusing.

There are, however, two facts that make the overflow flag irrelevant for a compare instruction. One is that a compare does an unsigned comparison. The other is that we only consider the Carry or Zero flag when doing a comparison, and don't really look at the Negative flag.

Let us now see how we can implement the compare instructions in our flutter emulator.

It is tempting to just do a plain subtraction in Flutter when implementing a compare instruction. However, you would not be able to accurately emulate a 6502 compare instruction this way. Let us go into a bit more detail on why this is.

Let's take an example. If the number in the accumulator is bigger than or equal to the number it is compared with, the carry flag needs to be set. The carry flag corresponds to bit 8 of the result, which has a weight of 256. So, suppose you compare 2 with 1: you should get the value 257, which is this value in binary:

(1) 0000 0001

If you just do a subtraction in Flutter for this, you will just get 1. Strictly speaking, there will still be a carry in the background when Flutter and your CPU perform the operation, but because numbers on modern CPUs have many more bits than 8, like 64, the carry will sit well beyond bit 8 and never show up in the result.

So, let us see if we can emulate the 6502 compare instructions more accurately. To do this, we need to understand how the 6502 does subtraction. All this boils down to Two's complement for representing negative numbers. Two's complement basically says that in order to make a number negative, you first invert the number (that is, making every 1 a zero and every zero a one) and then add one to the result.

Say for instance you want to represent -1. First, in binary you have:

0000 0001

Now do the inversion:

1111 1110

And then add 1:

1111 1111

At first glance, this number doesn't look meaningful, but let's take an analogy that will shed new light on the meaning of such a number. Everyone knows the odometer of a car. It starts at 000000 and counts to 999999. When it goes past 999999, it goes back to 000000.

Suppose we could do something interesting. With the odometer at 000000, we go back 3 units, and we are at 999997. Now, if you add 5, you get to (1)000002. The one is in brackets because the digit doesn't really exist on the odometer. But what we have actually done here is subtract 3 from 5 using addition! The 999997 is the ten's complement representation of -3.

We can use the same analogy in binary. Let's say you have the 8 bit binary number 0000 0000. If you move back one unit, you get 1111 1111, where every bit is one. This actually corresponds to -1, which we determined earlier.

If you move back one further unit you get 1111 1110 for -2 and 1111 1101 for -3. All this you can verify using 2's complement.
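That verification can be sketched in a few lines of Dart. The helper names twosComplement8 and toBinary8 are made up for this illustration and are not part of the emulator:

```dart
// Hypothetical helper: two's complement of a value, kept to 8 bits.
int twosComplement8(int value) {
  final inverted = ~value & 0xff; // flip every bit, keep only bits 0-7
  return (inverted + 1) & 0xff; // add one, drop any carry out of bit 7
}

// Hypothetical helper: format a byte as 8 binary digits.
String toBinary8(int value) => value.toRadixString(2).padLeft(8, '0');

void main() {
  for (final n in [1, 2, 3]) {
    print('-$n -> ${toBinary8(twosComplement8(n))}');
  }
  // -1 -> 11111111
  // -2 -> 11111110
  // -3 -> 11111101
}
```

The results match the values we got by stepping backwards from 0000 0000.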

Let us now use this knowledge with a compare. Suppose you want to compare 2 and 1. So, we do 2-1 in binary, with the -1 converted to two's complement:

0000 0010

+ 1111 1111

(1)0000 0001

We can see we have a carry, indicating that the first number is not smaller than the second. If we swop the numbers around, i.e. 1-2, we get this:

0000 0001

+1111 1110

1111 1111

In this case 1111 1110 is the two's complement representation of -2. Here we don't get a carry from the addition, meaning the first number is smaller.
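The two additions above can be reproduced with a small Dart sketch. The function subtractViaAddition is a hypothetical name used only here. It returns the full 9-bit sum so that bit 8, the carry, stays visible:

```dart
// Hypothetical helper: a - b computed as a + (inverted b) + 1.
// The result is deliberately NOT masked to 8 bits, so bit 8 holds the carry.
int subtractViaAddition(int a, int b) {
  final inverted = ~b & 0xff; // invert b, keep only bits 0-7
  return (a & 0xff) + inverted + 1; // the +1 completes the two's complement
}

void main() {
  for (final pair in [[2, 1], [1, 2], [1, 1]]) {
    final sum = subtractViaAddition(pair[0], pair[1]);
    final carry = (sum >> 8) & 1;
    final low = (sum & 0xff).toRadixString(2).padLeft(8, '0');
    print('${pair[0]} vs ${pair[1]}: carry=$carry result=$low');
  }
  // 2 vs 1: carry=1 result=00000001
  // 1 vs 2: carry=0 result=11111111
  // 1 vs 1: carry=1 result=00000000
}
```

Note the third case: when both numbers are equal, the carry is set as well, so a set carry really means "greater than or equal".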

Let us now do some coding to implement these instructions in our Flutter emulator. We create the following compare method, which the different flavours of compare instructions can share:

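  void compare(int operand1, int operand2) {
    // Invert the second operand and keep only the lower 8 bits.
    operand2 = ~operand2 & 0xff;
    // operand1 - operand2 done as an addition; the +1 completes the two's complement.
    operand1 = operand1 + operand2 + 1;
    _n = ((operand1 & 0x80) == 0x80) ? 1 : 0; // bit 7: negative flag
    _c = (operand1 & 0x100) != 0 ? 1 : 0; // bit 8: carry flag
    _z = ((operand1 & 0xff) == 0) ? 1 : 0; // lower 8 bits all zero: zero flag
  }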

This method starts off by doing the two's complement, but we AND the result of the bit inversion with 0xff, so that we are left with just the lower 8 bits. Our Flutter emulator will probably run on 64-bit machines, which means that an inversion on its own leaves us with a 64-bit number in which bits 8 to 63 are all ones, and that is not the result we want.
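To see why the mask matters, here is a minimal sketch. The helper invert8 is a hypothetical name, and the value shown for the unmasked inversion assumes the native Dart VM with its 64-bit integers (on the web the unmasked number differs, but it is equally unusable):

```dart
// Hypothetical helper: bit inversion limited to one byte.
int invert8(int value) => ~value & 0xff;

void main() {
  const operand = 0x01;
  final unmasked = ~operand; // -2 on the native VM: all upper bits are ones
  final masked = invert8(operand); // only bits 0-7 survive
  print(unmasked == masked); // false
  print(masked.toRadixString(16)); // fe
}
```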

The rest of this method is pretty straightforward. Bit 7 of the result is the negative flag and bit 8 is the carry flag. We also use the lower 8 bits to check if the result is zero.

With this method implemented, we can now implement the individual compare opcodes. Let's start with the CMP instructions:

/*
CMP (CoMPare accumulator)
Affects Flags: N Z C

MODE           SYNTAX       HEX LEN TIM
Immediate     CMP #$44      $C9  2   2
Zero Page     CMP $44       $C5  2   3
Zero Page,X   CMP $44,X     $D5  2   4
Absolute      CMP $4400     $CD  3   4
Absolute,X    CMP $4400,X   $DD  3   4+
Absolute,Y    CMP $4400,Y   $D9  3   4+
Indirect,X    CMP ($44,X)   $C1  2   6
Indirect,Y    CMP ($44),Y   $D1  2   5+

+ add 1 cycle if page boundary crossed
 */
      case 0xC9:
        compare(_a, arg0);
      case 0xC5:
      case 0xD5:
      case 0xCD:
      case 0xDD:
      case 0xD9:
      case 0xC1:
      case 0xD1:
        compare(_a, memory.getMem(resolvedAddress));

Pretty straightforward: in both cases we pass in the value of the accumulator.

Let's do the same with CPX and CPY:

/*
CPX (ComPare X register)
Affects Flags: N Z C

MODE           SYNTAX       HEX LEN TIM
Immediate     CPX #$44      $E0  2   2
Zero Page     CPX $44       $E4  2   3
Absolute      CPX $4400     $EC  3   4
 */
      case 0xE0:
        compare(_x, arg0);
      case 0xE4:
      case 0xEC:
        compare(_x, memory.getMem(resolvedAddress));

/*
CPY (ComPare Y register)
Affects Flags: N Z C

MODE           SYNTAX       HEX LEN TIM
Immediate     CPY #$44      $C0  2   2
Zero Page     CPY $44       $C4  2   3
Absolute      CPY $4400     $CC  3   4
 */
      case 0xC0:
        compare(_y, arg0);
      case 0xC4:
      case 0xCC:
        compare(_y, memory.getMem(resolvedAddress));

So, we pass in the value of register x for CPX instructions and the value of register y for CPY instructions.

This concludes all the compare instructions.

Branching Instructions

Let us now look at the branching instructions. Every branching instruction branches depending on the state of a certain flag, whether it is the Carry flag, Zero flag, Negative flag and so on.

With a branch instruction we don't supply an absolute address to jump to if the branch condition is true, but a relative address that you need to add to the program counter to find the destination address.

Let us write a quick 6502 machine code program to understand the branch instructions better:

4000 A9 05 LDA #$5
4002 38    SEC
4003 E9 01 SBC #$1
4005 D0 FC BNE $FC
4007 A9 22 LDA #$22

Here we have a loop in which the accumulator starts with the value 5 and gets decremented until it reaches zero.

The controller of the loop is at address 4005, the BNE (Branch if not equal) instruction, that will keep branching back to the SBC instruction at address 4003 until the zero flag is set.

Now the parameter of the BNE instruction might look confusing, but in actual fact it is an 8-bit two's complement number that you need to add to the program counter if the branch is to be taken. This means you can jump in the range -128...127.

In our example, the parameter $FC is the two's complement for -4. By the time the BNE executes, the program counter already points just past this instruction, which in this case is 4007. Subtract 4 from this, and you are at address 4003, which is where we want to be.

With all this said, let us see if we can emulate the branch instruction in Flutter. All in all this boils down to adding a byte value to a 16-bit value, with a twist: the byte value is signed! This is a bit tricky to emulate on most platforms, because a byte is normally treated as an unsigned value between 0 and 255, so adding it to a 16-bit value will always make the value go up, even when the byte is meant to be negative.

Let's look at a couple of ways to solve this. Off the bat, one would probably do something like this in Flutter:

    if (operand1 > 127) {
      operand1 = operand1 | 0xff00;
    }
    return (pc + operand1) & 0xffff;

So, if we see that our offset is negative, i.e. our byte value is bigger than 127, we pad bits 8-15 with ones. If we then add this to the program counter, the lower 16 bits of the result are indeed the result of a subtraction.

This is indeed a solution, but can't we make it more elegant? Let's look at what the famous C64 emulator, VICE, does here.
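As a quick check of this approach, the snippet can be wrapped in a function and fed the BNE from our example program. The name branchTargetManual is hypothetical and only used for this sketch:

```dart
// Hypothetical wrapper around the manual sign-extension snippet.
int branchTargetManual(int pc, int offset) {
  if (offset > 127) {
    offset |= 0xff00; // negative offset: fill bits 8-15 with ones
  }
  return (pc + offset) & 0xffff; // keep the result within 16 bits
}

void main() {
  // BNE at 4005 with operand $FC; pc already points at 4007.
  print(branchTargetManual(0x4007, 0xfc).toRadixString(16)); // 4003
  // A positive offset simply moves forward.
  print(branchTargetManual(0x4007, 0x02).toRadixString(16)); // 4009
}
```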

So, in VICE, if the branch is to be taken, it does something very fancy when calculating the destination address. It casts the byte to a signed char! This is a very nifty trick which the C language provides. By casting a byte to a signed value, the C compiler honours the fact that this byte is a two's complement value, and thus if the value of the byte is negative, it will do the subtraction for you.

However, that nifty trick is in C and not in Flutter, in which we develop this emulator. So, the question is: is there a similar nifty trick we can use in Flutter? Indeed there is. In Flutter, every int has a toSigned() method you can call. As parameter, you pass it the width of your number in bits, where the most significant bit within that width is taken as the sign bit. So, if you do something like the following:

0xfe.toSigned(8)

You will get back -2.
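A few more values show how toSigned(8) behaves at the edges of the byte range. The sketch below also compares it against a decoding written out by hand. Both decodeManually and toSignedMatchesManual are hypothetical helpers for this illustration only:

```dart
// Reference decoding written out by hand, for comparison only.
int decodeManually(int byte) => byte > 127 ? byte - 256 : byte;

// Checks all 256 byte values against int.toSigned(8).
bool toSignedMatchesManual() {
  for (var byte = 0; byte <= 0xff; byte++) {
    if (byte.toSigned(8) != decodeManually(byte)) return false;
  }
  return true;
}

void main() {
  print(0x7f.toSigned(8)); // 127
  print(0x80.toSigned(8)); // -128
  print(0xfc.toSigned(8)); // -4
  print(toSignedMatchesManual()); // true
}
```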

We now have enough info to calculate an address for the relative address mode in the method calculateEffectiveAddress:

 int calculateEffectiveAddress(int mode, int operand1, int operand2) {
    var modeAsEnum = AddressMode.values[mode];
    switch (modeAsEnum) {
...
    case AddressMode.relative:
        return (pc + operand1.toSigned(8)) & 0xffff;
    }
...
    return 0;
  }

Next, we implement the following method:

 void branchConditional(bool doBranch, int branchAddress) {
    if (doBranch) {
      pc = branchAddress;
    }
  }

And finally, we can implement all the branch instructions:

    /*
    BPL (Branch on PLus)           $10
     */
      case 0x10:
        branchConditional(_n == 0, resolvedAddress);
    /*
    BMI (Branch on MInus)          $30
     */
      case 0x30:
        branchConditional(_n == 1, resolvedAddress);
    /*
    BVC (Branch on oVerflow Clear) $50
     */
      case 0x50:
        branchConditional(_v == 0, resolvedAddress);
    /*
    BVS (Branch on oVerflow Set)   $70
     */
      case 0x70:
        branchConditional(_v == 1, resolvedAddress);

    /*
    BCC (Branch on Carry Clear)    $90
     */
      case 0x90:
        branchConditional(_c == 0, resolvedAddress);
    /*
    BCS (Branch on Carry Set)      $B0
     */
      case 0xB0:
        branchConditional(_c == 1, resolvedAddress);
    /*
    BNE (Branch on Not Equal)      $D0
     */
      case 0xD0:
        branchConditional(_z == 0, resolvedAddress);
    /*
    BEQ (Branch on EQual)          $F0
     */
      case 0xF0:
        branchConditional(_z == 1, resolvedAddress);
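Since all eight cases share the same shape, the conditions can also be looked at as a table. The following is only a sketch of that idea and not how the emulator is structured: the Flags class, the branchConditions map and branchTaken are all hypothetical names.

```dart
// Hypothetical flag container; the emulator itself keeps _n, _v, _c and _z as fields.
class Flags {
  final int n, v, c, z;
  const Flags({this.n = 0, this.v = 0, this.c = 0, this.z = 0});
}

// Opcode -> condition under which the branch is taken.
final Map<int, bool Function(Flags)> branchConditions = {
  0x10: (f) => f.n == 0, // BPL
  0x30: (f) => f.n == 1, // BMI
  0x50: (f) => f.v == 0, // BVC
  0x70: (f) => f.v == 1, // BVS
  0x90: (f) => f.c == 0, // BCC
  0xB0: (f) => f.c == 1, // BCS
  0xD0: (f) => f.z == 0, // BNE
  0xF0: (f) => f.z == 1, // BEQ
};

bool branchTaken(int opcode, Flags flags) {
  final condition = branchConditions[opcode];
  if (condition == null) {
    throw ArgumentError('Not a branch opcode: $opcode');
  }
  return condition(flags);
}

void main() {
  const zeroSet = Flags(z: 1);
  print(branchTaken(0xD0, zeroSet)); // false: BNE falls through when Z is set
  print(branchTaken(0xF0, zeroSet)); // true: BEQ branches when Z is set
}
```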

A Test program

Let us end this post by writing a quick test program that does a compare and a branch.

While writing the test program, it really started to become convenient to have the DEX and DEY instructions, which we hadn't implemented yet. So, I took the liberty of implementing them in the emulator. I will not show the implementation here, but you are welcome to look at it on my GitHub page.

So, here is the test program:

4000 A2 0A LDX #$0A
4002 CA    DEX
4003 E0 04 CPX #$04
4005 D0 FB BNE LOOP
4007 A9 15 LDA #$15

This program counts register X down from $0A and keeps branching back to the DEX at 4002 until X reaches $04.

You can find this program as a binary, together with the state of our emulator as of this post, under this tag:
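To get a feel for what to expect when stepping through the program, the loop can be mirrored in plain Dart. This is not the emulator running the machine code, just a hand-written sketch of the same steps; traceLoop is a hypothetical name:

```dart
// Returns every value X holds right after DEX, until the BNE falls through.
List<int> traceLoop() {
  final seen = <int>[];
  var x = 0x0a; // LDX #$0A
  var z = 0;
  do {
    x = (x - 1) & 0xff; // DEX
    seen.add(x);
    final sum = x + (~0x04 & 0xff) + 1; // CPX #$04, done as an addition
    z = (sum & 0xff) == 0 ? 1 : 0; // zero flag: set when X equals 4
  } while (z == 0); // BNE: branch back while the zero flag is clear
  return seen;
}

void main() {
  print(traceLoop()); // [9, 8, 7, 6, 5, 4]
}
```

So the DEX should execute six times before execution continues with the LDA at 4007.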

https://github.com/ovalcode/c64_flutter/releases/tag/c64_flutter_part_7

In summary

In this post we implemented the branch and compare instructions.

In the next post we will be implementing stack operations in our emulator.

Until next time!