Wednesday, 18 December 2024

A Commodore 64 Emulator in Flutter: Part 3

Foreword

In the previous post we explored some further basics of Flutter and looked at BloCs, state and events. In the end we wrote a very simple application to illustrate these concepts, which displayed the current epoch each time you pressed a button.

In this post we will work more towards our emulator. We will start to write a very simple emulator with just a few instructions implemented. With the app we will step through a machine language program and see the registers and memory dump after each instruction.

Using files in Flutter

When writing an emulator on any platform, one of the first questions that comes to mind is: how do you store the binary data of all the ROMs in memory?

The quick answer is just to use a byte array, but the next question is how one gets the data to populate the array. When I was creating my JavaScript emulator many years ago, in the early phases I would just hardcode the 6502 machine code into the array definition.

This worked perfectly for small test programs, but as the data grew to over 64 kilobytes, this became a very messy solution, with the JavaScript containing many lines of code for an array definition representing this data.

This scenario becomes even more dire if you want people to use your emulator in such a way that they can just slot in tape image files of their favourite games. So, it is clear that any emulator needs the ability to read binary files.

In the early days of my JavaScript emulator, working with binary files felt like quite a nightmare. When you opened the index.html file of your emulator directly from your local file system, there was no way to automatically load your ROMs from the local file system when the emulator started up. The only way to do it was with user intervention: the user had to click a file button and choose the file.

Not very intuitive. I came to the conclusion that it was impossible to run an emulator via the local file system. The only way to automate the loading of ROMs was to serve the pages via a web server. You could then make XMLHttpRequests to the server to fetch the ROMs.

Using XMLHttpRequests added some further complexity in that they are asynchronous. Your JavaScript would send off a request for the data and then continue with the rest of your code. Not so desirable if you want to wait for the ROM data to load before starting your emulator.

With this past experience with JavaScript, I wondered how Flutter would handle the asynchronous nature of files, and decided there and then that this was the first thing I should check out before starting with the serious emulator stuff.

Looking around, I found that one can indeed add binary files to a Flutter project by means of assets. So start off by creating an assets folder and dumping a binary file inside it:


Next, you need to add an entry to our pubspec.yaml file to make our project aware of our assets:


I want to stress again that this YAML file is sensitive to indentation, so make sure assets: aligns with the preceding uses-material-design attribute, and that its bullet item on the following line is indented further.

Let us now add a small code snippet to our C64Bloc class for reading the first three bytes of our binary file and printing them to the console:

class C64Bloc extends Bloc<C64Event, C64State> {
  C64Bloc(): super(C64State()) {
    rootBundle.load("assets/program.bin").then((value)  {
      print(value.getUint8(0));
      print(value.getUint8(1));
      print(value.getUint8(2));
    });
    on<C64Event>((event, emit) {
      emit(C64State());
    });
  }
}

The IDE will complain about rootBundle and suggest an import (rootBundle lives in package:flutter/services.dart). Just accept the import and the error will go away.

Let us go into the rootBundle.load call in a bit more detail. As the name suggests, the load method will load our binary file. It should be noted that load is an asynchronous method that returns immediately. You will also see that the return type of the method is Future<ByteData>. ByteData is the type we want, but it is wrapped in a Future. You get the actual value via the then method, whose callback will only be invoked once the data is available.
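
To make the Future/then behaviour a bit more concrete, here is a minimal sketch in plain Dart that runs without Flutter. The function fakeLoad is a hypothetical stand-in for rootBundle.load (it is not part of any library), and the three bytes are the ones from my test program:

```dart
import 'dart:typed_data';

// Hypothetical stand-in for rootBundle.load(): it returns a Future at once
// and completes a little later with three bytes.
Future<ByteData> fakeLoad() {
  final bytes = Uint8List.fromList([169, 21, 141]);
  return Future.delayed(
    const Duration(milliseconds: 10),
    () => ByteData.sublistView(bytes),
  );
}

// Returns immediately with a Future. The callback passed to then() runs
// only once the ByteData is actually available.
Future<List<int>> firstThreeBytes() {
  return fakeLoad().then((value) => [
        value.getUint8(0),
        value.getUint8(1),
        value.getUint8(2),
      ]);
}
```

Calling firstThreeBytes() does not block. Any code after the call runs before the bytes arrive, which is exactly the behaviour we have to keep in mind when loading ROMs.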

You then call getUint8 to get a byte from the data, specifying the position you want as a parameter. The first three bytes of the data will be written to the console. In my case it is the following:

169
21
141

These are the first bytes of a very simple 6502 machine language program: 169 (0xA9) is Load Accumulator immediate, 21 (0x15) is its operand, and 141 (0x8D) is Store Accumulator absolute.

Outputting data to the screen

Let us now see if we can output the data to the screen instead of the console. This exercise will give us more insight into events and state.

First, let us see if we can structure our rootBundle.load call in a more elegant way. We can indeed do this with an on<> event selector whose handler is async.

So, let us implement this in our C64Bloc class. At this point our C64Bloc class has become a kind of dumping ground for testing ideas. Let us clean up this class a bit, so that our whole C64Bloc class looks like this:

class C64Bloc extends Bloc<C64Event, C64State> {
  C64Bloc(): super(C64State()) {
    on<InitEmulatorEvent>((event, emit) async {
      final byteArray = await rootBundle.load("assets/program.bin");
    });
  }
}

You will see that our on selector has an async in its method signature. With this in place, one can use await when calling another asynchronous method, which in this case is the load method.

When you use the await keyword, that line of code will wait until the asynchronous method is complete, returning the actual data, and then only continue with the next line of code.
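
The ordering that await gives us can be sketched in plain Dart as follows. Again, loadProgramStub is a hypothetical stand-in for rootBundle.load, and the log list exists only to make the order of execution visible:

```dart
import 'dart:typed_data';

// Hypothetical stand-in for rootBundle.load("assets/program.bin").
Future<ByteData> loadProgramStub() {
  final bytes = Uint8List.fromList([169, 21, 141]);
  return Future.delayed(
    const Duration(milliseconds: 10),
    () => ByteData.sublistView(bytes),
  );
}

// Because the function is marked async, we may use await inside it.
Future<List<String>> describeProgram() async {
  final log = <String>[];
  log.add('before load');
  // Execution of this function pauses here until the data is available.
  final data = await loadProgramStub();
  // These lines only run after the load has completed.
  log.add('loaded ${data.lengthInBytes} bytes');
  log.add('first byte: ${data.getUint8(0)}');
  return log;
}
```

Note that the function itself still returns a Future. The waiting happens inside it, so callers are not blocked either.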

You will also note that I have introduced another event type, InitEmulatorEvent. As we go along and need other types of events, we will define additional classes for them. However, with our Bloc class we can only specify one event type that it accepts, which in this case is C64Event.

The only way to get around this "limitation" is to make all our event classes subclasses of C64Event, and to make the C64Event class itself abstract so that we don't accidentally create events of type C64Event itself.

So, let us make some adjustments to the class C64Event and declare the new InitEmulatorEvent class:

abstract class C64Event extends Equatable {
  @override
  List<Object?> get props => [];
}

class InitEmulatorEvent extends C64Event {}

Now that we have defined an on selector for loading a binary file into a byteArray, the question is: how do we trigger this selector?

This can be done by doing an add below our selector:

class C64Bloc extends Bloc<C64Event, C64State> {
  C64Bloc(): super(C64State()) {
    on<InitEmulatorEvent>((event, emit) async {
      final byteArray = await rootBundle.load("assets/program.bin");
    });
    
    add(InitEmulatorEvent());
  }
}

At this point it becomes necessary for us to have multiple states. When our emulator starts up, it should be in an initial state. When we have loaded a byteArray with our binary file, we want another state containing a dump of the data which our widget will use to display it.

So, let's rework our states a bit:

abstract class C64State extends Equatable {
  final int time = DateTime.now().millisecondsSinceEpoch;

  @override
  List<Object?> get props => [];
}

class InitialState extends C64State {}

class DataShowState extends C64State {
  DataShowState({required this.mem0, required this.mem1, required this.mem2});

  final int mem0;
  final int mem1;
  final int mem2;

  @override
  List<Object> get props => [mem0, mem1, mem2];
}

Again, we have made C64State abstract so that we don't accidentally create an instance of this class. This implies of course that in our Bloc constructor we should now say C64Bloc() : super(InitialState()).

In DataShowState there is quite a bit going on, so let's break it down a bit. The constructor has a number of required parameters. These are the values we must pass when we create an instance of this state.

Also, get props lists mem0, mem1 and mem2. This makes it easy to decide, when we emit a new state, whether it differs from the previous one and should therefore force a redraw of the widget.

We are now ready to emit some data for display to the front end. First, we need to emit the state in our selector:

    on<InitEmulatorEvent>((event, emit) async {
      final byteArray = await rootBundle.load("assets/program.bin");
      emit(DataShowState(mem0: byteArray.getUint8(0),
          mem1: byteArray.getUint8(1),
          mem2: byteArray.getUint8(2)));
    });

Here we see again how nicely await works: we can just use the data loaded on the previous line without jumping through other asynchronous hoops.

Now, we can actually use the data in this state in our BlocBuilder block of our widget:

...
        body: BlocBuilder<C64Bloc, C64State>(
          builder: (BuildContext context, state) {
            if (state is InitialState) {
              return const CircularProgressIndicator();
            } else if (state is DataShowState) {
              return Text(
                  '${state.mem0.toString()} ${state.mem1.toString()} ${state.mem2.toString()} ');
            } else {
              return const CircularProgressIndicator();
            }
          },
        ),
...

Within our widget, we are only interested in when our BloC is in the DataShowState state. For all other states we will just show a circular progress indicator.

The screen will render as follows:


So, we managed to render the data to the screen instead of the console!

First steps towards the emulator

Now that we have discovered how to read a binary file in Flutter and display some of its contents on the screen, let us start to implement the first pieces of our emulator.

First let us create a class for our memory in a file called memory.dart:

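import 'dart:typed_data' as type_data;

class Memory {
  // Backing store; assigned by populateMem() before first use.
  late type_data.ByteData _data;

  void populateMem(type_data.ByteData block) {
    _data = block;
  }

  // Note the argument order: value first, then address.
  void setMem(int value, int address) {
    // Store as an unsigned byte so it round-trips with getUint8 below.
    _data.setUint8(address, value);
  }

  int getMem(int address) {
    return _data.getUint8(address);
  }
}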

Here I have a method called populateMem that we will call from the outside to fill memory with the binary file we loaded.

Also, we have the usual methods we expect from any memory class: one to get the data from a location and one to set the data at a location.
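
Since 6502 memory consists of unsigned 8-bit values, it is worth seeing how ByteData treats the same byte through its unsigned and signed accessors. The following is a small sketch in plain Dart; the helper names are made up for illustration and are not part of the emulator:

```dart
import 'dart:typed_data';

// Writes a byte with the unsigned setter and reads it back unsigned.
int writeThenReadUnsigned(ByteData ram, int address, int value) {
  ram.setUint8(address, value);
  return ram.getUint8(address);
}

// Reads the very same byte, but interpreted as a signed 8-bit value.
int readSigned(ByteData ram, int address) {
  return ram.getInt8(address);
}
```

For a value such as 0xA9 the unsigned read gives back 169, while the signed read of the same byte gives -87. For an emulator we almost always want the unsigned view.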

Next, let us create an outline for our Cpu in a file called cpu.dart:

import 'dart:typed_data';

import 'memory.dart';

class Cpu {
  final Memory memory;
  int _a = 0, _x = 0, _y = 0;
  int pc = 0;
  Cpu({required this.memory});

  int getAcc() {
    return _a;
  }

  int getX() {
    return _x;
  }

  int getY() {
    return _y;
  }

  step() {
  ...
  }
}

Here we add the usual registers you find in a 6502, which are a, x, y and the program counter. We also pass in an instance of the Memory class we created previously, so that the CPU can load its instructions from memory.

We also have a step method which will evolve as we go along.

I have also implemented getters for the registers, so we can display them on the screen while stepping.

With our Memory and Cpu classes created, let's instantiate them in our BloC:

...
class C64Bloc extends Bloc<C64Event, C64State> {
  final Memory memory = Memory();
  late final Cpu _cpu = Cpu(memory: memory);

  C64Bloc() : super(InitialState()) {
    on<InitEmulatorEvent>((event, emit) async {
      final byteArray = await rootBundle.load("assets/program.bin");
      memory.populateMem(byteArray);
...
      });
...
  }
}
...
With everything initialised, it would be nice to show a dump of the registers on screen, as well as the first 256 bytes of memory as a dump. For this we need to modify our DataShowState a bit:

class DataShowState extends C64State {
  DataShowState(
      {required this.memorySnippet,
      required this.a,
      required this.x,
      required this.y,
      required this.pc});

  final ByteData memorySnippet;
  final int a;
  final int x;
  final int y;
  final int pc;

  @override
  List<Object> get props => [...];
}

So, this state now stores the values of the a, x, y and pc registers, as well as a small snippet of memory. We still need to think about what we are going to use for get props, which is used to determine whether one state object differs from another. I tried using the object reference (i.e. this), but I got a stack overflow when running everything. It turns out that you should never put this in the props of a class that extends Equatable. Equatable itself overrides the == operator. So, if you use this in get props, comparing two states will compare their props, which compares this with ==, which calls the override again, and so on, leading to infinite recursion.

In the end it is easier to use an increasing number in the state for checking whether a state differs from another one. So, we add the following code:

class DataShowState extends C64State {
  DataShowState(
      {...
      required this.dumpNo});

...
      final int dumpNo;

  @override
  List<Object> get props => [dumpNo];
}


With our DataShowState retrofitted a bit, we can push a state update once memory has been populated:

    on<InitEmulatorEvent>((event, emit) async {
      final byteArray = await rootBundle.load("assets/program.bin");
      memory.populateMem(byteArray);
      emit(DataShowState(
          dumpNo: dumpNo++,
          memorySnippet: type_data.ByteData.sublistView(byteArray, 0, 256),
          a: _cpu.getAcc(),
          x: _cpu.getX(),
          y: _cpu.getY(),
          pc: _cpu.pc));
    });

For the memory snippet we just take the first 256 bytes of what we loaded.

You will also see that we do a dumpNo++. For every DataShowState we emit, we need to increase this variable, which we define as a field within our BloC.

Moving towards our widget, we add two methods to MyHomePage, which will give some meaningful strings to display from the state:

  String getRegisterDump(int a, int x, int y, int pc) {
    return 'A: ${a.toRadixString(16)
        .padLeft(2, '0')
        .toUpperCase()} X: ${x.toRadixString(16)
        .padLeft(2, '0')
        .toUpperCase()} Y: ${y.toRadixString(16)
        .padLeft(2, '0')
        .toUpperCase()} PC: ${pc.toRadixString(16)
        .padLeft(4, '0')
        .toUpperCase()}';
  }

  String getMemDump(type_data.ByteData memDump) {
    String result = '';
    for (int i = 0; i < memDump.lengthInBytes; i++) {
      if ((i % 16) == 0) {
        String addressLabel = i.toRadixString(16).padLeft(4, '0').toUpperCase();
        result = '$result\n$addressLabel';
      }
      result =
      '$result ${memDump.getUint8(i).toRadixString(16)
          .padLeft(2, '0')
          .toUpperCase()}';
    }
    return result;
  }

The first function will give us all the content of the registers in a row in hexadecimal.

The second function will give a dump of our memory in a typical hexdump, 16 bytes in a row, and the address on the left side.
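
As an aside, the same dump can also be built with a StringBuffer, which avoids creating a new string on every iteration. This is only an alternative sketch in plain Dart, not the code used in the rest of this post. The names hexByte and hexDump and the bytesPerRow parameter are my own, and unlike getMemDump it does not start with a leading newline:

```dart
import 'dart:typed_data';

// Formats one byte as two upper-case hex digits.
String hexByte(int value) =>
    value.toRadixString(16).padLeft(2, '0').toUpperCase();

// Builds a hexdump: a four-digit address label followed by the row's bytes.
String hexDump(ByteData data, {int bytesPerRow = 16}) {
  final out = StringBuffer();
  for (var i = 0; i < data.lengthInBytes; i++) {
    if (i % bytesPerRow == 0) {
      // Start a new row, except before the very first one.
      if (i != 0) out.writeln();
      out.write(i.toRadixString(16).padLeft(4, '0').toUpperCase());
    }
    out.write(' ${hexByte(data.getUint8(i))}');
  }
  return out.toString();
}
```

For the three bytes 169, 21 and 141 this gives the single row 0000 A9 15 8D.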

With our dump functions defined, let us change our widget to make use of them:

...
        body: BlocBuilder<C64Bloc, C64State>(
          builder: (BuildContext context, state) {
            if (state is InitialState) {
              return const CircularProgressIndicator();
            } else if (state is DataShowState) {
              return Column(
                children: [
                  Text(
                    getRegisterDump(state.a, state.x, state.y, state.pc)
                  ),
                  Text(
                    getMemDump(state.memorySnippet),
                  ),
                ],
              );
            } else {
              return const CircularProgressIndicator();
            }
          },
        ),
...
As you can see, we are wrapping our two Text components for the register and memory dump into a column, with the one component underneath the other. This will render as follows:


The bytes in our memory dump don't line up so well, but we will fix that a bit later.

Adding the first instructions

We are now ready to add our first instructions to our Cpu. These instructions will be Load Accumulator immediate and Store accumulator Absolute.

Add the following method to the cpu.dart:

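    step() {
      // Fetch the opcode and move on to its operand.
      var opCode = memory.getMem(pc);
      pc++;
      // Dart 3 switch cases do not fall through, so no break is needed.
      switch (opCode) {
        case 0xa9: // LDA immediate: the next byte is the value to load
          _a = memory.getMem(pc);
          pc++;
        case 0x8d: // STA absolute: the next two bytes are the address, low byte first
          var arg1 = memory.getMem(pc);
          pc++;
          var arg2 = memory.getMem(pc);
          pc++;
          memory.setMem(_a, (arg2 << 8) | arg1);
      }
    }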

So, here we have two selectors for our instructions LDA# (i.e. opcode A9) and STA (i.e. opcode 8D).
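
To see the fetch and execute cycle in isolation, here is a self-contained sketch in plain Dart that does not depend on the Memory or Cpu classes of this post. MiniCpu and runDemoProgram are hypothetical names. The first three program bytes are the ones printed earlier, while the address bytes 0x20 0x00 are an assumption based on the store to location 0x20 that we will see when stepping:

```dart
import 'dart:typed_data';

// A stripped-down CPU with only an accumulator and a program counter.
class MiniCpu {
  MiniCpu(this.ram);

  final Uint8List ram;
  int a = 0;
  int pc = 0;

  // Reads the byte at pc and advances pc by one.
  int _fetch() => ram[pc++];

  void step() {
    final opCode = _fetch();
    switch (opCode) {
      case 0xA9: // LDA immediate
        a = _fetch();
        break;
      case 0x8D: // STA absolute, address stored low byte first
        final lo = _fetch();
        final hi = _fetch();
        ram[(hi << 8) | lo] = a;
        break;
      default:
        throw UnsupportedError(
            'Opcode 0x${opCode.toRadixString(16)} not implemented');
    }
  }
}

// Loads LDA #$15 / STA $0020 at address 0 and executes both instructions.
MiniCpu runDemoProgram() {
  final ram = Uint8List(256);
  ram.setAll(0, [0xA9, 0x15, 0x8D, 0x20, 0x00]);
  final cpu = MiniCpu(ram);
  cpu.step();
  cpu.step();
  return cpu;
}
```

After the two steps the accumulator holds 0x15, location 0x20 holds 0x15 as well, and the program counter sits at 5, just past the STA instruction.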

Now within our BloC, we also need to define an event, StepEvent (another subclass of C64Event), which will be triggered when we press a button for performing a step:

  C64Bloc() : super(InitialState()) {
...
    on<StepEvent>((event, emit) {
      _cpu.step();
      // getDebugSnippet() is not shown in this post; it is assumed to
      // return the ByteData held by our Memory class.
      emit(DataShowState(
          dumpNo: dumpNo++,
          memorySnippet:
              ByteData.sublistView(memory.getDebugSnippet(), 0, 256),
          a: _cpu.getAcc(),
          x: _cpu.getX(),
          y: _cpu.getY(),
          pc: _cpu.pc));
    });
...
  }

Finally, we need to modify our floating action button, which we defined in the previous post, to trigger a step event on each click:

        floatingActionButton: FloatingActionButton(
          tooltip: 'Step',
          onPressed: () {
            context.read<C64Bloc>().add(StepEvent());
          },
          child: const Icon(Icons.arrow_forward),
        ));

The Results

So, with all the code written, let us give our Flutter program another test run.

Our app starts up with the following screen:


So, in the initial state, we have the program loaded in memory and the registers are zero. People who are familiar with the 6502 CPU will know that the CPU doesn't start executing at address 0 out of reset, but rather looks up its start address from the reset vector at locations 0xfffc and 0xfffd. We will eventually get there in later posts, but for now we are keeping things simple and just start executing at location 0.
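
For the curious, reading the reset vector boils down to combining two bytes, low byte first. The following is only a sketch in plain Dart of what such a lookup could look like. It is not yet part of our emulator, and readResetVector is a hypothetical name:

```dart
import 'dart:typed_data';

// Combines the bytes at 0xFFFC (low) and 0xFFFD (high) into the 16-bit
// address at which a 6502 starts executing after reset.
int readResetVector(Uint8List memory) {
  final lo = memory[0xFFFC];
  final hi = memory[0xFFFD];
  return (hi << 8) | lo;
}
```

With 0xE2 stored at 0xFFFC and 0xFC stored at 0xFFFD, for example, the function returns 0xFCE2.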

Next, let us click the step button at the lower right and see what happens.

As highlighted, we see two things happened to our registers. The accumulator has been loaded with the value 0x15, and our program counter has been updated to 2, which points to a Store Accumulator instruction.

Let's click the step button again:

This time we see that the value of the accumulator, 0x15, has been stored at memory location 0x20.

So, we have successfully implemented two instructions in our CPU!

Viewing the dump in Mono Space

As seen, the byte values in our memory dump don't line up nicely underneath each other.

To fix this, we need to download a monospace font and add it to our project. So head to the following URL:

https://fonts.google.com/specimen/Roboto+Mono

Then click "Get Font" and then "Download All".

You then need to open the zip file and extract the file RobotoMono-VariableFont_wght.ttf to the fonts folder of your project. The fonts folder should thus be on the same level as your assets folder.

Next, we should edit our pubspec.yaml file again. Open it, and below the assets section add the following section:

  fonts:
    - family: RobotoMono
      fonts:
        - asset: fonts/RobotoMono-VariableFont_wght.ttf
          weight: 700

Now, we need to add a style element to the Text which displays the memory dump:

                  Text(
                    getMemDump(state.memorySnippet),
                    style: const TextStyle(
                      fontFamily: 'RobotoMono', // Use the monospace font
                    ),
                  ),

With this the dump looks better:


In Summary

In this post we created a very simple emulator in Flutter that executes two types of instructions, LDA and STA.

In the next post we will continue to evolve our emulator by adding the different addressing modes.

Till next time!

Friday, 22 November 2024

A Commodore 64 Emulator in Flutter: Part 2

Foreword

In the previous post I gave an introduction to my idea of creating a C64 emulator in Flutter.

I explained briefly how to install Flutter and configure an IDE for it, and looked at a Hello World project created by IntelliJ.

In this post we will continue our journey on creating a C64 emulator in Flutter.

Introduction to using BloCs

Let us look a bit more into using BloCs. From the previous post we know that BloC stands for Business Logic Component.

The term describes it all: you split your business logic out into a component. This implies having a stateless widget. The state is kept in the BloC, and when the state changes, the BloC sends the changed state to the widget, so that the widget is re-rendered with the new state.

When using BloCs you basically need to remember two classes: BlocProvider and BlocBuilder. With BlocProvider, you create an instance of your BloC. BlocProvider then allows you to inject this instance further down in your widget tree by means of a BlocBuilder.

To use these classes in your widget tree, you should first create a proper BloC class, which I will discuss in the next section.

BloC, Event and State

Let us create a BloC class. Two things a Bloc class needs are an Event class and a State class.

You might remember that in the previous post I used a diagram to explain what the purposes of an Event and a State are.

An Event you would use, for instance, when you click a button in your Flutter app and you want your BloC to react in a certain way. You would then trigger an Event when the button is clicked, and your BloC will listen for that event.

In response to the event, your BloC will do some processing, and afterwards it might have some data you want to display in your widget. In this case, your BloC will emit a State, which your BlocBuilder will use to construct a widget for display.
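
The round trip from event to state can be sketched in a few lines of plain Dart. Please note that this is a deliberately simplified stand-in and not the flutter_bloc API. TinyBloc, ButtonPressed and CounterState are hypothetical names, and the render callback plays the role that BlocBuilder plays in a real app:

```dart
// The event: carries no data, it just says that the button was pressed.
class ButtonPressed {}

// The state: what the widget needs in order to draw itself.
class CounterState {
  const CounterState(this.presses);
  final int presses;
}

class TinyBloc {
  TinyBloc(this.render) {
    // The initial state triggers the first render.
    render(state);
  }

  final void Function(CounterState) render;
  CounterState state = const CounterState(0);

  // Handle the event, compute the new state and hand it to the "widget".
  void add(ButtonPressed event) {
    state = CounterState(state.presses + 1);
    render(state);
  }
}
```

Every call to add results in a fresh state being passed to render, which is the same flow we are about to build with the real Bloc, Event and State classes.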

Let us start by creating our State and Event classes. Both will extend Equatable, for which we need to add the following dependency in pubspec.yaml:

equatable: ^2.0.5

pubspec.yaml is like a project file in Flutter, hosting different settings and dependencies. I will not be going into the details of pubspec.yaml here. There are quite a few websites going into depth about this file. However, I will mention when there are things that need to be added to this file for our project.

With this dependency added, and Pub get run, we can now create our State class:

import 'package:equatable/equatable.dart';

class C64State extends Equatable {
  @override
  // TODO: implement props
  List<Object?> get props => throw UnimplementedError();
}

Similarly, we create an event class:

import 'package:equatable/equatable.dart';

class C64Event extends Equatable {
  @override
  // TODO: implement props
  List<Object?> get props => throw UnimplementedError();
  
}

We are now ready to create our BloC, but first we need to add the following dependency:

flutter_bloc: ^8.1.3
Now we can create our BloC class:

import 'package:flutter_bloc/flutter_bloc.dart';

import 'c64_event.dart';
import 'c64_state.dart';

class C64Bloc extends Bloc<C64Event, C64State> {
  C64Bloc(): super(C64State());

}

Our created classes look fairly empty, but they will get more body as we go along. One thing you will notice with our C64Bloc class is that we pass an instance of C64State() to our super constructor. This is our initial state and will force a render of our stateless widget. There will be subsequent state changes, for which we will use emit() to update our UI. More on this later.

Let us now move on to using the BlocProvider class. Let us refresh our memory on how our app class looks:

class MyApp extends StatelessWidget {
  const MyApp({super.key});

  // This widget is the root of your application.
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Flutter Demo 2',
      theme: ThemeData(
        colorScheme: ColorScheme.fromSeed(seedColor: Colors.deepPurple),
        useMaterial3: true,
      ),
      home: const MyHomePage(),
    );
  }
}

We will add the BlocProvider instance in the home: selector:

      home: BlocProvider(
        create: (context) => C64Bloc(),
        child: const MyHomePage(),
      )

So, in the create selector we create an instance of our Bloc, which will be passed down our widget tree.

The original instance of MyHomePage has now moved down to the child selector.

Now somewhere within our MyHomePage class, we need to inject the created BloC via BlocBuilder.

However, let us do first things first. IntelliJ previously created our MyHomePage as a stateful widget, but we want a stateless widget. This is easy enough: we change our class to extend StatelessWidget instead of StatefulWidget.

Having made the change, the IDE complains that MyHomePage now needs a build method. So, we let the IDE implement this method. With all this, our MyHomePage class simplifies to the following:

class MyHomePage extends StatelessWidget {
  const MyHomePage({super.key});

  @override
  Widget build(BuildContext context) {
    // TODO: implement build
    throw UnimplementedError();
  }
}

The IDE has created a TODO for us. Let us implement it straight away by making use of a BlocBuilder:

class MyHomePage extends StatelessWidget {
  const MyHomePage({super.key});

  @override
  Widget build(BuildContext context) {
    return BlocBuilder<C64Bloc, C64State>(
      builder: (BuildContext context, state) {
        return Text('Hi There!!');
      },
    );
  }
}

And everything renders in the browser:

Our text is weirdly formatted in red and underlined because we did not surround it with a proper widget for applying styling. This will be addressed at a later stage.

As mentioned earlier, when we create an instance of C64Bloc, an initial state is passed to its super constructor. It is this initial state that triggers the drawing of the text when the app starts up.

It would actually be nice to show something of our initial state in our widget, just to get a feel that everything is working end to end.

For this purpose, let us add an epoch to our C64State class:

class C64State extends Equatable {
  final time = DateTime.now().millisecondsSinceEpoch;
...
}
We can now adjust our MyHomePage class as follows:

class MyHomePage extends StatelessWidget {
  const MyHomePage({super.key});

  @override
  Widget build(BuildContext context) {
    return BlocBuilder<C64Bloc, C64State>(
      builder: (BuildContext context, state) {
        return Text(state.time.toString());
      },
    );
  }
}

And the rendered page looks like this:

Our Epoch gets displayed!!

Using Events

So, in the previous section we basically just showed a timestamp once, when the app started up. But what if we want to update the timestamp on the screen when you press a button?

This is where events come in. When you click a button, an event will be fired which our BloC class will listen for and update the state.

Before we can do this, we need to wrap our BlocBuilder into a more meaningful container, which will allow us to easily add a button. For this we wrap our BlocBuilder into a Scaffold, which will move our BlocBuilder to the body attribute:

  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: const Text('BLoC Example')),
      body: BlocBuilder<C64Bloc, C64State>(
        builder: (BuildContext context, state) {
          return Text(state.time.toString());
        },
      ),
      floatingActionButton: FloatingActionButton(
        tooltip: 'Step',
        onPressed: () {},
        child: const Icon(Icons.arrow_forward),
      ),
    );
  }

With all this we get an appBar and button on the lower right of the screen:


Now, we need to fire an event when the button is pressed:

...        
        onPressed: () {
          context.read<C64Bloc>().add(C64Event());
        },
...

This is in effect shorthand for getting our injected C64Bloc instance and adding an event to it when we press the button.

Now, with all our changes, let's do a test run. Upon startup we see an initial timestamp. However, when we click the button, nothing happens. What is going on here?

Looking at the console, we see an exception is thrown every time, and looking closer, we see the error is thrown in our C64State class as highlighted here:

class C64State extends Equatable {
...
  @override
  // TODO: implement props
  List<Object?> get props => throw UnimplementedError();
}

Now this was part of the original code generated when I asked the IDE to implement the missing methods for me. Clearly, the IDE left me a TODO, which I purposefully ignored to illustrate a point 😁.

Let's unpack this a bit more. For starters, what is get props for? It is a getter that needs to be implemented when you extend Equatable.

Why do we need Equatable? It makes it easy to compare objects against each other. This is a common problem in many object-oriented languages: how do we know if two objects are equal? You can compare references, but that is a useless exercise, since you will always have different objects and it will not tell you whether their values are equal.

Equatable comes to the rescue here, but you need to do something from your side. For get props you need to specify a list of the properties of your class that need to be considered for comparison. In our case, we have just one property to add, time:

import 'package:equatable/equatable.dart';

class C64State extends Equatable {
  final int time = DateTime.now().millisecondsSinceEpoch;

  @override
  List<Object?> get props => [time];
}

If we now recompile, we see everything works as expected, and with every click the timestamp updates.

A question might arise at this point: why is a proper get props necessary for C64State, but not for C64Event? The answer is that it is always important for Flutter to know whether a state has changed in order to trigger a redraw of your widget. This is where proper object comparison comes in.
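
To get a feel for what value comparison via props buys us, here is a hand-written sketch in plain Dart. It only approximates what Equatable does on our behalf and is not the package's actual implementation. ValueState and shouldRedraw are hypothetical names used purely for illustration:

```dart
// A state whose equality is based on its props rather than its reference.
class ValueState {
  ValueState(this.time);

  final int time;

  // The properties that take part in the comparison.
  List<Object?> get props => [time];

  @override
  bool operator ==(Object other) =>
      other is ValueState && _listEquals(props, other.props);

  @override
  int get hashCode => Object.hashAll(props);
}

// Element-by-element comparison of two props lists.
bool _listEquals(List<Object?> a, List<Object?> b) {
  if (a.length != b.length) return false;
  for (var i = 0; i < a.length; i++) {
    if (a[i] != b[i]) return false;
  }
  return true;
}

// A redraw is only worthwhile when the new state differs from the old one.
bool shouldRedraw(ValueState previous, ValueState next) => previous != next;
```

Two distinct ValueState objects with the same time now compare as equal, so shouldRedraw returns false for them, while a different time makes it return true.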

In Summary

In this post we continued our journey in creating a C64 emulator in Flutter. We explored the basic concepts of stateless widgets, state and events. These are all building blocks for working towards our emulator.

In the next post we will be implementing a couple of CPU instructions, and single step through a simple program, watching a snippet of memory and registers as we go along.

Until next time!

Saturday, 9 November 2024

A Commodore 64 Emulator in Flutter: Part 1

Foreword

It has been quite a while since I last published a post on my blog.

It was quite an eventful couple of years. I was fortunate to work fully remote during lockdown and even for a couple of years thereafter.

However, since last year they gradually asked us to come back to the office more often, until we found ourselves back at the office full time. Suddenly I found myself wasting more time in traffic.

It was clear to me that the whole concept of working from home was gone, so, with that in mind, I found a job closer to home with more opportunities.

One of the opportunities I got at my new job was to learn a new cross platform UI framework, called Flutter.

My colleagues at my job are quite a bit younger than me, and with that always comes an atmosphere of more enthusiasm. They mentioned that they had read articles saying that you can even write games in Flutter. Something else that piqued my interest about Flutter is that you can also run it in a browser, but it doesn't make use of the DOM which HTML uses for rendering. Instead, it renders everything to a canvas and makes use of WebGL as well as WebAssembly.

So, with Flutter sounding like it has a lot of power under the bonnet, I asked myself: "Why not write another Commodore C64 emulator in Flutter?" That is what I will be exploring in the next couple of blog posts.

In this post I will be covering some of the basics of Flutter. In the coming posts of the series I will build on the idea and add more functionality.

You may be wondering what my plans are for my Amiga on an FPGA project. I have decided to put this project on hold for a bit, just to take a break and try something new, like this Flutter project.

Hope you enjoy this series!

Installing Flutter

As mentioned, Flutter is a cross platform framework. You can create apps for Android, iOS, Windows, Linux and the web. In my series, however, I will mainly focus on web on a Linux machine.

The following link explains how to install Flutter on your development machine:

https://docs.flutter.dev/get-started/install/linux/web

In short, you basically need to download the SDK, extract it to a path, and then add the SDK's bin directory to your $PATH environment variable so that you can run the flutter executable from the command line.

You will obviously also need to set up your IDE to develop in Flutter. In my case, I am using IntelliJ Ultimate and I just installed the Flutter Plugin, so I can easily run and debug Flutter applications.

Creating your first Flutter app

Let us create our first Flutter app using IntelliJ.

With IntelliJ open, select File/New Project and select Flutter from the generators:



Also, at the top ensure that you have selected the path to your Flutter SDK.

Now click next and specify a name and a path for your new project:


At this point, we should maybe just talk a bit about naming conventions. All file names are in lower case, with words separated by underscores.

For classes inside the files, you make use of upper camel case for naming, for example MyHomePage.

Now, if you click create, your new project will be created. Your project structure will look something like the following:




In general, the only things you need to worry about are the lib folder, where you put all your source code, and pubspec.yaml, where you put important settings of your project.

If you look at the top, you will see a drop down displaying "no device selected". Click this drop down and select Chrome.

Your toolbar will now look like this:

Now, click the play button. Your project will be built, and eventually a Chrome browser window will open with your app:


Flutter has created a default app that counts how many times you have pressed the plus button.

Unpacking the generated code

Let us unpack the generated code a bit. Looking at the code, we have everything generated in a single file called main.dart. You are not bound to this and you can put your stuff into multiple files.

The first thing we see in main.dart is a main method:

  void main() {
    runApp(const MyApp());
  }
We create an instance of MyApp and then run it.

Let us look at the definition of MyApp:

class MyApp extends StatelessWidget {
  const MyApp({super.key});

  // This widget is the root of your application.
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Flutter Demo',
      theme: ThemeData(
        colorScheme: ColorScheme.fromSeed(seedColor: Colors.deepPurple),
        useMaterial3: true,
      ),
      home: const MyHomePage(title: 'Flutter Demo Home Page'),
    );
  }
}

To save space, I have removed most of the comments.

Now, we see that MyApp extends StatelessWidget. I will cover what a StatelessWidget is in a moment, but for now just think of it as some kind of widget.

In Flutter, everything you display on the screen is a widget. One of the things a widget like this always has is a build method, which, you have guessed it, returns a Widget.

MyApp is our main widget, and its build method in this case returns an instance of MaterialApp. MaterialApp does the heavy lifting, applying the appropriate styling to all visual components, to give your app the look and feel of a Material application.

Within the MaterialApp instance, there is a property home, where you specify your homepage widget. This is basically the entry page to your application. Once in your homepage, you can also navigate to other pages, almost like in a web application.

Next, let us have a look at MyHomePage:

class MyHomePage extends StatefulWidget {
  const MyHomePage({super.key, required this.title});

  final String title;

  @override
  State<MyHomePage> createState() => _MyHomePageState();
}

Firstly, look at the interesting way the constructor works. Named parameters are specified in curly braces. Also, you use the word required if it is compulsory to specify the given parameter.

At this point you will notice that MyApp is a StatelessWidget and MyHomePage is a StatefulWidget. These are two very important concepts in Flutter.

A stateless widget is immutable. In order to change what it displays, you need to destroy the instance and create a new one.

A stateful widget, on the other hand, can store state, and if you need to change what it displays, you can keep the same instance.

Having said all this, it might sound confusing that our generated demo app has a stateful widget inside a stateless widget. Does this mean that if our MyHomePage changes, our MyApp instance will be destroyed and a new one created? Not at all. Remember, although MyHomePage lives inside MyApp, it is still self-contained. So, it can do display updates within its own area on the screen, without affecting the parent.

If there were display elements outside MyHomePage, inside the parent, it would not be possible to redraw any changes to them. To make changes to such elements, one would need to create a new instance of the Stateless widget, which in this case would mean bringing down the application, which we don't want 😁

Stateful widgets are actually very smart, and usually just redraw the parts of the screen that have changed, thus avoiding redrawing the whole screen. Having said that, despite the cleverness of Stateful widgets, Flutter textbooks actually warn against using them and advise using Stateless widgets where possible.

The reason authors provide for avoiding Stateful widgets is that code for managing the screen and business logic can easily get mixed up together. Separation of concerns is key here.

In this series of creating a Commodore C64 emulator in Flutter, I will take these authors' advice to heart and stick with Stateless widgets.

In the next section I will describe what my approach will be for the rest of the series.

The approach

Let us discuss the approach I am going to take for creating a C64 emulator in Flutter.

As mentioned earlier, I am going to stick with Stateless widgets. As the name implies, the widget itself is not going to have state. So, one will need to keep the state outside the widget.

There are quite a few patterns in Flutter that can provide this functionality of keeping state outside a stateless widget. I will be using the BLoC pattern, which is short for Business Logic Component.

Let us use our C64 Emulator as an example to explain the concept of a BLoC. The following diagram sums up everything:

It is still early days for our Flutter emulator, so one cannot expect any fancy stuff at this point 😂 For now we will incrementally add more CPU instructions to our emulator, single step through 6502 machine code, and see the values of the registers and a small section of memory.

Clicking on the Step button on the widget will trigger an event which the BLoC will listen for. Upon receiving the event, the BLoC will execute one CPU instruction, which will potentially affect registers in the CPU and locations in memory.

When the BLoC has executed one CPU instruction, it will send a state update to the widget, which in effect will destroy the widget and create a new one with the updated values. The state will contain the values of the registers and a snippet of the updated memory. The widget in turn will display this to the user.
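To make the event and state flow concrete, here is a minimal sketch of the idea. It is written in Python rather than Dart, purely as a runnable model, and it is not the emulator's actual code: the names C64Bloc and StepEvent, the fields of C64State, and the two implemented opcodes (LDA immediate and STA absolute) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class C64State:
    """Immutable snapshot handed to the widget after each step (fields are illustrative)."""
    a: int
    pc: int
    memory_snippet: tuple


class StepEvent:
    """Hypothetical event raised when the Step button is clicked."""


class C64Bloc:
    """Keeps the mutable CPU/memory state outside the (stateless) widget."""

    def __init__(self, program):
        self._memory = list(program)
        self._a = 0
        self._pc = 0
        self._listeners = []

    def listen(self, callback):
        # The widget registers here to receive every new state.
        self._listeners.append(callback)

    def add(self, event):
        # One Step event executes exactly one instruction and emits one state.
        if isinstance(event, StepEvent):
            self._execute_one_instruction()
            state = C64State(self._a, self._pc, tuple(self._memory[:8]))
            for callback in self._listeners:
                callback(state)

    def _execute_one_instruction(self):
        opcode = self._memory[self._pc]
        if opcode == 0xA9:      # LDA #immediate
            self._a = self._memory[self._pc + 1]
            self._pc += 2
        elif opcode == 0x8D:    # STA absolute (little-endian address)
            low = self._memory[self._pc + 1]
            high = self._memory[self._pc + 2]
            self._memory[low | (high << 8)] = self._a
            self._pc += 3
        else:
            raise ValueError(f"opcode {opcode:#04x} not implemented in this sketch")
```

The widget side only ever sees C64State snapshots. Because each snapshot is immutable, a new one has to be created for every step, which mirrors the way a stateless widget is rebuilt with fresh values.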

In Summary

In this post I gave an introduction to my idea of creating a C64 emulator in Flutter. I will start off using a Stateless Widget with a BLoC.

Initially my Emulator will just be single stepping through CPU instructions, showing the values of the registers and memory after each step. Once we have implemented all 6502 CPU instructions, we will move onto more interesting visual elements of a C64 emulator.

Until next time!

Friday, 5 January 2024

Extending our Hypothetical Amiga core

Foreword

In the previous post we added some more functionality to issue some more sensible commands to the second channel of our memory controller.

We also created a Hypothetical Amiga core for serving our second memory channel with the sensible read/write commands. This Hypothetical Amiga core serves as a work in progress, to which we will continue to add more and more functionality in coming posts.

In this post we will hook up the Minimig (aka Mini Amiga) core to our mostly empty Amiga core.

We briefly played with the Minimig core a couple of posts ago, just to get a feel for how it works. In particular, one of the features of the Amiga we briefly looked at is the memory overlay scheme at bootup, where the RAM starting at address 0 is disabled and a piece of ROM is mapped to this range of memory instead.

However, when we previously played with the Minimig, we barely touched on the technicalities of reading from memory via the Minimig. In this post we will explore some of these technicalities.

DACT Battles

From the discussions in previous posts, we know that in order for a Motorola 68k processor to access memory, two signals are of importance: ASn and DACT (called DTACK in Motorola's documentation). The CPU asserts the ASn signal to indicate that it wants to access memory. When the memory is ready with the data, it signals the CPU in return by asserting the DACT signal.

While playing with the Minimig core, I discovered something that puzzled me a bit from the following waveform:


In this waveform, the Amiga core is still in overlay mode, meaning that the Amiga would translate address 0 to the address 3c0000 hex. The Amiga core, however, asserts the DACT signal, i.e. the second signal in the waveform, before the address changes to 3c0000. This means that the data could potentially be read from the incorrect address and returned to the CPU.

I was wondering how the Minimig core on the MiSTer project deals with this scenario. Delving a bit into the source code, I found the following snippet in the file rtl/cpu_wrapper.v:


...
fx68k cpu_inst_o
(
...
	.DTACKn(ramsel ? ~ramready : chip_dtack),
...
);
...
Here we see that for RAM accesses we don't use the dact signal of our Minimig core, but rather the ramready signal.

So, what does the ramready signal entail? The answer lies in the following snippet of code within the file rtl/sdram_ctrl.v:

...
cpu_cache_new cpu_cache
(
	.clk              (sysclk),                // clock
	.rst              (!reset || !cache_rst),  // cache reset
	.cpu_cache_ctrl   (cpu_cache_ctrl),        // CPU cache control
	.cache_inhibit    (cache_inhibit),         // cache inhibit
	.cpu_cs           (ramsel),                // cpu activity
	.cpu_adr          (cpuAddr),               // cpu address
	.cpu_bs           ({!cpuU, !cpuL}),        // cpu byte selects
	.cpu_we           (cpustate == 3),         // cpu write
	.cpu_ir           (cpustate == 0),         // cpu instruction read
	.cpu_dr           (cpustate == 2),         // cpu data read
	.cpu_dat_w        (cpuWR),                 // cpu write data
	.cpu_dat_r        (cpuRD),                 // cpu read data
	.cpu_ack          (cache_rd_ack),          // cpu acknowledge
	.wb_en            (cache_wr_ack),          // write enable
	.sdr_dat_r        (sdata_reg),             // sdram read data
	.sdr_read_req     (cache_req),             // sdram read request from cache
	.sdr_read_ack     (cache_fill),            // sdram read acknowledge to cache
	.snoop_act        (chipWE),                // snoop act (write only - just update existing data in cache)
	.snoop_adr        (chipAddr),              // snoop address
	.snoop_dat_w      (chipWR),                // snoop write data
	.snoop_bs         ({!chipU, !chipL})       // snoop byte selects
);
...
assign ramready = cache_rd_ack || write_ena;
...
Here we see that the Amiga core used in the MiSTer project doesn't read directly from SDRAM, but rather via a cache. Looking at the implementation of cpu_cache_new, we see that it is quite an advanced cache, with the same kind of functionality you would find in a cache on modern day CPUs. This cache will even snoop data writes that peripheral chips have done to memory via DMA, so that the CPU will not miss out on these updates.

This cache is 8KB in size. I am not so sure if I will be using a cache in my implementation as well. The MiSTer project is meant for the DE10-Nano, which has over 500KB of block RAM. I am using the Arty A7, which has far less block RAM, so I am not sure if I would be able to match the capabilities of the DE10-Nano.

So, for now I will work with an implementation that doesn't have a cache, just to ease the use of block RAM.

More clocking

Now, from the previous post you will remember that we are clocking our amiga_mem_core module with a clock signal called clk_8_2_mhz. This clock signal only triggers once every 10 clock cycles of our mclk of 83.3333MHz, which resolves to a frequency of 8.3333MHz.

Well, it turns out that our Minimig design wants a clock signal of more or less 28MHz. So, instead of one clock cycle every tenth clock cycle, we will need to enable more clock cycles within every 10 clock cycles.

We could enable every second clock cycle, which would give us about 41.7MHz, but this is probably a bit too fast. Alternatively, out of every 10 clock cycles we could enable only 4, giving us a frequency of about 33MHz. This is in the neighbourhood of 28MHz, which is about the best we can do by just enabling different clock cycles within every 10 of them.

So, let us fiddle a bit with how clk_8_2_mhz is generated:

...
    BUFGCE BUFGCE_8_2_mhz (
       .O(clk_8_2_mhz),   // 1-bit output: Clock output
       .CE(clk_8_2_enable), // 1-bit input: Clock enable input for I0
       .I(mclk)    // 1-bit input: Primary clock
    );
...
    always @(negedge mclk)
    begin
        clk_8_2_enable <= (edge_count == 7 || edge_count == 5 || edge_count == 3);
    end
...
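The arithmetic behind these clock-enable choices can be checked with a small sketch. It is written in Python as a model rather than as HDL, and it assumes that edge_count cycles through the values 0 to 9 (the range is not shown in the listing). The function name effective_mhz is made up for this example.

```python
MCLK_MHZ = 83.3333   # master clock frequency from the post
WINDOW = 10          # assumption: edge_count counts 0..9 and then wraps


def effective_mhz(enabled_counts):
    """Average output frequency when the clock is enabled only on the given edge_count values."""
    counts = set(enabled_counts)
    if not all(0 <= count < WINDOW for count in counts):
        raise ValueError("edge_count values must lie in 0..9")
    return MCLK_MHZ * len(counts) / WINDOW


candidates = {
    "one in ten": [7],
    "every second cycle": [1, 3, 5, 7, 9],
    "four in ten": [1, 3, 5, 7],
    "enable expression in the listing (3, 5, 7)": [3, 5, 7],
}
for label, counts in candidates.items():
    print(f"{label}: {effective_mhz(counts):.2f} MHz")
```

Note that the enable expression in the listing names three edge_count values. Under the assumption above that would work out to roughly 25MHz, while four enabled counts give the 33MHz mentioned in the text. Either way the pulses are unevenly spaced, so these figures are averages over the 10-cycle window.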
Having fed our amiga_mem_core module with the correct clock frequency, we need to instantiate a module instance within this module for generating the other clock signals the minimig core requires:

   amiga_clk amiga_clk
        (
          .clk_28(clk),     // 28MHz output clock ( 28.375160MHz)
          .clk7_en(clk7_en),    // 7MHz output clock enable (on 28MHz clock domain)
          .clk7n_en(clk7n_en),   // 7MHz negedge output clock enable (on 28MHz clock domain)
          .c1(c1),         // clk28m clock domain signal synchronous with clk signal
          .c3(c3),         // clk28m clock domain signal synchronous with clk signal delayed by 90 degrees
          .cck(cck),        // colour clock output (3.54 MHz)
          .eclk(eclk),       // 0.709379 MHz clock enable output (clk domain pulse)
          .reset_n(~(reset))
        );
This is also a module I used straight from the Minimig project, namely the file rtl/amiga_clk.v.

For completeness' sake, let us add the other module instances:

...
   always @(negedge clk)
    begin
      phi <= ~phi;
    end
...
   minimig minimig(     //m68k pins
     .cpu_address(add), // m68k address bus
     .cpu_data(data_in),    // m68k data bus
     .cpudata_in(data_out),  // m68k data in
     ._cpu_ipl(interrupts),    // m68k interrupt request
     ._cpu_as(As),     // m68k address strobe
     ._cpu_uds(Uds),    // m68k upper data strobe
     .button_reset(reset),
     ._cpu_lds(Lds),    // m68k lower data strobe
     .cpu_r_w(read_write),     // m68k read / write
     ._cpu_dtack(data_ack),  // m68k data acknowledge
     ._cpu_reset(/*reset*/),  // m68k reset
     ._cpu_reset_in(reset_cpu_out),//m68k reset in
     .nmi_addr(0),    // m68k NMI address
     //TODO
     //sram pins
     .ram_data(data),    // sram data bus
     .ramdata_in(ram_data_in),  // sram data bus in
     .ram_address(address), // sram address bus
     ._ram_bhe(),    // sram upper byte select
     ._ram_ble(),    // sram lower byte select
     ._ram_we(write),     // sram write enable
     ._ram_oe(oe),     // sram output enable
     .chip48(),      // big chipram read
 
     //system    pins
     .rst_ext(),     // reset from ctrl block
     .rst_out(),     // minimig reset status
     .clk(clk),         // 28.37516 MHz clock
     .clk7_en(clk7_en),     // 7MHz clock enable
     .clk7n_en(clk7n_en),    // 7MHz negedge clock enable
     .c1(c1),          // clock enable signal
     .c3(c3),          // clock enable signal
     .cck(cck),         // colour clock enable
     .eclk(eclk),        // ECLK enable (1/10th of CLK)
 
     //rs232 pins
     .rxd(),         // rs232 receive
     .txd(),         // rs232 send
     .cts(),         // rs232 clear to send
     .rts(),         // rs232 request to send
     .dtr(),         // rs232 Data Terminal Ready
     .dsr(),         // rs232 Data Set Ready
     .cd(),          // rs232 Carrier Detect
     .ri(),          // rs232 Ring Indicator
 
 
     //host controller interface (SPI)
     .IO_UIO(),
     .IO_FPGA(),
     .IO_STROBE(),
     .IO_WAIT(),
     .IO_DIN(),
     .IO_DOUT()
 

 
     //user i/o
     //output  [1:0] cpucfg,
     //output  [2:0] cachecfg,
     //output  [6:0] memcfg,
     //output        bootrom,     // enable bootrom magic in gary.v
);
...
   fx68k fx68k(        .clk(clk),
        .HALTn(1),                    // Used for single step only. Force high if not used
        // input logic HALTn = 1'b1,            // Not all tools support default port values
        
        // These two signals don't need to be registered. They are not async reset.
        .extReset(reset),            // External sync reset on emulated system
        .pwrUp(reset),            // Asserted together with reset on emulated system coldstart    
        .enPhi1(phi), .enPhi2(~phi),    // Clock enables. Next cycle is PHI1 or PHI2
        .eRWn(read_write),
        .oRESETn(reset_cpu_out),
        //output eRWn, output ASn, output LDSn, output UDSn,
        //output logic E, output VMAn,    
        //output FC0, output FC1, output FC2,
        //output BGn,
        //output oRESETn, output oHALTEDn,
        .ASn(As), 
        .LDSn(Lds), 
        .UDSn(Uds),
        .DTACKn(data_ack), 
        .VPAn(1),
        .BERRn(1),
        .BRn(1), .BGACKn(1),
        .IPL0n(interrupts[0]), 
        .IPL1n(interrupts[1]), 
        .IPL2n(interrupts[2]),
        .iEdb(data_in),
        .oEdb(data_out),
        .eab(add)
);
...
I have described the use of both modules in a previous post. You will also notice that DTACKn still blindly uses the data_ack signal from the Minimig, which I warned against in the previous section. We will give attention to this in the next section.

Synchronised Memory Access

As mentioned earlier, when the Minimig core asserts the DACT signal, there is no real guarantee that the data will be available for the CPU. So, one needs to delay the DACT signal somehow until the data is ready.

One signal we can use for this is the _ram_oe signal from the Minimig core. Once this signal is asserted, we can be sure that the address asserted is correct and we can fetch the correct data. Obviously we will only assert the DACT signal when the data is really ready.

We will implement all this logic with the following state machine:

    always @(posedge clk)
    begin
       case(dact_state)
         STATE_IDLE: begin
                       dact_state <= (!oe && !data_ack) ? STATE_OE : STATE_IDLE;
                     end
         STATE_OE: begin
                       dact_state <= STATE_OE_1;
                     end
         STATE_OE_1: begin
                       dact_state <= STATE_DACT;
                     end
         STATE_DACT: begin
                       if (data_ack)
                       begin
                           dact_state <= STATE_IDLE;
                       end
                     end

       endcase
    end
We transition from the IDLE state to the next state when both oe and data_ack are asserted (both are active low), just to ensure we act on a CPU memory access and not on a peripheral DMA access.

For testing purposes, I have added two states to simulate a two cycle memory access time. When we eventually use our real memory controller, more clock cycles will apply.

We are now ready to supply our CPU with a true DACT signal:

   fx68k fx68k(        .clk(clk),
...
      .DTACKn(dact_state == STATE_DACT ? data_ack : 1), 
...
);
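The state machine and the gated DTACKn expression above can be modelled in software to see how the acknowledge is held back. The following is a minimal Python sketch of that behaviour, not a replacement for the Verilog. The integer values given to the states and the sample input sequence are assumptions for illustration.

```python
STATE_IDLE, STATE_OE, STATE_OE_1, STATE_DACT = range(4)


def next_state(state, oe, data_ack):
    """One clock edge of the state machine; oe and data_ack are active low (0 = asserted)."""
    if state == STATE_IDLE:
        return STATE_OE if (oe == 0 and data_ack == 0) else STATE_IDLE
    if state == STATE_OE:
        return STATE_OE_1
    if state == STATE_OE_1:
        return STATE_DACT
    # STATE_DACT: stay here until the Minimig core releases data_ack.
    return STATE_IDLE if data_ack else STATE_DACT


def dtackn(state, data_ack):
    """Value presented to the CPU: data_ack only passes through in STATE_DACT."""
    return data_ack if state == STATE_DACT else 1


def simulate(inputs):
    """Return the DTACKn value seen by the CPU on each cycle for a list of (oe, data_ack) pairs."""
    state = STATE_IDLE
    seen = []
    for oe, data_ack in inputs:
        seen.append(dtackn(state, data_ack))
        state = next_state(state, oe, data_ack)
    return seen


# The Minimig core asserts data_ack one cycle before oe goes low.
trace = simulate([(1, 1), (1, 0), (0, 0), (0, 0), (0, 0), (0, 0), (1, 1), (1, 1)])
```

In this trace the CPU only sees DTACKn go low three cycles after oe is asserted, even though data_ack was already low earlier.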
What remains to be done is to link up the memory. For now I will just simulate memory with hardcoded values that form a small program, and see if our CPU acts accordingly:

...
   always @(posedge clk)
   begin
       oe_delayed <= oe;
   end
...   
   assign trigger_read = oe_delayed && !oe;
...   
    always @(negedge clk_8_2_mhz)
        begin
            if (amiga_test_address == 22'h3c0000 && trigger_read)
            begin
              data_in_amiga_test <= 16'hc0;
            end else if (amiga_test_address == 22'h3c0001 && trigger_read)
            begin
              data_in_amiga_test <= 16'h33c3;
            end else if (amiga_test_address == 22'h3c0002 && trigger_read)
            begin
              data_in_amiga_test <= 16'h0;
            end else if (amiga_test_address == 22'h3c0003 && trigger_read)
            begin
              data_in_amiga_test <= 16'h0008;
            end else if (amiga_test_address == 22'h3c0004 && trigger_read)
            begin
              data_in_amiga_test <= 16'h303c; //load immediate
            end else if (amiga_test_address == 22'h3c0005 && trigger_read)
            begin
              data_in_amiga_test <= 16'h0505;
            end else if (amiga_test_address == 22'h3c0006 && trigger_read)
            begin
              data_in_amiga_test <= 16'h33c0; //store
            end else if (amiga_test_address == 22'h3c0007 && trigger_read)
            begin
              data_in_amiga_test <= 16'h0085;
            end else if (amiga_test_address == 22'h3c0008 && trigger_read)
            begin
              data_in_amiga_test <= 16'h8586;
            end else if(amiga_test_address == 22'h3c0009 && trigger_read)
            begin
              data_in_amiga_test <= 16'h4eb9;
            end else if (amiga_test_address == 22'h3c000a && trigger_read)
            begin
              data_in_amiga_test <= 9;
            end else if (amiga_test_address == 22'h3c000b && trigger_read)
            begin
              data_in_amiga_test <= 16'h3e86;
            end
            else if (trigger_read) begin
              data_in_amiga_test <= 16'h33c0;
            end
        end
...
The data only gets assigned once the oe signal transitions from 1 to 0. As we know, when the 68k processor starts to execute, it starts by loading the vectors at address 0, which the Minimig core translates to 3c0000.

As you might remember, the address that tells the 68k where to start executing is given by the vector starting at byte address 4, or 16-bit word address 2. From the code above, this vector resolves to byte address 8, or word address 4. For this reason the program actually starts at word address 3c0004.
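The vector arithmetic above can be worked through with the first four hardcoded words from the listing. This is a small Python sketch; the names ROM_WORDS, OVERLAY_BASE and read_long are made up for the example.

```python
# First four 16-bit words returned by the hardcoded memory, indexed by word offset from 3c0000.
ROM_WORDS = {0: 0x00C0, 1: 0x33C3, 2: 0x0000, 3: 0x0008}
OVERLAY_BASE = 0x3C0000   # word address that CPU address 0 is translated to in overlay mode


def read_long(word_index):
    """The 68k is big-endian: a 32-bit value is the high word followed by the low word."""
    return (ROM_WORDS[word_index] << 16) | ROM_WORDS[word_index + 1]


initial_stack_pointer = read_long(0)    # vector 0, byte address 0
initial_pc = read_long(2)               # vector 1, byte address 4 (word address 2)
first_instruction_word = OVERLAY_BASE + initial_pc // 2

print(hex(initial_pc), hex(first_instruction_word))
```

The initial program counter comes out as byte address 8, which is word address 4, so the first instruction fetch lands on the entry for 3c0004 in the hardcoded memory.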

Let us have a look at what the waveform looks like, captured from the real FPGA:


The third line is the address sent to RAM for retrieving data. It resolves to the valid address starting with '3c' at the falling edge of cck. The second last row shows the resulting data asserted shortly thereafter.

In Summary

In this post I added some more logic to our amiga_mem_core to make it resemble an Amiga more closely.

We also tested to see if our 68k core could reliably fetch data and execute code.

In the next post we will try and link up our Amiga controller to our SDRAM memory controller.

Until next time!

Monday, 23 October 2023

Adding more functionality to the second channel of the Memory controller

Foreword

In the previous post we started modifying our existing memory controller to become a dual channel memory controller.

A dual channel memory controller would allow us to have two cores accessing memory, both at 7MHz, by allocating a different bank of memory within the DDR3 memory for each core.

In the previous post we basically got the timings right to trigger the DDR3 commands of the cores in an interleaved way.

In this post we are going to extend this functionality further and add a core to issue some dummy read/write commands on the second memory channel and see if we can read some sensible data back from DDR RAM via the second memory channel.

Using sensible addresses

In the previous post we didn't really worry about using sensible row/column addresses for the second channel of our memory controller and we just used the same hardcoded address for both the row and the column.

So, let us start this post by seeing if we can create some sensible row and column addresses. Firstly, we will create a block of code for driving our second memory channel:

amiga_mem_core amiga_mem_core(.clk(clk_8_2_mhz),
    .address(channel_address_2),
    .data(channel_data_2),
    .data_in(cap_value_2),
    .write(write_channel_2),
    .reset(reset_retro)
);
amiga_mem_core is our hypothetical Amiga core that will use the second memory channel for its memory needs. We will gradually develop this core in coming sections and future posts.

Let us quickly discuss the different ports of amiga_mem_core:
  • clk_8_2_mhz: This is basically the same kind of clock as the one that drives our main 6502 core. This is the 83.333MHz clock, but we only present every tenth clock pulse, which gives us an effective clock of 8.333MHz. I would like to point out here that we will use a different clock pulse out of the 10 available than the one we use for our 6502 core, because the second memory channel requires the address to be asserted at a different time than the first memory channel.
  • channel_address_2: a 16 bit linear address, giving a 64k address space. We will slice and dice this address to get the row address and column address.
  • cap_value_2: 16 bit captured data from DDR3 RAM. As we know from previous posts, the ISERDES captures this data from DDR3 RAM, but throws it away after the next 83MHz clock cycle. So, we need to capture this data so it is still available at the next 8.33MHz clock pulse.
  • write_channel_2: The Amiga core indicates whether it wants to write (i.e. set to 1) or read (i.e. set to 0).
Let us modify our memory controller state machine a bit to use the values from these ports:

              WAIT_READ_WRITE_2: begin
                  test_cmd <= 32'h000001ff;
                  phy_rcw_pos_2 <= 3;
                  phy_address_2 <= {9'b0,channel_address_2[15:10]};
				  state <= PRECHARGE_AFTER_WRITE;
              end
			  
	      PRECHARGE_AFTER_WRITE: begin
                  // CAS command
                  phy_rcw_pos_2 <= {2'b10, write_channel_2};
                  phy_address_2 <= {5'b0,channel_address_2[9:3], map_address_2[2:0]};
                  data_in <= {8{channel_data_2}};
                  dq_tri = write_channel_2 ? 15 : 0;
                  mem_channel <= 1;
                  state <= POST_READ_1;
                  cmd_slot <= 3;
                  test_cmd <= write_channel_2 ? 32'h000029fd : 32'h00002dfd;
              end
If you have a look at my previous post, you will see that these two selectors of the state machine open a row for the second memory channel and then do a column read/write. In this case I have added some more logic to use the address of our Hypothetical Amiga core.

Note that, as with our first channel, we form the row address by using bits 10 upwards of the Amiga core address, and the column address from the lower ten bits of the Amiga core address.

You will notice I am not using the lower three bits as is for the column address, but rather make use of a map. I have used the same technique in the first channel of our memory controller. Let us quickly recap on the reason for this.

As you might remember from previous posts, DDR3 memory will never just give you the single 16-bit word you are looking for, but will always return a burst of 4 or 8 words. Catching the data in the correct chunk within the 8 word burst is quite a challenge, and you need to fiddle quite a bit with the code to get it right.

So, I just took the lazy route and looked at which word arrives for each address of 0-7, and then created a map to get the correct word within the burst. My mapping function looks like this:

    always @*
    begin
        if (channel_address_2[2:0] == 0)
        begin
            map_address_2 = 7;
        end else if (channel_address_2[2:0] == 1)
        begin
            map_address_2 = 0;
        end else if (channel_address_2[2:0] == 2)
        begin
            map_address_2 = 1;
        end else if (channel_address_2[2:0] == 3)
        begin
            map_address_2 = 2;
        end else if (channel_address_2[2:0] == 4)
        begin
            map_address_2 = 3;
        end else if (channel_address_2[2:0] == 5)
        begin
            map_address_2 = 4;
        end else if (channel_address_2[2:0] == 6)
        begin
            map_address_2 = 5;
        end else
        begin
            map_address_2 = 6;
        end
    end
Also, there is a different mapping function for the simulation environment and for running on the actual FPGA. I never managed to find the reason why there is a difference between the two, but for now I am just using two different mapping functions for the two environments.
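The row/column split and the burst mapping described above can be modelled together in a few lines. This is a Python sketch of the arithmetic only, reproducing the one mapping function listed in this post. The function names are made up for the example, and the observation that the map amounts to "subtract one, modulo 8" is simply read off the listed values.

```python
def map_low_bits(low3):
    """Mapping from the listing: 0 -> 7, 1 -> 0, 2 -> 1, ... 7 -> 6."""
    return (low3 - 1) & 0x7


def split_address(channel_address):
    """Split the 16-bit linear address into the row and column values sent to DDR3."""
    row = (channel_address >> 10) & 0x3F            # bits 15:10, zero extended
    column_upper = (channel_address >> 3) & 0x7F    # bits 9:3, used as is
    column = (column_upper << 3) | map_low_bits(channel_address & 0x7)
    return row, column


for address in (0x0000, 0x0001, 0x0409, 0xFFFF):
    row, column = split_address(address)
    print(f"address {address:#06x}: row {row:#04x}, column {column:#05x}")
```

Only the lowest three bits of the column are remapped, so the upper column bits still select the same 8-word burst as the original address.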

Moving on to the data_in assignment. Here I am just repeating the data I want to write for the full burst, until the write is complete. It is important in this case to ensure we assert the data mask bit at the correct time instant, so that the correct word is written in an 8-word column. So, I am just using another mapping function:

    always @*
    begin
        if (cmd_offset[2:0] == 0) 
        begin
            dm_slot = ~1;
        end else if (cmd_offset[2:0] == 1)
        begin
            dm_slot = ~2;
        end else if (cmd_offset[2:0] == 2)
        begin
            dm_slot = ~4;
        end else if (cmd_offset[2:0] == 3)
        begin
            dm_slot = ~8;
        end else if (cmd_offset[2:0] == 4)
        begin
            dm_slot = ~16;
        end else if (cmd_offset[2:0] == 5)
        begin
            dm_slot = ~32;
        end else if (cmd_offset[2:0] == 6)
        begin
            dm_slot = ~64;
        end else if (cmd_offset[2:0] == 7)
        begin
            dm_slot = ~128;
        end
    end
The wire cmd_offset is used for both channels, so it is important we have a selector like this:

    assign cmd_offset = mem_channel == 0 ? cmd_address[2:0] : channel_address_2[2:0];
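The data mask mapping boils down to clearing exactly one bit, the one for the burst slot that should be written. A compact Python sketch of the same table follows, under the assumption that dm_slot is 8 bits wide (the declaration is not shown in the post). The function names mirror the Verilog signals but are otherwise illustrative.

```python
def dm_slot(cmd_offset):
    """All mask bits set except the one for the selected slot (assumes an 8-bit dm_slot)."""
    return ~(1 << (cmd_offset & 0x7)) & 0xFF


def select_cmd_offset(mem_channel, cmd_address, channel_address_2):
    """Mirror of the cmd_offset selector: channel 0 uses cmd_address, otherwise channel_address_2."""
    source = cmd_address if mem_channel == 0 else channel_address_2
    return source & 0x7


for offset in range(8):
    print(offset, format(dm_slot(offset), "08b"))
```

Each mask value has a single zero bit, so only one of the eight repeated copies of the data is actually committed to memory.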

Implementing the Hypothetical Amiga core

Let us implement the Hypothetical Amiga core we have been talking about in this post. This is basically the core where we will do some writes using the second memory channel and see if we can read the same data back. In future posts we will gradually evolve this core to a fully functional Amiga core.

This core will basically be a 6 bit counter, where we use the top bit to indicate read/write, low indicating write. So, starting with the top bit as zero, we will start by doing a bunch of writes, and when the counter comes to the point where bit 5 (i.e. the top bit) is set, we will do a series of reads.

The resulting core is fairly simple:

module amiga_mem_core(
    input wire clk,
    output wire [15:0] address,
    output wire write,
    input wire reset,
    output wire [15:0] data,
    input wire [15:0] data_in
    );
    
   (* mark_debug = "true" *) reg [5:0] counter = 0;
   (* mark_debug = "true" *) reg [15:0] captured_data;
   
   assign address = {11'b0, counter[4:0]};
   assign write = counter[5];
    
   always @(posedge clk)
   begin
       counter <= reset ? 0 : (counter + 1);
   end
   
   always @(posedge clk)
   begin
       captured_data <= data_in;
   end
   
   assign data = counter + 3;
endmodule
I have marked counter and captured_data for debugging, so we can view these signals via the ILA when running on the actual FPGA.

We also use the counter to generate some test data, adding three to it so that the test data is different from the address.

I mentioned earlier that the data ISERDES capture is only retained for one 83.33MHz clock cycle, so by the time our Amiga core looks for the data, it will be long gone. So, we will need to capture it outside the Amiga core and feed it to the Amiga core like this:

    always @(posedge mclk)
    begin
        if (edge_count == 7)
        begin
            cap_value_2 <= {data_out[103:96], data_out[39:32]};
        end
    end
So, we always capture the data at the specific 83MHz clock cycle in which it is available. data_out is basically the output of our ISERDES block, which captured 8 bursts of data. Bits 63 - 0 contain the low byte of each of the 8 data bursts, and bits 127 - 64 contain the high byte of each of the 8 bursts. By experimentation I found that the data we need is always at bits 39:32 and bits 103:96.
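
To make these bit positions a little more concrete, here is a small sketch that picks one 16-bit word out of data_out by index. The module name burst_word_select and the index port n are made up for this example, and I am assuming the bytes sit at 8-bit steps counting up from bit 0 of each half:

```verilog
// Sketch only: burst_word_select and n are hypothetical names.
// Assumes the low bytes live in data_out[63:0] and the high bytes in
// data_out[127:64], each in 8-bit steps counting up from the bottom.
module burst_word_select(
    input  wire [127:0] data_out,
    input  wire [2:0]   n,        // which of the 8 byte positions to pick
    output wire [15:0]  word
    );
    // Indexed part-selects: 8 bits starting at the computed base bit.
    assign word = {data_out[64 + 8*n +: 8], data_out[8*n +: 8]};
endmodule
```

With n set to 4 this selects bits 103:96 and 39:32, which is the slice used in the capture above.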

In Summary

In this post we added some more meat around the second channel of our memory controller, managed to write some test data to the DDR3 RAM and read the same data back.

In the next post we will start to do some more interesting stuff, and see if we can add an Amiga core that uses the second memory channel for memory storage.

Until next time!

Thursday, 28 September 2023

New beginnings of a dual channel DDR3 memory controller

Foreword

In the previous post we managed to get a 6502 based ecosystem together where we could access both an SD Card and DDR3 memory.

With this design we can load quite a lot of stuff from SD Card into DDR3 memory and thus reduce our dependency on the limited Block RAM that is available on the FPGA. This opens the possibility of emulating an Amiga core on the Arty A7 FPGA board.

As mentioned in previous posts, we will be using a 6502 based system that will do all the work of loading everything the Amiga core requires from SD Card to DDR3. Needless to say, this would require both the 6502 core and the Amiga core to access the DDR3 memory.

One way to address the need for both the 6502 and the Amiga core to access the DDR3 would be to use the memory controller we developed in the last couple of posts, and just let the two cores take turns accessing DDR3 memory. Knowing that our memory controller runs at around 8MHz, that would mean that our Amiga core would be accessing DDR3 memory at around 4MHz, because it would only get every second clock cycle. This is far from ideal, with a stock Amiga running at 7MHz at least.

So, in this post we will try and come up with an optimised dual channel memory controller where we will attempt to make both the Amiga core and 6502 core access memory at 7MHz.

The Magic of Memory Banks

In our journey with DDR3 memory, we got to know the different states memory can be in:

  • Activate: Activate a row for reading or writing
  • Read/Write: Read or write a particular column of data
  • Precharge: After you are finished with your reads/writes on a particular row, you first need to precharge the row, before moving on to the next one.
All of the above take time to complete. In my Arty A7 scenario, each of these states takes about 5 memory clock cycles to complete.

Once you have an open row, however, consecutive memory reads from the same row can be quite fast, provided you give the column addresses ahead of time.

Things, however, will not work out so well for our plan where a 6502 and an Amiga core need to access memory. The Amiga core, for instance, might need to access data from a different row than the one the 6502 is currently busy with. In such a case the Amiga core needs to wait for the 6502 core to finish its business with its current row, before the Amiga core can open the row it wants. This will again bring us to the point where the Amiga core can only access the memory at half the available memory bandwidth, i.e. 4MHz.

However, all hope is not lost. DDR3 memory divides memory into different memory banks, and each memory bank can have a row open independently from the other banks. The DDR3 memory chip on the Arty A7 has 8 memory banks. This means that each memory bank is 256MB/8 = 32MB.

So, the basic idea is to give the 6502 core and the Amiga core each its own bank, so that theoretically every core can get the full memory bandwidth of 8MHz. One just needs to carefully schedule the timing of when to issue DDR3 commands, so that these cores don't trip over each other. DDR3 RAM, for instance, still has only one data bus, so if you issue read commands to two banks, you can't expect the data to arrive at the same time. It will first output the data from the first bank, and thereafter the data from the other bank.

Coming back to the size of every memory bank. 32MB per bank is more than enough for what we want to do. For the Amiga core this will be more than enough for the ROMs and the amount of RAM you would get in the earlier Amigas. For the 6502 core this will also be more than enough to store a disk image and simulate a disk read from the Amiga.

Using timeslots wisely

The memory on the Arty A7 clocks at 333MHz, which is far beyond the speed capability of the FPGA on the Arty A7. As we learned from previous posts, the designers of the FPGA provided a way out in the form of OSERDES blocks for serialising data out. The OSERDES blocks themselves can serialise the data out at 333MHz. We need to provide the data to this block 4 chunks at a time, which reduces the required speed of the rest of the FPGA to 83MHz, which is more manageable.

Now, in our current design, for every 4 timeslots, we can at most issue only one DDR command. We have a choice where this command can happen, but at most only one command within 4 timeslots.

With our plan to interleave DDR commands for a 6502 core and an Amiga core, issuing 1 command per 4 cycles is perhaps too tight. After thinking about this for a while, I thought of having two commands per 4 timeslots. I want to reserve the first two timeslots for the 6502 core, and the last two timeslots for the Amiga core.

To see how we are going to change our design to cater for this, let us revise how the current design works. The following is a snippet of one of the selectors in our state machine for our memory controller:

              PREPARE_CMD: begin
                  test_cmd <= 32'h000001ff;
                  cmd_slot <= 0;
                  if (edge_count == 8)
                  begin
                      state <= COL_CMD;
                      test_cmd <= {1'b0, 8'b0, cmd_address[15:10], 1'b0, 16'h21fd};
                  end
              end
test_cmd is the command we want to issue. I am not going to explain the individual bits for this, but it basically indicates what RAS/CAS/WRITE should be set as for the command. cmd_slot indicates at which of the 4 time slots the command should be issued.

The bits of these two registers go down a number of levels, until we reach the following snippet:

  cmd_addr #(
    .IODELAY_GRP(IODELAY_GRP),
    .IOSTANDARD(IOSTANDARD_CMDA),
    .SLEW(SLEW_CMDA),
    .REFCLK_FREQUENCY(REFCLK_FREQUENCY),
    .HIGH_PERFORMANCE_MODE(HIGH_PERFORMANCE_MODE),
    .ADDRESS_NUMBER(ADDRESS_NUMBER)
  ) cmd_addr_i(
    .ddr3_a   (ddr3_a[ADDRESS_NUMBER-1:0]), // output address ports (14:0) for 4Gb device
    .ddr3_ba  (ddr3_ba[2:0]),             // output bank address ports
    .ddr3_we  (ddr3_we),                 // output WE port
    .ddr3_ras (ddr3_ras),                // output RAS port
    .ddr3_cas (ddr3_cas),                // output CAS port
    .ddr3_cke (ddr3_cke),                // output Clock Enable port
    .ddr3_odt (ddr3_odt),                // output ODT port,
    .cmd_slot (cmd_slot),
    .clk      (clk),                     // free-running system clock, same frequency as iclk (shared for R/W)
    .clk_div  (clk_div),                 // free-running half clk frequency, front aligned to clk (shared for R/W)
    .rst      (rst),                     // reset delays/serdes
    .in_a     (in_a[2*ADDRESS_NUMBER-1:0]), // input address, 2 bits per signal (first, second) (29:0) for 4Gb device
    .in_ba    (in_ba[5:0]),              // input bank address, 2 bits per signal (first, second)
    .in_we    (in_we[1:0]),              // input WE, 2 bits (first, second)
    .in_ras   (in_ras[1:0]),             // input RAS, 2 bits (first, second)
    .in_cas   (in_cas[1:0]),             // input CAS, 2 bits (first, second)
    .in_cke   (in_cke[1:0]),             // input CKE, 2 bits (first, second)
    .in_odt   (in_odt[1:0]),             // input ODT, 2 bits (first, second)
//    .in_tri   (in_tri[1:0]),             // tristate command/address outputs - same timing, but no odelay
    .in_tri   (in_tri),             // tristate command/address outputs - same timing, but no odelay
    .dly_data (dly_data[7:0]),           // delay value (3 LSB - fine delay)
    .dly_addr (dly_addr[4:0]),           // select which delay to program
    .ld_delay (ld_cmda),               // load delay data to selected iodelayl (clk_div synchronous)
    .set      (set)                      // clk_div synchronous set all delays from previously loaded values
);
At this point we have already stripped off all the necessary bits from the command, as indicated in bold.

You might also pick up that we are doubling up on the bits: we are multiplying ADDRESS_NUMBER by 2, for the bank we are passing 6 bits instead of the required 3, and so on. So, in effect, most of the system is already catering for two commands per 4 time slots. It is just that right at the top we are passing down a single command.

Now, within the cmd_addr module, we need to make a couple of changes to handle two commands per 4 time slots. First let us look at the module for outputting the address to DDR3 memory:

// All addresses
generate
    genvar i;
    for (i=0; i<ADDRESS_NUMBER; i=i+1) begin: addr_block
//       assign decode_addr[i]=(ld_dly_addr[4:0] == i)?1'b1:1'b0;
    cmda_single #(
         .IODELAY_GRP(IODELAY_GRP),
         .IOSTANDARD(IOSTANDARD),
         .SLEW(SLEW),
         .REFCLK_FREQUENCY(REFCLK_FREQUENCY),
         .HIGH_PERFORMANCE_MODE(HIGH_PERFORMANCE_MODE)
    ) cmda_addr_i (
    .dq(ddr3_a[i]),               // I/O pad (appears on the output 1/2 clk_div earlier, than DDR data)
    .clk(clk),          // free-running system clock, same frequency as iclk (shared for R/W)
    .clk_div(clk_div),      // free-running half clk frequency, front aligned to clk (shared for R/W)
    .rst(rst),
    .dly_data(dly_data_r[7:0]),     // delay value (3 LSB - fine delay)
    .din({{2{in_a_r[ADDRESS_NUMBER+i]}},{2{in_a_r[i]}}}),      // parallel data to be sent out
//    .tin(in_tri_r[1:0]),          // tristate for data out (sent out earlier than data!) 
    .tin(in_tri_r),          // tristate for data out (sent out earlier than data!) 
    .set_delay(set_r),             // clk_div synchronous load odelay value from dly_data
    .ld_delay(ld_dly_addr[i])      // clk_div synchronous set odealy value from loaded
);       
    end
endgenerate
Here cmda_single is applicable to a single address bit, so we need to replicate it for every bit of the address. We do that with a for-loop construct.

Now, with the din port we need to supply four bits of data for each applicable address bit, which is what an OSERDES serializer needs. For the first two timeslots we duplicate the first address twice, and for the last two timeslots we duplicate the last address twice.

We need to do a similar exercise for the bank address, so I am not going to show the code for that here.

At first sight it may seem a bit puzzling that I duplicate the address bits and bank address bits, instead of pinning them to the correct slot. This is because the address is ignored in the non-command slots, so we can actually save quite a bit of logic here, especially knowing that there are quite a number of address bits.

For the RAS/CAS/WE bits, we do something like the following:

// we
    cmda_single #(
         .IODELAY_GRP(IODELAY_GRP),
         .IOSTANDARD(IOSTANDARD),
         .SLEW(SLEW),
         .REFCLK_FREQUENCY(REFCLK_FREQUENCY),
         .HIGH_PERFORMANCE_MODE(HIGH_PERFORMANCE_MODE)
    ) cmda_we_i (
    .dq(ddr3_we),
    .clk(clk),
    .clk_div(clk_div),
    .rst(rst),
    .dly_data(dly_data_r[7:0]),
    .din({cmd_slot[1] ? {in_we_r[0], 1'b1}  : {1'b1 , in_we_r[0]},
          cmd_slot[0] ? {in_we_r[1], 1'b1}  : {1'b1 , in_we_r[1]}}),
    .tin(in_tri_r), 
    .set_delay(set_r),
    .ld_delay(ld_dly_cmd[3]));
Note that, as before, our command slot is still 2 bits wide, but the meaning has changed a bit. Previously cmd_slot was to be interpreted as a number between 0 and 3, but now each memory channel has its own bit, and each channel has access to only two slots.
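
As a rough illustration of this new meaning, the sketch below pulls the din expression out into a stand-alone module. The module name we_slot_pairs and its port names are made up for this example; the expression itself is the one used for cmda_we_i above.

```verilog
// Sketch only: we_slot_pairs is a hypothetical module that mirrors the
// din expression of cmda_we_i, so it can be looked at in isolation.
module we_slot_pairs(
    input  wire [1:0] cmd_slot,   // one slot-select bit per memory channel
    input  wire [1:0] in_we,      // one WE value per command, active low
    output wire [3:0] din         // the four bits handed to the serializer
    );
    // Each cmd_slot bit picks which bit of its two-bit pair carries the WE
    // value. The other bit of the pair is held at 1, the inactive level.
    wire [1:0] pair_hi = cmd_slot[1] ? {in_we[0], 1'b1} : {1'b1, in_we[0]};
    wire [1:0] pair_lo = cmd_slot[0] ? {in_we[1], 1'b1} : {1'b1, in_we[1]};

    assign din = {pair_hi, pair_lo};
endmodule
```

So a cmd_slot bit of 0 gives the pair {1, WE} and a cmd_slot bit of 1 gives the pair {WE, 1}. Within its own pair a channel can still choose between two positions, but it can never reach into the other channel's pair.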

With all these alterations done to deal with a dual channel memory controller, let us see how our state machine will deal with dual channel memory requests:

              ROW_CMD: begin
                  if (edge_count == 9)
                  begin
                      test_cmd <= 32'h000001ff;
                      phy_rcw_pos_2 <= 2;
                  end else
                  begin
                      test_cmd <= 32'h000005ff;
                      phy_rcw_pos_2 <= 7;
                  end
                  
                  cmd_slot <= 0;
                  if (edge_count == 8)
                  begin
                      state <= COL_CMD;
                      test_cmd <= {1'b0, 8'b0, cmd_address[15:10], 1'b0, 16'h21fd};
                  end
                  
              end

              COL_CMD: begin
                          state <= WAIT_READ_WRITE_0;
                          test_cmd <= {1'b0, 4'b0, {cmd_address[9:3], map_address[2:0]}, 1'b0, 4'h1, 
                      (write_out ? 2'b11 : 2'b00), 10'h1fd};
                          cmd_slot <= 1;
                          mem_channel <= 0;  
                          data_in <= {8{cmd_data_out}};
                          do_write <= write_out;
              end
              WAIT_READ_WRITE_0: begin
                  state <= WAIT_READ_WRITE_1;
                  dq_tri <= do_write ? 0 : 15;
                  cmd_slot <= 0;
                  test_cmd <= do_write ? 32'h000005ff : 32'h000001ff;                  
              end


              WAIT_READ_WRITE_1: begin
                  state <= WAIT_READ_WRITE_2;
              end

              WAIT_READ_WRITE_2: begin
                  test_cmd <= 32'h000001ff;
                  phy_rcw_pos_2 <= 3;
                  state <= PRECHARGE_AFTER_WRITE;
              end

              PRECHARGE_AFTER_WRITE: begin
                  
                  data_in <= {8{16'h8888}};
                  dq_tri <= 0;
                  phy_rcw_pos_2 <= 4;
                  mem_channel <= 1;
                  state <= POST_PRECHARGE;
                  cmd_slot <= 3;
                  test_cmd <= 32'h000029fd;
              end

              POST_PRECHARGE: begin
                  cap_value <= data_out;
                  state <= ROW_CMD;
                  phy_rcw_pos_2 <= 7;
                  test_cmd <= 32'h000005ff;
              end

I have bolded the parts that are required to perform memory operations for the second channel. For now I have hardcoded a write operation for the second channel, writing the hex value 8888 to a particular memory location in bank 1 every time it is the turn of the second memory channel.

I will give more meaningful stuff for the second memory channel to do in coming posts. For now it is just important to see that these two memory channels can co-exist without any issues.

An important part of the second memory channel's operations is the register phy_rcw_pos_2. This indicates which of the RAS/CAS/WE bits should be asserted in the applicable timeslot for the second memory channel. The bits are as follows:

  • Bit 0: Write Enable
  • Bit 1: CAS
  • Bit 2: RAS
It is important to note that these bits are active when low.
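
To tie this back to the state machine, the values written to phy_rcw_pos_2 above are 2, 3, 4 and 7. The following sketch decodes them, going by the bit layout just listed and the usual DDR3 command encodings. The module name rcw_decode and its output names are made up for this example, and the read encoding is included only for completeness, since the second channel does no reads yet:

```verilog
// Sketch only: rcw_decode and its output names are hypothetical.
// rcw is laid out as {RAS, CAS, WE}, with every bit active low.
module rcw_decode(
    input  wire [2:0] rcw,
    output wire       is_precharge,
    output wire       is_activate,
    output wire       is_write,
    output wire       is_read,
    output wire       is_nop
    );
    assign is_precharge = (rcw == 3'b010);  // 2: RAS and WE asserted
    assign is_activate  = (rcw == 3'b011);  // 3: only RAS asserted
    assign is_write     = (rcw == 3'b100);  // 4: CAS and WE asserted
    assign is_read      = (rcw == 3'b101);  // 5: only CAS asserted
    assign is_nop       = (rcw == 3'b111);  // 7: nothing asserted
endmodule
```

Read this way, the second channel gets a precharge from ROW_CMD (value 2), an activate from WAIT_READ_WRITE_2 (value 3) and a write from PRECHARGE_AFTER_WRITE (value 4), with 7 meaning no command in between.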

Viewing dual channel in action

Let us have a look at our dual channel setup in a simulation waveform:


I have marked with red C's when our 6502 core clocks.

I have marked with lime coloured arrows where operations of our first memory channel happen, which is also the memory channel that our 6502 core uses.

Likewise, I have indicated with blue arrows where operations happen for our second memory channel. As mentioned previously, we currently only do a write operation for our second channel, which operates on bank 1.

You might find it a bit strange that our first blue arrow is a pre-charge command (i.e. WE and RAS asserted) and not an activate command (i.e. RAS asserted only). This is because this command forms the last command in a series that started in the previous clock cycle.

During the simulation everything worked fine and I didn't get any DDR timing violation errors. I also ran the design on the physical FPGA, and all reads/writes of the first memory channel work 100%.

In Summary

In this post we started to implement a dual channel memory controller. Up to this point we got memory operations for the two channels to live together. The second memory channel, however, is only performing writes at the moment.

In the next post we will do some more work on our second channel of our memory controller, so that it can do some more useful work.

Until next time!