Foreword

In the previous post we implemented the compare and branch instructions for our Flutter C64 Emulator.

In this post we will implement the 6502 stack and related operations like pushing/popping, Jump to Subroutine and Return from Subroutine.

Enjoy!

The stack concept

The stack is a Last In, First Out (LIFO) data structure. To visualise a stack in real life, one can look at a receipt stack:

Clearly, one can see that the receipt that is most accessible is the last receipt you have placed on top of the pile.

In a CPU the stack has many uses. For example, when subroutines are called in a nested way, each one needs to return to its caller. A stack is perfect for this, because you always want access to the most recent return address.

On the 6502, the stack is 256 bytes in size and lives in page 1 of the memory space, that is, the address range $100 - $1ff. The stack grows downwards, starting at $1ff and growing towards $100. Naturally, as you pop values off the stack, the stack pointer moves back towards $1ff.

On the 6502 the stack has many uses, of which we have already mentioned jumping to and returning from subroutines. You can also push and pop registers. The 6502 also uses the stack when servicing interrupts. Before an interrupt routine is called, the CPU pushes the program counter and the status register onto the stack, so that when the service routine finishes, they can be restored and the program continues as if nothing has happened.

Creating the stack mechanism

Let us start by writing some code for implementing the stack mechanism. We start by defining a stack pointer:

int _sp = 0xff;

The stack pointer starts at 0xff, which corresponds to address 0x1ff, the starting position of the stack. We store only the low byte and prepend the high byte of 1 whenever we need to access memory.

Now let us create the push and pull methods.

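  void push(int value) {
    // Store at the current (free) slot in page 1, then move the pointer down.
    memory.setMem(value, _sp | 0x100);
    _sp--;
    _sp = _sp & 0xff; // wrap around within the 256-byte stack page
  }

  int pull() {
    // Move the pointer up to the last pushed value before reading it.
    _sp++;
    _sp = _sp & 0xff; // wrap around within the 256-byte stack page
    return memory.getMem(_sp | 0x100);
  }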

With the understanding that the stack pointer points to the location where the next push will happen, we can use the stack pointer address as is when storing the pushed value, and decrement the pointer thereafter.

However, since the pointer points to the next push location, you cannot use the location as is when doing a pull. You first need to increment the pointer and use that value for the read address.
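To make the pointer convention easy to try out in isolation, here is a minimal stand-alone sketch. The class name `StackSketch` and the flat `Uint8List` memory are assumptions made for illustration; the emulator itself goes through its own `memory` object.

```dart
import 'dart:typed_data';

/// Hypothetical stand-alone model of the 6502 stack (illustration only,
/// not the emulator's own class).
class StackSketch {
  /// Flat 64 KB address space standing in for the emulator's memory object.
  final Uint8List mem = Uint8List(0x10000);

  /// Low byte of the stack address; the high byte is always 0x01.
  int sp = 0xff;

  void push(int value) {
    mem[0x100 | sp] = value & 0xff; // write to the free slot the pointer marks
    sp = (sp - 1) & 0xff; // then move down, wrapping from 0x00 to 0xff
  }

  int pull() {
    sp = (sp + 1) & 0xff; // move up to the most recently written slot first
    return mem[0x100 | sp];
  }
}
```

With a fresh `StackSketch`, the first push lands at address 0x1ff and leaves the pointer at 0xfe. Pulling on an empty stack wraps the pointer from 0xff to 0x00 and reads address 0x100, which is why the masking with 0xff matters.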

Before ending this section, let us implement the basic stack instructions Push Accumulator (PHA) and Pull Accumulator (PLA) to see if our stack implementation behaves as expected.

    /*
    PHA (PusH Accumulator)          $48  3
    */
      case 0x48:
        push(_a);
    /*
    PLA (PuLl Accumulator)          $68  4
    */
      case 0x68:
        _a = pull();
        _n = ((_a & 0x80) != 0) ? 1 : 0;
        _z = (_a == 0) ? 1 : 0;

Implementing JSR and RTS

Let us now implement the JSR (Jump to Subroutine) and RTS (Return from Subroutine) instructions.

So, in principle, when JSR executes, it pushes the address of the next instruction on the stack as the return address before jumping to the subroutine. When the subroutine finishes executing and invokes RTS, it pulls this address off the stack again and jumps to it.

However, there is a small caveat with this sequence of events. The return address pushed onto the stack is not exactly the address of the next instruction, but that address -1, which is the address of the last byte of the JSR instruction itself.

The designers of the 6502 implemented JSR this way as a kind of optimisation. When reading the bytes of an instruction from memory, the program counter is incremented by 1 each time, and at the point where the return address needs to be pushed, the PC is still pointing to the last byte of the JSR instruction.

Now, if you were to implement JSR/RTS in your emulator with the assumption that the value pushed on the stack is simply the address of the next instruction, without worrying about the -1, your emulator would probably work fine 99% of the time. That being said, I have encountered some magic 6502 code in the past that inspects the contents of the stack to implement things like copy protection or auto-starting code. Such code might not work correctly if your emulated JSR instruction doesn't push addresses on the stack following the -1 convention.

So, it is important to adhere to this convention when implementing the JSR/RTS instructions.

Here is the implementation of these two instructions:

/*
MODE           SYNTAX       HEX LEN TIM
Absolute      JSR $5597     $20  3   6
 */
      case 0x20:
        int temp = (pc - 1) & 0xffff;
        push(temp >> 8);
        push(temp & 0xff);
        pc = resolvedAddress;
/*
MODE           SYNTAX       HEX LEN TIM
Implied       RTS           $60  1   6
 */
      case 0x60:
        pc = pull();
        pc = pc | (pull() << 8);
        pc++;
        pc = pc & 0xffff;
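The -1 convention is easy to check with plain arithmetic. The two helper functions below are hypothetical and not part of the emulator; they only reproduce the address calculation, assuming JSR is always three bytes long.

```dart
/// Bytes a JSR located at [jsrAddress] pushes, in push order: [high, low].
/// Hypothetical helper for illustration only.
List<int> jsrPushedBytes(int jsrAddress) {
  final next = (jsrAddress + 3) & 0xffff; // address of the following instruction
  final pushed = (next - 1) & 0xffff; // 6502 convention: last byte of the JSR
  return [pushed >> 8, pushed & 0xff];
}

/// Address at which RTS resumes, given the bytes in the order they are pulled.
/// Hypothetical helper for illustration only.
int rtsTarget(int low, int high) => ((low | (high << 8)) + 1) & 0xffff;
```

For a JSR at address 0x0011, the pushed bytes are 0x00 and 0x13, and `rtsTarget(0x13, 0x00)` gives 0x0014, the instruction right after the JSR.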

Implementing the other stack operations

Let us now implement the rest of the stack operations.

The simplest of these operations are the transfers between the Stack Pointer register and the X register, namely TSX and TXS. So let us quickly implement them:

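/*
        TXS (Transfer X to Stack ptr)   $9A  2
 */
      case 0x9a:
        // TXS does not affect any flags.
        _sp = _x;
/*
        TSX (Transfer Stack ptr to X)   $BA  2
 */
      case 0xba:
        _x = _sp;
        // TSX updates the Negative and Zero flags based on the value loaded into X.
        _n = ((_x & 0x80) != 0) ? 1 : 0;
        _z = (_x == 0) ? 1 : 0;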

What remains to be implemented is pushing and pulling the status register, that is, the register that contains all the flags, like the Zero flag, Negative flag, Overflow flag and so on.

At this point the question arises in which order the flags are stored in the status byte that gets pushed onto the stack. One possibility is deciding on the order of the flags yourself, and emulation will probably work correctly 99% of the time.

However, as I mentioned in the previous section where we implemented the JSR/RTS instructions, you often find 6502 machine language programs that inspect the contents of the stack, so if you decide the order of the flags in the status byte yourself, this code might not work correctly.

The question is: how do we find the correct order of the flags in the status register? In general, the web sites that give you info on the 6502 instructions don't provide this information about the status register.

After digging a bit on the internet, I found the information via the following link:

https://www.princeton.edu/~mae412/HANDOUTS/Datasheets/6502.pdf

They provide a nice diagram for the status register:

Some extra information about the status register: bits 4 and 5 should be set to one when it is pushed onto the stack. Similarly, when pulling this value back into the status register, we ignore bits 4 and 5. With all this said, let us implement the PHP and PLP instructions:

/*
        PHP (PusH Processor status)     $08  3
 */
      case 0x08:
        push((_n << 7) | (_v << 6) | (3 << 4) | (_d << 3) | (_i << 2) | (_z << 1) | _c);
/*
        PLP (PuLl Processor status)     $28  4
 */
      case 0x28:
        int temp = pull();
        _c = temp & 1;
        _z = (temp >> 1) & 1;
        _i = (temp >> 2) & 1;
        _d = (temp >> 3) & 1;
        _v = (temp >> 6) & 1;
        _n = (temp >> 7) & 1;
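The bit layout used by PHP and PLP can also be written as a pair of stand-alone functions, which makes it easy to check a status byte by hand. The names `packStatus` and `unpackStatus` are hypothetical and used only for this sketch; each flag is assumed to hold 0 or 1, as in the emulator.

```dart
/// Packs individual flags (each 0 or 1) into the byte PHP pushes.
/// Bits 5 and 4 are always set in the pushed byte.
/// Hypothetical helper for illustration only.
int packStatus({
  required int n,
  required int v,
  required int d,
  required int i,
  required int z,
  required int c,
}) {
  return (n << 7) | (v << 6) | (1 << 5) | (1 << 4) |
      (d << 3) | (i << 2) | (z << 1) | c;
}

/// Unpacks a pulled status byte into its flags, ignoring bits 5 and 4.
/// Hypothetical helper for illustration only.
Map<String, int> unpackStatus(int status) => {
      'n': (status >> 7) & 1,
      'v': (status >> 6) & 1,
      'd': (status >> 3) & 1,
      'i': (status >> 2) & 1,
      'z': (status >> 1) & 1,
      'c': status & 1,
    };
```

With only the Negative and Overflow flags set, `packStatus` returns 0xf0. With all flags clear it returns 0x30, because bits 4 and 5 are always pushed as ones.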

We have implemented all instructions for this post. In the next section we will write a test program for all the instructions we have added.

The Test Program

We will use the following for our test program:

0000 A9 0A           LDA #$0a
0002 48              PHA
0003 48              PHA
0004 48              PHA
0005 48              PHA
0006 A2 50           LDX #$50
0008 9A              TXS
0009 48              PHA
000a 48              PHA
000b 48              PHA
000c 48              PHA
000d A9 7F           LDA #$7f
000f 69 01           ADC #$01
0011 20 19 00        JSR TEST
0014 68              PLA
0015 68              PLA
0016 68              PLA
0017 68              PLA
0018 68              PLA
0019 08        TEST  PHP
001a B8              CLV
001b A9 00           LDA #$00
001d 28              PLP
001e 60              RTS

Here we test a couple of stack operations: pushing and pulling elements from the stack, changing the stack pointer, doing a JSR/RTS, and pushing and pulling the status register.

Currently within our emulator, we only have a view of the first page of memory (i.e. bytes 0 to 255). However, when executing the above program it would be nice to extend the view so we can see what is happening on the stack as well. I have made the change and it looks like this:

I am not going to cover the changes required to adjust the view like this, but they are available in a git tag I have created here. This tag also contains the test program for this post as a binary, which will execute as you click the step button.

Let's see how the stack changes as we execute the program. We start by pushing the accumulator to the stack a number of times. We can see our values towards the end of page 1:

We then change the stack pointer to 0x50 and do a couple more pushes of the accumulator. We can see that the pushed contents now land in a different area of memory:

Next, we force the overflow flag to be set by doing an addition that causes a signed overflow (7f + 01 = 80), which also sets the negative flag. With this one operation we manage to set as many flags as possible. We then jump to a subroutine, whose first instruction pushes the status register onto the stack.
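For readers who want to check the flags of that addition by hand, here is a small independent sketch of a binary-mode add with carry. It is not the ADC code from the earlier post; the function name `adcFlags` and the map-based return value are assumptions made for this example, and decimal mode is ignored.

```dart
/// Result and flags of an 8-bit binary add with carry.
/// Hypothetical helper for illustration only; decimal mode is not handled.
Map<String, int> adcFlags(int a, int operand, int carryIn) {
  final sum = a + operand + carryIn;
  final result = sum & 0xff;
  return {
    'result': result,
    'c': sum > 0xff ? 1 : 0,
    'z': result == 0 ? 1 : 0,
    'n': (result >> 7) & 1,
    // Signed overflow: both inputs have the same sign bit,
    // and the result's sign bit differs from it.
    'v': ((~(a ^ operand) & (a ^ result) & 0x80) != 0) ? 1 : 0,
  };
}
```

Calling `adcFlags(0x7f, 0x01, 0)` gives a result of 0x80 with the negative and overflow flags set and the zero and carry flags clear, assuming the carry flag was clear before the addition.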

At this point, our memory dump will look like this:

The return address pushed is 0013. As mentioned in a previous section the return address pushed is always one less than the actual address, because of the design of the 6502.

The value pushed for the status register is F0 (i.e. the upper 4 bits set). As mentioned previously, bits 4 and 5 are always set, and because of the operations we did, the overflow flag is set as well as the negative flag.
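The expected stack contents at this point can be reproduced without the emulator by replaying only the stack traffic of the test program. The function below is a hypothetical sketch on a flat 64 KB `Uint8List`; it hard-codes the values discussed above (accumulator 0A, return address 0013, status F0) rather than executing any instructions.

```dart
import 'dart:typed_data';

/// Replays the pushes of the test program up to and including PHP and
/// returns the resulting memory. Hypothetical helper for illustration only.
Uint8List replayTestProgramStack() {
  final mem = Uint8List(0x10000);
  var sp = 0xff;

  void push(int value) {
    mem[0x100 | sp] = value & 0xff;
    sp = (sp - 1) & 0xff;
  }

  for (var k = 0; k < 4; k++) {
    push(0x0a); // four PHA with A = $0A, landing at $1FF down to $1FC
  }
  sp = 0x50; // LDX #$50 followed by TXS
  for (var k = 0; k < 4; k++) {
    push(0x0a); // four more PHA, landing at $150 down to $14D
  }
  push(0x00); // JSR: high byte of the return address $0013
  push(0x13); // JSR: low byte of the return address $0013
  push(0xf0); // PHP: N, V and bits 5 and 4 set
  return mem;
}
```

In the returned memory, addresses 0x14a to 0x14c hold F0, 13 and 00, with the four 0A bytes sitting directly above them at 0x14d to 0x150.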

We then clear the negative and overflow flags on purpose (CLV clears the overflow flag, and loading 00 into the accumulator clears the negative flag) to see if the PLP instruction at the end of the subroutine restores them for us.

We then correctly return from the subroutine, continuing execution at address 0014. We then do a number of pulls into the accumulator to see if we get back the same values that we originally pushed. On purpose, I have added an extra PLA afterwards to see what it does. As expected, we get a 00, because that location lies just past the last value we pushed.

This concludes what we want to achieve in this post.

In Summary

In this post we implemented all stack operations, including pushing and pulling the accumulator and the status register. We also implemented the JSR/RTS instructions, which also rely on the stack.

We are just about finished with implementing all instructions for the 6502. What remains are the following:

BIT
JMP (Jump)
NOP
Implied register operations

So, in the next post I will be implementing these.

With the above implemented, we can move onto more interesting things, like running the Klaus Dormann Test Suite on our emulator to see if it behaves like a real 6502. This is very important, because it will help us to emulate a game as accurately as possible.

Until next time!

C64 on an FPGA

Thursday, 20 February 2025

A Commodore 64 Emulator in Flutter: Part 8