Foreword
In the previous set of posts, I attempted to get an Amiga FPGA implementation to run on a Zybo board.
My attempts looked promising at first, with a Motorola 68K running on the FPGA.
However, in the last couple of months, I hit a brick wall: memory latency. As you might know, the memory of an Amiga is clocked at 7MHz. On the Zybo board, however, latency limited my memory accesses to 5MHz.
At this point, I would just like to point out that I don't have anything against the Zybo board. It is just that, in order to properly utilise memory on the Zybo board (and on any modern computer system, for that matter), you need to access memory in a pipelined manner and use some caches.
Compared to modern-day systems, the Amiga accesses memory in quite a random fashion, so caching would not really benefit the core operation of an Amiga.
In this post I will unpack this limitation of memory latency a bit more, and I will give some pointers on which direction I am going to take to try and get around this issue.
More on the issue of latency
In my journey on creating an Amiga on the Zybo, I had been looking at the MiSTer project quite a bit:
https://github.com/MiSTer-devel/Main_MiSTer/wiki
There is one paragraph on the wiki page which, had I seen it earlier, might have saved me some pain:
SDRAM board (recommended expansion) – This small board plugs into the GPIO0 connector of the DE10-nano board. Whilst the DE10-nano has fast DDR3 memory, it cannot be used to emulate a retro EDO DRAM due to a high latency and shared usage from the ARM side. This SDR SDRAM on a daughter board is required for most cores to emulate a retro memory module.
This actually also applies to the Zybo board, and since there is not really a way to add an SDRAM module to the Zybo board, the Zybo board is not really suited for emulating an Amiga within the FPGA.
Using an alternative board
With the Zybo board out of the question for what we want to do, I thought long and hard about which board I could use for this exercise.
What we are looking for is a board that provides as direct a path between the FPGA and RAM as possible. This will allow us to optimise and reduce latency for our kind of RAM access patterns.
As I mentioned in the previous section, the MiSTer wiki indicates that the emulated cores use SDRAM and that DDR3 RAM is not really suitable. What is not clear, though, is how much of the latency is due to the DDR3 RAM itself and how much is contributed by sharing it with the ARM side.
If it is true that DDR3 RAM just by itself gives unacceptable latency for an Amiga implementation, then this would immediately eliminate quite a number of FPGA boards that could be used for this exercise.
It would be nice if we can get some ballpark figure for DDR3 latency, to see, at least in theory, if it would be possible to run an Amiga core with DDR3 memory.
One brand of memory for which the DDR3 timings are readily available in the datasheets is Micron. Firstly, according to this link, a good estimate for latency based on DDR timings would be the following formula:
TRP + TRCD + CL
Peeking at one of Micron's datasheets, it seems that a typical value for the sum of these numbers is around 39ns. So, if there is a delay of 39ns for each memory access, we are looking at a memory speed of about 25MHz (1 / 39ns).
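As a rough sanity check of this arithmetic, here is a minimal Python sketch. The three 13ns timings are hypothetical values, picked only so that they add up to the 39ns ballpark (a real datasheet usually specifies CL in clock cycles, which you would first convert to nanoseconds). The 7MHz figure is the Amiga memory clock mentioned earlier.

```python
def access_rate_mhz(t_rp_ns, t_rcd_ns, cl_ns):
    """Random-access rate if every single access pays TRP + TRCD + CL."""
    latency_ns = t_rp_ns + t_rcd_ns + cl_ns
    return 1000.0 / latency_ns  # a period in ns gives a rate of 1000/period in MHz

# Hypothetical timings, chosen only so that they sum to roughly 39 ns.
rate = access_rate_mhz(13.0, 13.0, 13.0)

AMIGA_MEMORY_MHZ = 7.0  # Amiga memory clock from the text
print(f"{rate:.1f} MHz, {rate / AMIGA_MEMORY_MHZ:.1f}x the Amiga memory clock")
```

This treats every access as a worst case (precharge, activate, then read), which is the pessimistic assumption that fits the Amiga's fairly random access pattern.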
So, it seems that we might be OK, at least in theory, when using DDR3 memory for emulating an Amiga core. That being said, this assumes we can control how the data is accessed from DDR3 memory.
I am going to take a gamble on this, and see how far I get implementing an Amiga on another FPGA board that supports direct access to DDR3 memory.
The question is: which board? The board I have in mind is the Arty-A7 development board, from Digilent:
Hi Johan, Nice to read on your progress in FPGA land ;)
Sorry to hear the Zybo board didn't work out for you.
P.s. I saw another board, the Alchitry Cu / Au / Au+
Is this board possibly the right board (read: does it solve the memory issues you experienced)?
Alchitry Cu FPGA : https://www.sparkfun.com/products/16526 50 USD
Alchitry Au FPGA : https://www.sparkfun.com/products/16527 100 USD
Alchitry Au+ FPGA : https://www.sparkfun.com/products/17514 300 USD
P.s. I guess if Amiga works, SNES, NES and C64 should be no problem ;)
Cheers and hope to speak soon (again) ;)
Jeroen Wolf, from the Netherlands (Europe)
Ps the Alchitry AU / AU+ boards also use the Artix-7 FPGA ;)
Another question concerning the MiSTer project. This board uses the Cyclone® IV EP4CE22F17C6N FPGA
https://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=139&No=593&PartNo=2
Although a nice amount of LUTS it only features 594 Embedded memory (Kbits) = 74,25 KB BRAM.
I have 2 questions:
1.)
I know you can add an external memory module, but is this not really inefficient compared to actual memory inside the FPGA / BRAM? Or can we look at this extra RAM module as an extended BRAM (direct access to the FPGA)?
2.)
In your Amiga travels ;) what would be your guess (sweet spot) for 'emulating' (sorry about this word ;)) the Motorola 68000 CPU + video buffer & peripherals concerning the min. amount of LUTS?
cheers
Hi Jeroen
Thanks for your comments. Unfortunately, most FPGAs will not provide you with a lot of block RAM. If you do find an FPGA with lots of block RAM, it comes with an unpleasant price tag.
The only way to get around this limited amount of Block RAM, is to utilise the RAM that comes with boards like the DE10-Nano, Alchitry Au and Arty-A7.
Of course, as you mentioned, you will not get the kind of performance with this RAM that you would get with block RAM, but for emulating most retro systems, I think these latencies will be acceptable.
Having said that, the memory access on the DE10-Nano board just has a bit too much latency for emulating some retro systems.
To counter this latency, the designers of the MiSTer system designed an SDRAM (synchronous DRAM) daughterboard, to which the FPGA has more direct access, cutting out all the latencies caused by pipelining.
This SDRAM is probably also not as snappy as block RAM, but the latencies will be acceptable for most retro systems provided in the MiSTer project.
To come back to the boards you have suggested: I think the copper board is probably a bit underpowered with regards to LEs. It will probably have just enough of those for the 68k processor.
My guess is that one would probably need around 15k-20k LEs for emulating the Amiga system.
I think the AU board would probably suffice for the task, together with the DDR3 RAM it provides.
At this point, however, I cannot tell for sure whether direct access to the DDR3 RAM on the AU board or the Arty A7 board will give acceptable latency, or if one would need to go for a board with SDRAM, like the MiSTer board.
Only time will tell with my endeavour on the Arty A7 board.
Super Johan, thx for your swift reply ;)
Ps the Alchitry Au+ consists of a Xilinx Artix 7 XC7A100T FPGA > 135x 36Kb = 4860Kb BRAM = 607,5KB BRAM (wow)
That should be enough to emulate the Amiga 500 games? (512KB RAM)
Too bad about the price though... but it saves you time and you are not in need of an extra SDRAM expansion board?
P.s. Most (good) Amiga games consist of multiple disks; in emulation land you can WHDLoad them. I can imagine putting these WHDLoad roms on an SD card and making use of the BRAM for running them inside the Amiga OS?
(read: is a game consisting of multiple disks still 512KB max RAM, or does this multiply by the number of disks??)
p.s gamenostalgia has them all ;)
For example "It came from the desert" / "Super Cars II" / "Elite" / "Syndicate" / "Benefactor" / "Superfrog":
https://gamesnostalgia.com/game/it-came-from-the-desert
https://gamesnostalgia.com/game/super-cars-ii
https://gamesnostalgia.com/game/elite
https://gamesnostalgia.com/game/syndicate
https://gamesnostalgia.com/game/benefactor
https://gamesnostalgia.com/game/superfrog
Cheers, Jeroen
The Au+ is indeed impressive, and would simplify your design considerably.
My take on loading the disks would also be to store them on an SD card. However, it is not necessary to load the whole disk image into BRAM all at once. You can have a very simple block that reads the disk image a piece at a time from the SD card and then serialises each byte read, outputting a bit at a time, which you then feed to the floppy subsystem (i.e. simulating the read head of a real floppy disk drive).
Thanks for the links to old classics :)
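The disk-streaming idea above can be sketched in a few lines of Python. This is only a software model of what would be a hardware block on the FPGA. The 512-byte chunk size, the MSB-first bit order and the name `bit_stream` are assumptions made for illustration, and the sketch skips any encoding the floppy subsystem might expect on top of the raw bits.

```python
import io

CHUNK_SIZE = 512  # assumed: SD cards are commonly read in 512-byte blocks

def bit_stream(image, chunk_size=CHUNK_SIZE):
    """Yield the image one bit at a time, holding only one chunk in memory."""
    while True:
        chunk = image.read(chunk_size)
        if not chunk:
            break  # end of the disk image
        for byte in chunk:
            for shift in range(7, -1, -1):  # MSB first (an assumption)
                yield (byte >> shift) & 1

# Stand-in for a disk image file on the SD card: just two bytes.
fake_image = io.BytesIO(bytes([0xA5, 0x01]))
bits = list(bit_stream(fake_image))
print(bits)
```

The point of the generator is that only one chunk is ever buffered, so the amount of block RAM needed is independent of the size of the disk image.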
Wow a FPGA with enough Block RAM is just expensive, I did find an interesting board though, but it's on Ali (so you never know what you get ;))
https://nl.aliexpress.com/item/4000170042795.html
It's <100 euro's
It has HDMI on the PCB, so that's nice :)
Indeed an impressive board for that price, but I agree with you, not sure if they had to sacrifice something to get to that price. 😀
Johan, I talked to Mark (in the Alchitry forum), he was very kind and helpful.
In his opinion it is perhaps feasible to also use the 'regular' Alchitry AU board (Xilinx Artix 7 XC7A35T FPGA) to 'emulate' the Amiga core (he also talked about DDR3 RAM speeds).
I also mentioned your blog in this forum :)
Is this board perhaps a solution for the troubles you experienced with the Zybo board?
The cost of the Alchitry AU is just below 100 USD
https://www.sparkfun.com/products/16527
P.s. you can find the Alchitry forum at this location:
https://forum.alchitry.com/thread-369.html
Cheers, Jeroen
Thanks for the feedback.
I think the AU is also a very good candidate for what we want to achieve.
In the meantime, I have been playing with the Arty A7 board, which, as you know, has the same FPGA as the AU. The journey is a bit steeper than I had anticipated.
Let me give you a sneak preview of the difficulties, which I will cover in more detail in coming posts. In the memory interface tutorials of the Arty A7 (and I am sure the AU memory interface tutorials will do the same), you are introduced to the MIG (Memory Interface Generator) tool in Vivado.
The MIG tool simplifies the task quite a lot for interfacing with memory from the FPGA, but I have found that the generated design contains an element or two that does introduce a fair bit of latency.
So, unfortunately I need to dive into more technical detail and write some of my own modules for the low level functionality.
hi johan, first best wishes for 2022.
I'm curious how your endeavours on your Arty A7 board are getting along, in particular the memory issues (or solutions) regarding the use of DDR3 as fast RAM for the FPGA (with acceptable latency). Ps I'm sure you know this already, but have you heard of the Vampire V4+ Standalone (new Amiga computer)?
http://www.apollo-core.com/v4.html
The makers even simulated a custom made Motorola CPU (68080).
Their Altera Cyclone V FPGA makes use of 12MB (from a 12MB DDR3 chip) as dedicated ram:
http://www.apollo-core.com/index_files/saga.jpg
Too bad they programmed it in VHDL :( ;)
Cheers, Jeroen
Hi Jeroen!
You should have a fab 2022.
Thanks for the link. Very interesting. I couldn't find the VHDL code for this project. Perhaps they only provide the source code if you buy the board.
With my own Arty A7 project, I have been taking a couple of detours :) Things always seem OK when you test your design in simulation, but the real problems always start when you try to run the design on the real FPGA.
All in all, building your own DDR design from the ground up is quite a steep learning curve :) I have found alternative solutions along the way that seem less painful, but then you are faced with the latency problem.
In a way, because of the learning curve, I can understand why the team of the MiSTer project decided to use an SDRAM daughter board instead of using DDR. With DDR you need to program several mode registers and do write- and read-levelling before you can start using the chip.
That being said about the complexity of DDR, it is quite nice to learn what is going on behind the scenes.
At this point in time, my main focus is on getting write-levelling to work on the Arty A7. Once I have achieved this, I will do a blog post on it.