/aig/ Alternative ISA General
Alternative ISA General is a discussion thread about non-x86 hardware. "Alternative" doesn't mean "unpopular"; it means "alternative to x86". While there have been such threads in the past, they were usually sporadic and poorly connected with one another, so whatever transpired in one thread wasn't carried over to the next.
Due to the rise of desktop-class ARM chips, interest in alternative hardware has risen, with many Anons even coming up with projects of their own. Therefore, a centralised place was needed, where we could keep track of the development and goals of the community.
While discussion of Intel or AMD hardware is not absolutely prohibited (and even if it were, who is gonna enforce this? LOL), due to the ubiquity of x86 hardware, it is assumed that whatever concerns such architecture can be discussed in any of the other gorillion threads on the board.
Old threads are available on Desuarchive.
- 1 Ongoing projects
- 2 Resources
- 3 ISA Overview
- 4 ISA Features
- 5 ISA Implementations
- 6 Making your own ISA
- 7 Alternative OS for Alternative ISA
- 8 Simulators
- 9 Links
Anons are currently interested in porting several open source projects to the PowerPC architecture. Currently the following proposals have been made:
Grand Theft Auto III
Re3 is a homebrew engine intended to replace proprietary RenderWare with an open source implementation. Anons have been discussing making a port for the 32-bit PowerPC version of Mac OS X.
The Elder Scrolls III: Morrowind
OpenMW is a free and open source modern re-implementation of the Gamebryo engine.
OpenLara is a Classic Tomb Raider open-source engine.
Anon has been kind enough to put together a small reference library.
A collection of brief infocards for many different processors is available on TextFiles which also holds huge archives relating to programming and microcomputers, especially 8-bit processors. Also WikiChip and CPU Shack have a lot of information on alternative ISAs, even though the front page is dominated by mainstream processors.
Wikipedia has a Comparison of instruction set architectures. There was also a List of instruction sets, which was slated to be merged into the Comparison article, but in typical Wikipedia fashion a redirect was made without anyone doing the real work of merging the articles. Thus the archived List of instruction sets is still worth a review.
ISA simply means Instruction Set Architecture. This is what the programmer sees from the outside, which these days is very different from the microcode and state machines operating inside the processor and normally inaccessible to ordinary programmers. The mid 1970s saw a Cambrian explosion of architectures that later fossilised into what we see today. Any assembly programmer, and academics such as Hennessy and Patterson, agree that x86 stinks, but as usual inertia and money trump speed, efficiency and elegance. Those three qualities are what we instead celebrate in this general.
The same ISA can be implemented by many different internal architectures and microcode. The ISA is the main topic but since many architectures are DIY we discuss both. The best way is to illustrate by examples of processors. An ISA is defined by a fairly large set of parameters.
Much can be summarised as CISC, which is complex, or RISC, which is simple. The RISC definition has drifted over the decades, changing from "simple" to register-to-register operations with load-store memory handling and greatly reduced, simplified addressing modes.
The question is simply few or many. Few registers are good for low latency, many registers are good for lazy programmers and poor compilers. The 6502 managed with just one accumulator plus a handful of other registers. Compiler writers prefer at least 16 registers. One cannot avoid noting that modern processors have tons of registers but still seem sluggish.
Operations are performed on registers of some form.
Accumulator based where all processing is via one (or 2, rarely more) main register, examples are most early 8-bit processors;
Stack based where everything is performed on a stack, though TOP (top of stack) is in a register for fast access and operations, possibly also next on stack, examples are Novix NC4000 and many virtual processors; or
Register file where many registers can be used in similar ways, examples are 68K and many modern processors.
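The three organisations above can be compared by evaluating the same expression, (2 + 3) * 4, on each. This is only a minimal sketch: the instruction names (`PUSH`, `LDA`, `ADD`, `MUL`) and the tuple encoding are invented for illustration and do not correspond to any real processor.

```python
# Toy models only: mnemonics and encodings below are made up for illustration.

def run_stack(program):
    """Stack machine: operands are pushed, operators pop two values and push one."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "ADD" else a * b)
        else:
            raise ValueError(f"unknown op {op}")
    return stack.pop()  # the result is left on top of the stack


def run_accumulator(program, memory):
    """Accumulator machine: every operation goes through the single register A."""
    a = 0
    for op, addr in program:
        if op == "LDA":
            a = memory[addr]
        elif op == "ADD":
            a += memory[addr]
        elif op == "MUL":
            a *= memory[addr]
        else:
            raise ValueError(f"unknown op {op}")
    return a


def run_register_file(program, regs):
    """Register file machine: any register may be a source or the destination."""
    for op, dst, src1, src2 in program:
        if op == "ADD":
            regs[dst] = regs[src1] + regs[src2]
        elif op == "MUL":
            regs[dst] = regs[src1] * regs[src2]
        else:
            raise ValueError(f"unknown op {op}")
    return regs


stack_prog = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
acc_prog = [("LDA", "x"), ("ADD", "y"), ("MUL", "z")]
reg_prog = [("ADD", 0, 1, 2), ("MUL", 0, 0, 3)]

print(run_stack(stack_prog))                                  # 20
print(run_accumulator(acc_prog, {"x": 2, "y": 3, "z": 4}))    # 20
print(run_register_file(reg_prog, [0, 2, 3, 4])[0])           # 20
```

Note how the stack program names no registers at all, the accumulator program names one memory operand per instruction, and the register file program names three registers per instruction.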
Accumulator that is the default destination(s) of operations, usually tightly coupled to the ALU (Arithmetic Logic Unit)
Data Register similar to accumulator, but on processors with register files such as 68K that had 8 data registers
Address Registers used for addressing into memory, usually tightly coupled to the data address generators. Stack pointers can be a form of an address register
Index Registers used for indexing into memory from an offset that may come from an address register
2-op instructions of the form A += X; or
3-op instructions of the form A = B + C.
The former requires a little extra thought, since the destination is also one of the sources and is overwritten, but 3-op is simple for lazy programmers and poor compilers. One cannot avoid noting that modern processors drift towards 3-op instructions.
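The "extra thought" needed for 2-op code can be sketched as a small lowering step that rewrites A = B + C into 2-op form. The `MOV`/`ADD` mnemonics and the helper names `lower_add` and `run_two_op` are hypothetical, chosen only for this illustration.

```python
def lower_add(dst, src1, src2):
    """Rewrite the 3-op form dst = src1 + src2 as a list of 2-op instructions."""
    if dst == src1:
        return [("ADD", dst, src2)]                   # already dst += src2
    if dst == src2:
        return [("ADD", dst, src1)]                   # addition commutes: dst += src1
    return [("MOV", dst, src1), ("ADD", dst, src2)]   # copy first, then accumulate


def run_two_op(program, regs):
    """Execute 2-op instructions on a dict of named registers."""
    for op, dst, src in program:
        if op == "MOV":
            regs[dst] = regs[src]
        elif op == "ADD":
            regs[dst] += regs[src]
        else:
            raise ValueError(f"unknown op {op}")
    return regs


regs = run_two_op(lower_add("A", "B", "C"), {"A": 0, "B": 2, "C": 5})
print(regs["A"])  # 7
```

The extra `MOV` is the cost of the 2-op form whenever neither source may be overwritten; a 3-op ISA pays for that convenience with wider instruction encodings.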
|Accumulator||The instruction operates on the accumulator (and not, say, memory)||6502: ROL A|
|Absolute||The instruction operates on memory defined by a full width address||6502, 16-bit address: LDA $FF00|
|Absolute, X||The instruction operates on memory defined by a full width address, with offset defined by index register X||6502, 16-bit address: LDA $FF00, X|
|Absolute, Y||The instruction operates on memory defined by a full width address, with offset defined by index register Y||6502, 16-bit address: STA $FF00, Y|
|Immediate||The instruction uses data in program memory subsequent to the instruction||6502: LDX #$FD|
|Implied||Data is implied by the instruction||6502: SEI|
|Indirect||Data is accessed indirectly via a pointer||6502: JMP ($F000)|
|Indexed Indirect||Data is accessed indirectly via a pointer where the pointer is offset by the X register||6502: LDA ($C0, X)|
|Indirect Indexed||Data is accessed indirectly via a pointer where the target of the pointer is offset by the Y register||6502: LDA ($D0), Y|
|Relative||Data or a branch target is addressed by an offset from the current position of the program counter||6502: BNE $F300|
|Zero Page||Data is accessed from the zero page, addressed by a single byte||6502: LDA $A0|
|Zero Page, X||Data is accessed from the zero page, addressed by a single byte, with offset defined by index register X||6502: LDA $A2, X|
|Zero Page, Y||Data is accessed from the zero page, addressed by a single byte, with offset defined by index register Y (on the 6502 only LDX and STX have this mode)||6502: LDX $A4, Y|
|Post Increment||Data is accessed via an address register that is incremented after access||68K: MOVE.L (A0)+,D3|
|Pre Decrement||Data is accessed via an address register that is decremented before access||68K: MOVE.W -(A7),D4|
Implied, immediate, absolute, absolute indexed, zeropage, zeropage indexed, stack relative, index indirect, indirect indexed, all with or without pre/post increment/decrement. Much can be combined. Stack relative addressing is very helpful for C programming, where variables are often transferred to the stack when calling a function. Relative addressing makes it possible to make relocatable code, which is useful when running several programs without MMU.
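How several of the 6502 modes in the table turn an operand into an effective address can be sketched as follows. This is an illustrative model, not an emulator: the function names, the mode strings and the flat `bytearray` memory are assumptions made for this example.

```python
def read16_zp(mem, zp_addr):
    """Read a little-endian 16-bit pointer from page zero, wrapping inside the page."""
    lo = mem[zp_addr & 0xFF]
    hi = mem[(zp_addr + 1) & 0xFF]
    return (hi << 8) | lo


def effective_address(mode, operand, mem, x=0, y=0):
    """Return the address a 6502 instruction would access for the given mode."""
    if mode == "absolute":              # LDA $FF00
        return operand & 0xFFFF
    if mode == "absolute_x":            # LDA $FF00, X
        return (operand + x) & 0xFFFF
    if mode == "absolute_y":            # STA $FF00, Y
        return (operand + y) & 0xFFFF
    if mode == "zero_page":             # LDA $A0
        return operand & 0xFF
    if mode == "zero_page_x":           # LDA $A2, X (wraps within page zero)
        return (operand + x) & 0xFF
    if mode == "zero_page_y":           # LDX $A4, Y (wraps within page zero)
        return (operand + y) & 0xFF
    if mode == "indexed_indirect":      # LDA ($C0, X): index first, then follow pointer
        return read16_zp(mem, operand + x)
    if mode == "indirect_indexed":      # LDA ($D0), Y: follow pointer, then index
        return (read16_zp(mem, operand) + y) & 0xFFFF
    raise ValueError(f"unknown mode {mode}")


mem = bytearray(65536)
mem[0xC4], mem[0xC5] = 0x00, 0x80       # pointer $8000 stored at $C4
mem[0xD0], mem[0xD1] = 0x10, 0x20       # pointer $2010 stored at $D0

print(hex(effective_address("indexed_indirect", 0xC0, mem, x=4)))  # 0x8000
print(hex(effective_address("indirect_indexed", 0xD0, mem, y=5)))  # 0x2015
```

The last two calls show the difference between the two indirect modes: in the first, X selects which pointer in page zero is used; in the second, Y is added to the address the pointer holds.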
Zero page was an addressing mode where the address was a single byte and it was implied this related to addresses 0x00 - 0xff where the high byte of the address (the "page") was set to zero. This saved space and time, typically 30 percent. This was later improved by Direct page wherein the page was set by a separate 8-bit register, such as on the 6809. This allowed moving the active page around and reduced the zero page pressure enormously. Lately zero page and direct page have fallen out of fashion.
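The address formation described above can be written out in a few lines. The helper names are hypothetical; the byte counts are those of the 6502 `LDA` instruction (2 bytes with a zero page operand, 3 bytes with an absolute operand), which is one concrete instance of the saving mentioned above.

```python
def zero_page_address(offset):
    """Zero page: the high byte of the address is implied to be 0x00."""
    return offset & 0xFF


def direct_page_address(dp, offset):
    """Direct page: the high byte comes from an 8-bit page register (DP on the 6809)."""
    return ((dp & 0xFF) << 8) | (offset & 0xFF)


print(hex(zero_page_address(0xA0)))          # 0xa0
print(hex(direct_page_address(0x20, 0xA0)))  # 0x20a0

# Size of a 6502 LDA: opcode plus a one-byte or two-byte address.
LDA_ZERO_PAGE_BYTES = 2
LDA_ABSOLUTE_BYTES = 3
saving = 1 - LDA_ZERO_PAGE_BYTES / LDA_ABSOLUTE_BYTES
print(f"{saving:.0%}")                       # 33%
```

Setting the page register to zero makes direct page behave exactly like zero page, which is why it can be seen as a generalisation rather than a replacement.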
Von Neumann, which is a unified memory architecture for program and data.
Harvard Architecture where program and data are located in different memory spaces.
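Recent designs are often hybrids: the ISA seen by programmers is von Neumann, while at the level 1 cache there is a Harvard architecture with separate instruction and data caches. This means self-modifying code can fail to work unless the instruction cache is synchronised after the code has been modified.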
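Why self-modifying code runs into trouble on such a hybrid can be shown with a toy model. The class `ToyCore` and its methods are invented for this sketch; it assumes an instruction cache that is never updated by data stores, which is a simplification and not how every real processor behaves.

```python
class ToyCore:
    """Unified memory with a separate instruction cache that stores do not update."""

    def __init__(self, memory):
        self.memory = list(memory)
        self.icache = {}

    def fetch(self, addr):
        """Instruction fetch: served from the instruction cache, filled on a miss."""
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        """Data store: goes to memory only, the instruction cache is left alone."""
        self.memory[addr] = value

    def sync_icache(self):
        """Explicit synchronisation: drop cached instructions so they are refetched."""
        self.icache.clear()


core = ToyCore(["NOP", "HALT"])
print(core.fetch(0))   # NOP   (now cached)
core.store(0, "INC")   # the program rewrites its own first instruction
print(core.fetch(0))   # NOP   (stale copy from the instruction cache)
core.sync_icache()
print(core.fetch(0))   # INC   (the modified instruction is finally seen)
```

The second fetch is the failure case: memory holds the new instruction, but the fetch path still sees the old one until the cache is synchronised.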
CPU that we are all familiar with
DSP are Digital Signal Processors that tend to use Harvard or super Harvard architecture, with separate memory buses for X-, Y-, and program memory. These are optimised for processing long series of numbers, typically sampled signals from an ADC, low power consumption and hard real-time requirements. Typically a DSP is an accumulator based design tightly coupled to the MAC (Multiply and Accumulate) unit, which is the heart of the DSP. In a typical clock tick, the DSP loads a parameter from X-memory and a parameter from the Y-memory, multiplies these and sums this with the accumulator, while incrementing the pointers into X- and Y-memory. Optionally there is also a shift and rounding in every tick. The pointer incrementing typically takes place in the data address generators that serve to feed the MAC at maximum speed. A well known C equivalent is
- A += *x++ * *y++
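Expanded into a loop, the multiply-accumulate expression above is a dot product of two sample streams. The sketch below models it with explicit indices standing in for the X- and Y-memory pointers; the function name `mac` and the example data are assumptions for illustration.

```python
def mac(x, y):
    """Multiply-accumulate over two streams, one product per simulated tick."""
    acc = 0
    xi = yi = 0                   # pointers into X-memory and Y-memory
    while xi < len(x) and yi < len(y):
        acc += x[xi] * y[yi]      # A += *x * *y
        xi += 1                   # post-increments; on a DSP these are done by
        yi += 1                   # the data address generators in the same tick
    return acc


coefficients = [1, 2, 3]          # for example, filter taps in X-memory
samples = [4, 5, 6]               # for example, ADC samples in Y-memory
print(mac(coefficients, samples)) # 32
```

On a general-purpose CPU each loop iteration costs several instructions; the point of the DSP design is that the whole loop body is a single instruction issued every tick.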
GPUs are a more recent design, where competition drives towards all-out performance regardless of the thermal issues.
8 Bit Processors
These usually have 8 bit registers and a 16 bit program counter or instruction pointer (terminologies vary) and could access 64 KB memory. Most were accumulator based which worked well since in this era memory and CPUs were equally slow.
The RCA 1802 is a weird and wonderful processor implemented as a bit serial architecture, which made it slow. It was popular for machines such as the Cosmac ELF. The fabrication made it radiation resistant, so it was also popular for satellites, and it is still in production. A modern and compact machine is the 1802 Membership Card in the credit card form factor. There is also an RCA 1805 that added a prefix code and several new instructions.
CHIP-8 was a popular virtual processor or very low level language, popular on the 1802 platform, and fast enough for making games. It has been extended and ported to many platforms.
The 6502 had one accumulator (A), two index registers (X and Y), a stack pointer (S) and a processor status (P), all 8 bit wide; plus a 16 bit program counter (PC). It also had a zero page that could be used as address registers. It entered the market at a much lower price than 6800 and quickly won a following. It was used in many popular computers of that era including Apple 2, BBC and Commodore 64. For all the limitations it was powerful enough in the hands of skilled programmers to power the first spreadsheet (VisiCalc) which was also the first killer application, as well as 3D space games with hidden line wireframe graphics such as Elite.
The 6502 still has many loyal fans, hugely active communities and dozens of implementations. Complete development platforms, simulators, debuggers, operating systems, libraries and more are available, most for free. It is still supported commercially by The Western Design Center, founded by the original designer. An estimated 200 million chips are made annually for an installed base estimated at 2 billion. Not bad for a nearly 50 year old design. This time span also means it is proven, and is therefore used in applications such as pacemakers, where lifetime guarantees take on an entirely new meaning. It is also seen in robots and the occasional terminator. The 6502 has an extremely low transistor count which makes it interesting for new opportunities such as a flexible version.
The 6502 has two weaknesses. First, it is awkward for 16 bit pointer handling, which The Woz overcame by making the SWEET-16 virtual processor. Second, the 6502 is not suited for stack intensive languages such as C. This has been overcome by other virtual processors such as the p-code for the UCSD p-System and VTL-2 (source), both of which exist for several ISAs. A more recent virtual CPU for the 6502 is AcheronVM, self-described as the successor to SWEET-16.
Several OSes have been made for the 6502, including LUnix (Little Unix), Minikernel, GeckOS/A65 and many more. GEOS was an add-on OS for the C64 that provided a windowing system plus many applications such as text processing, spreadsheets and more, with all of this complex system fitting in 64 KB RAM. GEOS was not multitasking; that extension came with Wheels, which also had a web browser but increased the RAM requirement to a whopping 128 KB.
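The awkwardness of 16 bit pointer handling comes from having to do everything in 8-bit halves with the carry passed between them. A minimal model, with hypothetical helper names, of adding a 16-bit value to a pointer kept in two zero page bytes:

```python
def adc(a, operand, carry):
    """8-bit add with carry, like a 6502 ADC in binary mode: (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def add16_to_pointer(zp, ptr, value):
    """Add a 16-bit value to the little-endian pointer at zp[ptr], zp[ptr + 1]."""
    # Low byte:  CLC / LDA ptr / ADC #<value / STA ptr
    lo, carry = adc(zp[ptr], value & 0xFF, 0)
    # High byte: LDA ptr+1 / ADC #>value / STA ptr+1 (carry comes from the low byte)
    hi, _ = adc(zp[ptr + 1], (value >> 8) & 0xFF, carry)
    zp[ptr], zp[ptr + 1] = lo, hi


zero_page = bytearray(256)
zero_page[0x10], zero_page[0x11] = 0xFF, 0x12   # pointer $12FF
add16_to_pointer(zero_page, 0x10, 0x0001)
print(hex(zero_page[0x11] << 8 | zero_page[0x10]))  # 0x1300
```

The comments show the seven-instruction 6502 sequence this corresponds to; a virtual processor with 16-bit registers reduces that to a single interpreted operation, at the cost of speed.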
The Motorola 6800 was an early design with dual accumulators and one 16-bit index register.
The Motorola 6809 was the peak of 8-bit architectures, and even featured an opcode for multiplication. Hitachi got a license and made the 6309 variant, which includes more registers.
The Z80 was an offshoot of the 8080 made by Zilog, and hugely popular in business applications thanks to CP/M. Zilog played evil games and won evil prizes.
While the chip may be old, people are still making new multi tasking windowing operating systems for it.
32 Bit Processors
The 16-bit generation had a short reign before being overtaken by 32-bit processors.
The Motorola 68K ISA brought high performance with many registers (8 data registers and 8 address registers) and numerous addressing modes. The complexity might at first glance seem overwhelming, nevertheless it was very popular and performant with assembly programmers. Machines such as the Amiga and Atari ST brought these into the hands of many hobbyists.
The VAX ISA is probably the peak of CISC. It powered VAX computers, typically running the VMS operating system with a reliability where uptimes were measured in 10+ years. It can be simulated by SimH, see below.
Home Made Processors
Making a CPU chip requires a lot of work and infrastructure. Thankfully there are alternatives. The first is to use several chips, and TTL (Transistor-Transistor-Logic) chips were popular, and also used to prototype processors. Later FPGA (Field Programmable Gate Arrays) made things even simpler and faster.
These can be wire wrap monsters but work surprisingly well. A well known example is the Home Brew CPU complete with an adapted C-compiler and a port of Minix. It is accessible from the net. Other home built processors can be found at the Homebuilt CPUs WebRing.
A very recent and interesting case is the Gigatron TTL Computer that has a micro code system that can emulate a 6502 processor and a 16-bit processor, at a speed sufficient for simple games.
Not to be confused with pr0n, a softcore is a description (typically in languages such as VHDL or Verilog) that is compiled and then downloaded into an FPGA. In the raw state an FPGA is a large collection of primitive components such as adders, MUXes etc. These are connected together by the bitstream from the compiler, which turns the FPGA into nearly any kind of digital device such as a CPU, DSP, GPU, state machine or similar. A large collection of open source designs can be found on Github and OpenCores. These tend to be a lot faster than TTL processors, both in building/programming and in operation.
Making your own ISA
This is where things get exciting!
Start simple. Tempting as it may be, making the definitive ISA that once and for all will kick Intel off the market is not a good first project. And face it, if you are here reading this you are fairly new to ISA design. Start simple and get a feel for how it works. Like C or assembly programming, this is about skill and elegance that only come from experience. And if you don't want to make it elegant, well, Intel has shown even that can have utility. So start simple, perhaps 8-bit or even 4-bit. Reimplementing an existing ISA is a good start; the 6502 is very popular in this respect.
Going for a TTL design on breadboard or wirewrap is an exercise in patience. FPGA might be simpler and avoids short circuiting pins, especially if you use development boards with FPGA and some auxiliary parts such as display, switches and LEDs. You may have done software debugging using printf, now you might have to do debugging using an LED...
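As a feel for how small "simple" can be, here is a sketch of a software model for a made-up 8-bit accumulator ISA. Everything about it is hypothetical and invented for this example: the five opcodes, the two-byte instruction format (opcode, then operand) and the 256-byte memory. Writing such a model before touching TTL or an FPGA is a cheap way to try out an instruction set.

```python
# Hypothetical 8-bit accumulator ISA, invented here purely for illustration.
# Each instruction is two bytes: opcode, then operand.
LDI, ADD, STA, JNZ, HLT = range(5)


def run(program, max_steps=1000):
    """Run a program from address 0; return (accumulator, memory) at HLT."""
    mem = bytearray(256)
    mem[:len(program)] = bytes(program)   # program and data share one memory
    a, pc = 0, 0
    for _ in range(max_steps):
        opcode, operand = mem[pc], mem[(pc + 1) & 0xFF]
        pc = (pc + 2) & 0xFF
        if opcode == LDI:                 # A = immediate
            a = operand
        elif opcode == ADD:               # A = A + mem[operand], 8-bit wraparound
            a = (a + mem[operand]) & 0xFF
        elif opcode == STA:               # mem[operand] = A
            mem[operand] = a
        elif opcode == JNZ:               # jump to operand if A is not zero
            if a != 0:
                pc = operand
        elif opcode == HLT:
            return a, mem
        else:
            raise ValueError(f"illegal opcode {opcode} before address {pc}")
    raise RuntimeError("step limit reached without HLT")


# Compute 5 + 7 and leave the sum in memory at 0x81.
program = [LDI, 5, STA, 0x80, LDI, 7, ADD, 0x80, STA, 0x81, HLT, 0]
a, mem = run(program)
print(a, mem[0x81])  # 12 12
```

Even this tiny design forces real decisions: instruction width, how branches encode their target, and what happens on overflow. Those are the same questions a hardware implementation has to answer.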
Alternative OS for Alternative ISA
Many cross platform operating systems are available. Contiki OS is available for 6502, AVR and more. Microware OS-9 is available for 6809, 68K and more. FUZIX is a UNIX-like OS available for many 8-bit processors and 68K.
Often it can be impractical to run the actual hardware in order to test old software, such as ordering a large VAX to test VMS. The solution is a simulator, such as SimH, which is capable of simulating a large number of architectures.
The following is mostly a list of bookmarks.
Amiga (Motorola 68k)
Atari (Motorola 68k)
Other Motorola 68k Links
Symb-OS is a multitasking windowing OS for the Z80
Raspberry Pi (arm)
Apple Silicon (arm)
DCPU-16 was a virtual processor intended for the game 0x10c.
The p-code machine was a stack based virtual processor used by UCSD Pascal on Apple 2 and other machines.