Every semester, there's a different instruction set for CPE480. It's actually quite difficult to keep coming up with something different, interesting, and vaguely reasonable. This semester, it's something really tacky... yes, TACKY: the twin accumulator computer from Kentucky.
Of the many computers I've designed, this is one of the most unusual... but that doesn't mean it's bad. In fact, I think it's quite clever. It's a 16-bit machine, with 16-bit instructions, but it's actually a VLIW (Very Long Instruction Word) machine. Ok, 16 bits isn't really "very long" -- but it typically packs two instructions into each instruction word, so it's not hard to get two instructions executing per clock in compute-heavy code.
The TACKY instruction set is very simple, but the "twin" aspect of it is a little strange. So, let's ignore that for now.
Ok, now that we've gone over the other aspects, how does that "twin" stuff work? Well, there are two accumulators. As usual, you don't specify the accumulator explicitly; it is always implied as the first operand and destination. But how does that work with two accumulators? The answer is that the accumulator to be used is specified by the slot the instruction occupies within the instruction word. Let's take a simple example:
add $5, mul $6
Because the add $5 is in "slot 0" of the instruction word, this instruction is really doing $0 = $0 + $5. Similarly, mul $6 is in "slot 1", so it is really doing $1 = $1 * $6. These paired instructions would be expected to execute simultaneously... which does suggest a potential problem. Suppose the pair is:
add $5, a2r $0
Both of those instructions want to write into the same register, $0. Because the operations are supposed to happen simultaneously, the result would be undefined... which is a polite way of saying that both instructions within an instruction word should never write into the same register. You don't need to have your assembler detect this and flag an error, but be careful you don't accidentally do this when testing one of your processor designs later in this course.
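To make the slot rule concrete, here's a minimal Python sketch of executing one instruction word. It is not the course's simulator: the function name `execute_word`, the tuple format for an operation, and the decision to raise an error on a write conflict are all assumptions made for illustration (real hardware would simply give an undefined result). It also ignores types and treats every value as a 16-bit int.

```python
import operator

# Hypothetical subset of accumulator operations; types are ignored (ints only).
ACC_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def execute_word(regs, slot0, slot1):
    """Execute one instruction word given as two (opcode, register) pairs.

    Every read uses the old register values, which models both slots
    executing simultaneously. Raises ValueError if both slots write the
    same register (an illustrative choice; the hardware result is undefined).
    """
    writes = {}
    for slot, (op, r) in enumerate((slot0, slot1)):
        acc = slot  # slot 0 implies $0, slot 1 implies $1
        if op in ACC_OPS:
            dest, value = acc, ACC_OPS[op](regs[acc], regs[r])
        elif op == "a2r":
            dest, value = r, regs[acc]
        elif op == "r2a":
            dest, value = acc, regs[r]
        else:
            raise ValueError(f"unsupported op: {op}")
        if dest in writes:
            raise ValueError(f"both slots write ${dest}")
        writes[dest] = value
    new_regs = list(regs)
    for dest, value in writes.items():
        new_regs[dest] = value & 0xFFFF  # keep results to 16 bits
    return new_regs

regs = [1, 2, 0, 0, 0, 10, 20, 0]
print(execute_word(regs, ("add", 5), ("mul", 6)))
```

With those starting values, the pair add $5, mul $6 leaves 11 in $0 and 40 in $1, while the pair add $5, a2r $0 trips the conflict check.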
Ok, that all seems simple enough. However, what happens if you don't have two instructions to pack into one instruction word? Well, that's easy enough: here's a funny-looking pair of null operations:
r2a $0, r2a $1
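If an assembler or a hand-written test needs to fill an empty slot, those two null operations are the obvious padding. Here's a small sketch of that idea; the helper name `pack_word` and its string-based interface are made up for illustration, not part of any course tool.

```python
# The null operation for each slot, as given in the text:
# r2a $0 in slot 0 is $0 = $0, and r2a $1 in slot 1 is $1 = $1.
NOPS = ("r2a $0", "r2a $1")

def pack_word(slot0=None, slot1=None):
    """Return the assembly text for one word, padding empty slots with a nop."""
    ops = [op if op is not None else NOPS[i]
           for i, op in enumerate((slot0, slot1))]
    return ", ".join(ops)

print(pack_word("add $5"))        # slot 1 is padded
print(pack_word(slot1="mul $6"))  # slot 0 is padded
```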
Because TACKY is actually a two-wide VLIW architecture, the instruction encoding is a bit strange. Each operation is nominally 8 bits, but an instruction word is 16 bits. Some single instructions take an entire 16-bit word by themselves. In other cases, two instructions can be packed side-by-side within an instruction word.
Instruction | Description | Functionality | Result Type | Pack |
---|---|---|---|---|
a2r $r | Copy acc to register, copy type | $r = $acc | typeof(acc) | Field acc |
add $r | Typeof(acc) add register to acc | $acc += $r | typeof(acc) | Field acc |
and $r | Bitwise AND register to acc | $acc = ($acc & $r) | typeof(acc) | Field acc |
cf8 $r,imm8 | Load {pre, imm8} into reg | $r = {pre, imm8} | float | Span 0,1 |
ci8 $r,imm8 | Load {pre, imm8} into reg | $r = {pre, imm8} | int | Span 0,1 |
cvt $r | Convert int to float or float to int | $acc = ((oppositetypeof($r)) $r) | oppositetypeof(r) | Field acc |
div $r | Typeof(acc) divide acc by register | $acc /= $r | typeof(acc) | Field acc |
jnz8 $r,imm8 | Jump to {pre, imm8} if r is not 0 | if ($r!=0) pc = {pre, imm8} | | Span 0,1 |
jp8 imm8 | Jump to {pre, imm8} | pc = {pre, imm8} | | Span 0,1 |
jr $r | Jump to register (int) | pc = $r | | Either 0,1 |
jz8 $r,imm8 | Jump to {pre, imm8} if r is 0 | if ($r==0) pc = {pre, imm8} | | Span 0,1 |
lf $r | Load float from memory into reg | $r = memory[$acc] | float | Field acc |
li $r | Load int from memory into reg | $r = memory[$acc] | int | Field acc |
mul $r | Typeof(acc) multiply acc by register | $acc *= $r | typeof(acc) | Field acc |
not $r | Bitwise NOT register to acc | $acc = (~$r) | typeof(acc) | Field acc |
or $r | Bitwise OR register to acc | $acc = ($acc \| $r) | typeof(acc) | Field acc |
pre imm8 | Load 8-bit prefix register | pre = imm8 | | Span 0,1 |
r2a $r | Copy register into acc, copy type | $acc = $r | typeof(r) | Field acc |
sh $r | Typeof(acc) shift left/right by register | $acc = shift($acc,$r) where $r holds an int | typeof(acc) | Field acc |
slt $r | Typeof(acc) set acc less than register | $acc = ($acc<$r) | int | Field acc |
st $r | Store acc into memory[register] | memory[$r] = $acc | | Field acc |
sub $r | Typeof(acc) subtract register from acc | $acc -= $r | typeof(acc) | Field acc |
sys imm8 | System call | system(imm8) | | Span 0,1 |
xor $r | Bitwise XOR register to acc | $acc = ($acc ^ $r) | typeof(acc) | Field acc |
Macro | Description | Functionality | Result Type | Pack |
---|---|---|---|---|
cf $r,imm16 | Constant float | Sequence of pre, cf8 | float | Span 0,1 |
ci $r,imm16 | Constant int | Sequence of pre, ci8 | int | Span 0,1 |
jnz $r,addr | Jump to addr if r is not 0 | Sequence of pre, jnz8 | | Span 0,1 |
jp addr | Jump to addr | Sequence of pre, jp8 | | Span 0,1 |
jz $r,addr | Jump to addr if r is 0 | Sequence of pre, jz8 | | Span 0,1 |
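
Each macro in the table above is just a pre followed by the matching 8-bit instruction, with {pre, imm8} putting the prefix in the high 8 bits. A rough Python sketch of that expansion follows. The function name `expand_macro` and the output format are assumptions for illustration, and this version always emits the pre; whether a real assembler could skip a pre that already holds the right value is not something the tables say.

```python
# Macro name -> the 8-bit-immediate instruction it expands to (from the tables).
MACROS = {"cf": "cf8", "ci": "ci8", "jnz": "jnz8", "jp": "jp8", "jz": "jz8"}

def expand_macro(name, value, reg=None):
    """Expand a 16-bit macro into [pre, base instruction] assembly lines.

    value is the 16-bit constant or address; for cf it would be the raw
    16-bit float bit pattern. reg is the register operand, if any.
    """
    if name not in MACROS:
        raise ValueError(f"not a macro: {name}")
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    hi, lo = value >> 8, value & 0xFF  # {pre, imm8}: pre is the high byte
    operands = f"{lo}" if reg is None else f"{reg},{lo}"
    return [f"pre {hi}", f"{MACROS[name]} {operands}"]

print(expand_macro("ci", 0x1234, "$2"))
print(expand_macro("jp", 0x0100))
```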
There are just 8 registers... which isn't a lot, so we'll try not to waste them. They all have names as well as numbers, and either can be used interchangeably; $r3 and $(4-1) would be treated identically. Perhaps the best way to give both is the following specification (formatted as an AIK specification):
.const {r0 r1 r2 r3 r4 ra rv sp}
The registers that have special meanings are:
Register Number | Register Name | Use |
---|---|---|
$0 | $r0 | accumulator for slot 0 instructions |
$1 | $r1 | accumulator for slot 1 instructions |
$5 | $ra | return address for simple functions |
$6 | $rv | return value |
$7 | $sp | stack pointer (there is no frame pointer) |
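
Since the position of each name in that .const list is its register number, mapping a register token to a number is nearly a one-liner. Here's a hedged sketch in Python; `reg_number` is a hypothetical helper, and it handles only plain names and numbers, not expressions like $(4-1).

```python
# Register names in numeric order, copied from the .const specification.
REG_NAMES = ["r0", "r1", "r2", "r3", "r4", "ra", "rv", "sp"]

def reg_number(token):
    """Map a token such as '$sp', '$r3', or '$3' to a register number 0..7."""
    if not token.startswith("$"):
        raise ValueError(f"not a register: {token}")
    body = token[1:]
    if body in REG_NAMES:
        return REG_NAMES.index(body)
    if body.isdigit() and int(body) < len(REG_NAMES):
        return int(body)
    raise ValueError(f"unknown register: {token}")

print(reg_number("$sp"), reg_number("$r3"), reg_number("$3"))
```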
You might be surprised, or perhaps a bit scared, to learn that TACKY supports float arithmetic. Yes, there is floating-point hardware. IEEE 754-2008 floating point typically uses at least 32 bits to represent a value, whereas here we get 16 bits. Actually, 16-bit floats are not a new invention; they are sometimes called half precision. An IEEE single float normally has a sign bit, an 8-bit exponent, and a 24-bit mantissa magnitude stored in just 23 bits. So, how many bits is each of those things in a 16-bit float? Well, IEEE suggests 1+5+11 bits, with the 11-bit mantissa magnitude stored in just 10 bits. However, we sacrifice IEEE compliance to get a more useful dynamic range... we were always going to ignore denorms, infinities, NaNs, and rounding modes anyway. ;-)
The 16-bit float format used in TACKY basically looks like the first 16 bits of an IEEE 32-bit float. That means 1 sign bit, 8 exponent bits, and 8 mantissa magnitude bits (stored in just 7 bits). It's not a huge change, but sacrificing some precision buys us a much larger dynamic range and means, for example, that mul only needs to do an 8x8 bit multiply -- which credibly can be implemented within a single clock cycle without a ridiculous amount of circuitry.
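Because the format is just the top half of an IEEE single, converting to and from it on a host machine is easy. Here's a minimal Python sketch; the function names are made up, and it simply truncates the low 16 bits rather than rounding, which is an assumption in keeping with ignoring rounding modes.

```python
import struct

def float_to_tacky(x):
    """Return the 16-bit TACKY pattern: the top 16 bits of x as an IEEE single."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16  # drop the low 16 mantissa bits (truncation, no rounding)

def tacky_to_float(bits16):
    """Expand a 16-bit TACKY pattern back into a Python float."""
    bits32 = (bits16 & 0xFFFF) << 16  # the missing mantissa bits become zeros
    return struct.unpack(">f", struct.pack(">I", bits32))[0]

print(hex(float_to_tacky(1.0)))
print(tacky_to_float(float_to_tacky(3.14159)))
```

Values such as 1.0 and -2.5 survive the round trip exactly, while 3.14159 comes back as 3.140625, which shows what having only 8 mantissa magnitude bits costs in precision.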