TACKY Reference Material

Every semester, there's a different instruction set for CPE480. It's actually quite difficult to keep coming up with something different, interesting, and vaguely reasonable. This semester, it's something really tacky... yes, TACKY: the twin accumulator computer from Kentucky.

Of the many computers I've designed, this is one of the most unusual... but that doesn't mean it's bad. In fact, I think it's quite clever. It's a 16-bit machine, with 16-bit instructions, but it's actually a VLIW (Very Long Instruction Word) machine. Ok, 16 bits isn't really "very long" -- but it typically packs two instructions into each instruction word, so it's not hard to get two instructions executing per clock in compute-heavy code.

A TACKY Overview

The TACKY instruction set is very simple, but the "twin" aspect of it is a little strange. So, let's ignore that for now.

Everything is 16 bit: memory addresses, memory locations, registers, integers, floating-point values, and instruction words (although two instructions, each fitting into 8 bits, can be packed into one instruction word). Main memory is only addressed by 16-bit word addresses; adding 1 to an address gets you the next 16-bit word. In fact, what the C language calls int, short, char, and float are all the same size for TACKY (which is actually compliant with the C standard, except for the fact that modern versions want an int to have at least 32 bits). I'm sure that having only 16-bit memory addresses does not sound exciting to most of you, but 65,536 16-bit memory locations is 8X the main memory I had in my first computer. More importantly, having only 16 bits in a word makes it feasible to implement each ALU operation in as little as a single clock cycle -- and that includes the floating-point operations!
There are 8 registers, $0 thru $7. The strange thing about them is that they are sort-of 17 bits long each. What's the extra bit for? Well, TACKY's registers are typed. For example, there is only one add instruction, which could mean either integer or floating-point addition. Whenever a register's value is written, the type of that register is set according to the result type that the instruction table below specifies for each instruction. The general rule is that the result type is determined by the type of the accumulator value (there are a few exceptions). Note that bitwise operations essentially ignore the type marking; for example, a bitwise operation on a float leaves the result still tagged as float.
There is this strange little prefix register. Basically, some instructions need a 16-bit operand. For those that do, the high 8 bits are taken from the prefix register. For example, to load the integer value 0x1234 into register $5, you would do the two-word sequence pre 0x12, ci8 $5,0x34. The only instructions using the prefix register for the top 8 bits of a value are cf8, ci8, jnz8, jp8, and jz8. You are also to implement macros for cf, ci, jnz, jp, and jz that emit the sequence of pre and the corresponding 8-bit version of the instruction.
This instruction set is complete enough to allow it to be efficiently targeted by a relatively simple C compiler. More about that later....
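
The typed-register rule described above can be sketched in a few lines of Python. This is only an illustration, not the reference implementation: the `Reg` class, the `INT`/`FLOAT` tag values, and the helper names are hypothetical, float values are modeled as ordinary Python floats rather than 16-bit patterns, and the handling of a register operand whose tag differs from the accumulator's is an assumption.

```python
INT, FLOAT = 0, 1  # hypothetical encoding of the one-bit type tag


class Reg:
    """A 16-bit value plus a one-bit type tag (the 'sort-of 17th bit')."""

    def __init__(self, value=0, tag=INT):
        self.value = value
        self.tag = tag


def add(acc, r):
    """add $r: typeof(acc) addition; the result keeps the accumulator's type."""
    if acc.tag == FLOAT:
        # Assumption: the operand is simply used as a number, whatever its tag.
        acc.value = float(acc.value) + float(r.value)
    else:
        # Integer results are kept as 16-bit patterns, so they wrap around.
        acc.value = (int(acc.value) + int(r.value)) & 0xFFFF
    # acc.tag is left alone: the result type is typeof(acc).


def r2a(acc, r):
    """r2a $r: copy register into acc, copying the type tag as well."""
    acc.value, acc.tag = r.value, r.tag


def slt(acc, r):
    """slt $r: compare using typeof(acc); the result is always tagged int."""
    acc.value = int(acc.value < r.value)
    acc.tag = INT
```

Note that `slt` is one of the exceptions to the general rule: even when the accumulator holds a float, the comparison result is tagged as an int.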

Ok, now that we've gone over the other aspects, how does that "twin" stuff work? Well, there are two accumulators. As usual, you don't specify the accumulator explicitly; it is always implied as the first operand and destination. But how does that work with two accumulators? The answer is that the accumulator to be used is specified by the slot the instruction occupies within the instruction word. Let's take a simple example:

add $5, mul $6

Because the add $5 is in "slot 0" of the instruction word, this instruction is really doing $0 = $0 + $5. Similarly, mul $6 is in "slot 1", so it is really doing $1 = $1 * $6. These paired instructions would be expected to execute simultaneously... which does suggest a potential problem. Suppose the pair is:

add $5, a2r $0

Both of those instructions want to write into the same register, $0. Because the operations are supposed to happen simultaneously, the result would be undefined... which is a polite way of saying that both instructions within an instruction word should never write into the same register. You don't need to have your assembler detect this and flag an error, but be careful you don't accidentally do this when testing one of your processor designs later in this course.
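
Your assembler doesn't have to check for this, but if you want a quick sanity check for your own test programs, the rule can be sketched as follows. This is a minimal sketch: the function names and the tuple representation of an instruction word are hypothetical, and the two sets are simply read off the Functionality column of the instruction table (st and jr write no register).

```python
# Operations whose destination is the slot's accumulator.
WRITES_ACC = {"add", "and", "cvt", "div", "mul", "not", "or",
              "r2a", "sh", "slt", "sub", "xor"}
# Operations whose destination is the named register $r.
WRITES_REG = {"a2r", "lf", "li"}


def dest_register(slot, op, r):
    """Return the register number written by `op $r` in `slot`, or None."""
    if op in WRITES_ACC:
        return slot  # slot 0 uses $0 as its accumulator, slot 1 uses $1
    if op in WRITES_REG:
        return r
    return None  # st and jr do not write a register


def write_conflict(word):
    """word is ((op0, r0), (op1, r1)); True if both slots write one register."""
    (op0, r0), (op1, r1) = word
    d0 = dest_register(0, op0, r0)
    d1 = dest_register(1, op1, r1)
    return d0 is not None and d0 == d1
```

For example, `write_conflict((("add", 5), ("a2r", 0)))` flags the problematic pair above, while the earlier `add $5, mul $6` pair passes.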

Ok, that all seems simple enough. However, what happens if you don't have two instructions to pack into one instruction word? Well, that's easy enough: here's a funny-looking pair of null operations:

r2a $0, r2a $1

The TACKY Instruction Set

Because TACKY is actually a two-wide VLIW architecture, the instruction encoding is a bit strange. Each operation is nominally 8 bits, but an instruction word is 16 bits. Some single instructions take an entire 16-bit word by themselves. In other cases, two instructions can be packed side-by-side within an instruction word.

Instruction	Description	Functionality	Result Type	Pack
`a2r $r`	Copy acc to register, copy type	`$r = $acc`	typeof(acc)	Field acc
`add $r`	Typeof(acc) add register to acc	`$acc += $r`	typeof(acc)	Field acc
`and $r`	Bitwise AND register to acc	`$acc = ($acc & $r)`	typeof(acc)	Field acc
`cf8 $r,imm8`	Load {pre, imm8} into reg	`$r = {pre, imm8}`	float	Span 0,1
`ci8 $r,imm8`	Load {pre, imm8} into reg	`$r = {pre, imm8}`	int	Span 0,1
`cvt $r`	Convert int to float or float to int	`$acc = ((oppositetypeof($r)) $r)`	oppositetypeof(r)	Field acc
`div $r`	Typeof(acc) divide acc by register	`$acc /= $r`	typeof(acc)	Field acc
`jnz8 $r,imm8`	Jump to {pre, imm8} if r is not 0	`if ($r!=0) pc = {pre, imm8}`		Span 0,1
`jp8 imm8`	Jump to {pre, imm8}	`pc = {pre, imm8}`		Span 0,1
`jr $r`	Jump to register (int)	`pc = $r`		Either 0,1
`jz8 $r,imm8`	Jump to {pre, imm8} if r is 0	`if ($r==0) pc = {pre, imm8}`		Span 0,1
`lf $r`	Load float from memory into reg	`$r = memory[$acc]`	float	Field acc
`li $r`	Load int from memory into reg	`$r = memory[$acc]`	int	Field acc
`mul $r`	Typeof(acc) multiply acc by register	`$acc *= $r`	typeof(acc)	Field acc
`not $r`	Bitwise NOT register to acc	`$acc = (~$r)`	typeof(acc)	Field acc
`or $r`	Bitwise OR register to acc	`$acc = ($acc | $r)`	typeof(acc)	Field acc
`pre imm8`	Load 8-bit prefix register	`pre = imm8`		Span 0,1
`r2a $r`	Copy register into acc, copy type	`$acc = $r`	typeof(r)	Field acc
`sh $r`	Typeof(acc) shift left/right by register	`$acc = shift($acc,$r)` where `$r` holds an int	typeof(acc)	Field acc
`slt $r`	Typeof(acc) set acc less than register	`$acc = ($acc<$r)`	int	Field acc
`st $r`	Store acc into memory[register]	`memory[$r] = $acc`		Field acc
`sub $r`	Typeof(acc) subtract register from acc	`$acc -= $r`	typeof(acc)	Field acc
`sys imm8`	System call	`system(imm8)`		Span 0,1
`xor $r`	Bitwise XOR register to acc	`$acc = ($acc ^ $r)`	typeof(acc)	Field acc

Macro	Description	Functionality	Result Type	Pack
`cf $r,imm16`	Constant float	Sequence of pre, cf8	float	Span 0,1
`ci $r,imm16`	Constant int	Sequence of pre, ci8	int	Span 0,1
`jnz $r,addr`	Jump to addr if r is not 0	Sequence of pre, jnz8		Span 0,1
`jp addr`	Jump to addr	Sequence of pre, jp8		Span 0,1
`jz $r,addr`	Jump to addr if r is 0	Sequence of pre, jz8		Span 0,1
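
As a rough illustration of what these macros expand to, here is a small Python sketch. The function name and the tuple representation of an instruction are hypothetical; only the pre-then-8-bit-form sequence comes from the description above.

```python
# Each macro maps to the 8-bit instruction that follows the pre word.
MACROS = {"cf": "cf8", "ci": "ci8", "jnz": "jnz8", "jp": "jp8", "jz": "jz8"}


def expand_macro(op, *operands):
    """Expand e.g. ('ci', 5, 0x1234) into [('pre', 0x12), ('ci8', 5, 0x34)].

    The last operand is the 16-bit value or address; any operands before it
    (the register, if the macro takes one) are passed through unchanged.
    """
    if op not in MACROS:
        raise ValueError("not a macro: " + op)
    *regs, value = operands
    value &= 0xFFFF                      # keep only 16 bits
    hi, lo = value >> 8, value & 0xFF    # pre gets the high byte
    return [("pre", hi), (MACROS[op], *regs, lo)]
```

So `expand_macro("ci", 5, 0x1234)` gives the same two-word sequence as the hand-written example earlier: pre 0x12 followed by ci8 $5,0x34.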

The TACKY Registers

There are just 8 registers... which isn't a lot, so we'll try not to waste them. They all have names as well as numbers, and either can be used interchangeably; $r3 and $(4-1) would be treated identically. Perhaps the best way to give both is the following specification (formatted as an AIK specification):

.const {r0	r1	r2	r3	r4	ra	rv	sp}

The registers that have special meanings are:

Register Number	Register Name	Use
`$0`	`$r0`	accumulator for slot 0 instructions
`$1`	`$r1`	accumulator for slot 1 instructions
`$5`	`$ra`	return address for simple functions
`$6`	`$rv`	return value
`$7`	`$sp`	stack pointer (there is no frame pointer)

The TACKY Floating Point

You might be surprised, or perhaps a bit scared, to learn that TACKY supports float arithmetic. Yes, there is floating-point hardware. IEEE 754-2008 floating point typically uses at least 32 bits to represent a value, whereas here we get 16 bits. Actually, 16-bit floats are not a new invention; they are sometimes called half precision. An IEEE single float normally has a sign bit, an 8-bit exponent, and a 24-bit mantissa magnitude stored in just 23 bits. So, how many bits is each of those things in a 16-bit float? Well, IEEE suggests 1+5+11 bits. However, we sacrifice IEEE compliance to get a more useful dynamic range... we were always going to ignore denorms, infinities, NaNs, and rounding modes anyway. ;-)

The 16-bit float format used in TACKY basically looks like the first 16 bits of an IEEE 32-bit float. That means 1 sign bit, 8 exponent bits, and 8 mantissa magnitude bits (stored in just 7 bits, with the leading 1 implied as in IEEE). It's not a huge change, but sacrificing some precision buys us a much larger dynamic range and means, for example, that mul only needs to do an 8x8 bit multiply -- which credibly can be implemented within a single clock cycle without a ridiculous amount of circuitry.
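
If you want to compute the 16-bit pattern for a constant by hand (say, for a cf), the "first 16 bits of an IEEE 32-bit float" idea can be sketched in Python as below. The function names are hypothetical, and simply truncating the low 16 bits (rather than rounding) is an assumption of this sketch.

```python
import struct


def float_to_tacky(x):
    """Return the top 16 bits of the IEEE 754 single-precision encoding of x."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16  # assumption: the low 16 mantissa bits are truncated


def tacky_to_float(bits16):
    """Interpret a 16-bit pattern as the top half of a 32-bit float."""
    (x,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return x
```

For example, `float_to_tacky(1.0)` is 0x3F80, and `float_to_tacky(3.14159)` is 0x4049, which converts back to 3.140625: only about two to three decimal digits survive, which is the precision we traded for range.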

Advanced Computer Architecture.