As if you didn't know, this document is always under construction....
Don't see your question above? You can send email to
[email protected].
Maybe I'll even answer. ;-)
We pronounce HAK like "Hack" -- and hack is definitely the right word to describe this.
HAK stands for Half-powered Athlon cluster in Kentucky.
Why "half-powered?" Well, because HAK is built using stuff we had laying around... and over time, the components we've lost the most of are power supplies. Yeah, we could have bought more power supplies, but instead we decided to try sharing one power supply for each two nodes. We figured that might even save us a few percent on power consumption because switching power supplies often are more efficient when loaded and one node was only about 1/3 of the stated target load. Initial power measurements on a test node pair seemed to confirm a few percent savings, but then we switched meters and it looked like sharing cost about 15% more power...? A web search found that power consumption for this type of sharing is poorly understood and hotly debated. Thus, getting experimental understanding of this aspect of power consumption became another reason to build HAK.
HAK is green in several ways:
We don't know yet... and we don't care too much. This system is intended as a network design testbed, so the nodes just have to be capable enough to run real applications to stress and measure the network performance.
The nodes mix Thoroughbred 2600+ and Barton 2500+ Athlon XP processors. The T'bred processors run at 2.083 GHz while the (newer) Bartons run at 1.833 GHz. Theoretical peak performance should be about 752 single-precision GFLOPS (4 FLOPS/clock) if the nodes are evenly split between the two processor versions. However, we have more T'breds than Bartons lying around, so the number may be slightly higher. That said, we will use as many Bartons as we can because they seem to draw slightly less power.
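For reference, the arithmetic behind that peak figure is just clock rate times FLOPS per clock; the even 48/48 split is the only assumption here:

  # Peak = nodes x clock (GHz) x 4 single-precision FLOPS/clock
  TBRED_GHZ, BARTON_GHZ = 2.083, 1.833
  NODES = 96
  peak = (NODES // 2) * TBRED_GHZ * 4 + (NODES // 2) * BARTON_GHZ * 4
  print("%.0f GFLOPS" % peak)   # about 752 for an even split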
If you know us, you know we have a history of setting new price/performance records in high-performance computing. Although we view HAK as a "hack" to create a network testbed, and did not build it to optimize price/performance, it's actually not bad.
It could be argued that the bulk of HAK's components have a negative cost because one would have had to pay to get the parts hauled away. At the University of Kentucky, that's called the "Waste Recharge" cost, and is budgeted as 0.99% of the purchase price -- between $300 and $400 for the stuff reused in HAK. The Y-cables for sharing the power supplies cost about $400 and less than $100 was spent on food for the build party. Thus, it can be argued that HAK cost less than $100, or less than $1 per node!
Generously overestimating the cost as $500, HAK would come in at around $0.66/GFLOPS. Including the "Waste Recharge" savings, it is more like $0.13/GFLOPS. Either way, HAK's easily the cheapest 96-node OpenMPI cluster ever built. It actually beats the GPU-laden NAK in price/performance!
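In case you want to check the division (the peak figure is the ~752 GFLOPS from above):

  peak_gflops = 752.0
  print("$%.2f/GFLOPS" % (500.0 / peak_gflops))   # generous $500 estimate: ~$0.66
  print("$%.2f/GFLOPS" % (100.0 / peak_gflops))   # counting Waste Recharge savings: ~$0.13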
Don't know yet.
Although we have occasionally used such applications as burn-in tests, these are particularly pointless tests of a machine built to examine network topology issues.
Each node runs Linux. It's very straightforward except that the nodes are diskless and the network requires the FNN drivers.
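We won't document our boot setup here, but for the curious, the sketch below shows the general shape of the diskless, NFS-root PXE boot that clusters like this typically use. The kernel name, server address, and export path are invented for illustration -- this is not HAK's actual configuration:

  # Hypothetical sketch: emit a pxelinux "default" entry that boots every node
  # diskless with an NFS root.  All names, addresses, and paths are made up.
  lines = [
      "default hak",
      "label hak",
      "  kernel vmlinuz-hak",
      "  append root=/dev/nfs nfsroot=10.0.0.1:/export/hak-root ip=dhcp ro",
  ]
  with open("pxelinux.cfg/default", "w") as f:
      f.write("\n".join(lines) + "\n")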
HAK's configuration is:
The network hardware list is deliberately a bit vague because it will vary significantly over time -- that's the main reason why we built HAK. As a minimum, we will be wiring the system network as a tree, as an SFNN, and as an FFNN.
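If FNNs are new to you: the defining property of a full Flat Neighborhood Network (FFNN) is that every pair of nodes shares at least one switch, so any two nodes are a single switch hop apart; roughly speaking, a sparse FNN (SFNN) only guarantees that for the node pairs expected to communicate. The check itself is simple -- here's a little sketch using a made-up four-node wiring table, not HAK's actual network:

  # Sketch: check the full-FNN property (every pair of nodes shares at least
  # one switch) for a candidate wiring.  The wiring table is a tiny made-up
  # example for illustration only.
  from itertools import combinations

  wiring = {          # node -> set of switches its NICs plug into
      0: {"A", "B"},
      1: {"A", "C"},
      2: {"B", "C"},
      3: {"A", "B"},
  }

  def is_full_fnn(wiring):
      return all(wiring[a] & wiring[b] for a, b in combinations(wiring, 2))

  print(is_full_fnn(wiring))   # True: every pair shares a switch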
Of course, the power-supply sharing is also an obvious and unusual characteristic. We did that because we were short power supplies and didn't want to invest more than necessary in building a cluster with old node hardware. It is worth noting, however, that we paid $7.50 per Y cable and the cheapest power supplies we found were actually only $7 each! Honestly, we just don't have much faith in a $7 power supply and that still would have left us short on spare power supplies.
It also should be noted that HAK is deliberately sparse on the racks... leaving space to add nodes as power allows. In fact, the shelving could easily fit at least 128 nodes. We'll add nodes as power permits when we know more about the power consumption for these half-powered nodes....
Well, we had them. However, we actually have a ton (well, literally multiple tons) of other options, including older Athlon, Pentium 4, Athlon MP, etc. Athlon XP processors are a fairly easy "best in class" choice for maximizing node count with good performance among the older CPUs we have. In fact, the 2600+ had an exceptionally long production lifespan as an AMD product, first being sold as an Athlon XP and later being revived as a Sempron.
As we noted for NAK, Athlon XP processors are not really slower than modern cores. Head-to-head comparisons of a single thread running on the Athlon XP vs. the i7 typically show the Athlon XP to be more than 2X faster per thread! Put another way, the quad-core i7 is about 3X faster than the Athlon XP, but it needs 8 threads to get that speed. There are a few applications where the i7 blows away the Athlon XP, but that seems to be due to memory accesses. For example, the "smart remove selection" GIMP plug-in runs an order of magnitude faster on the i7, but it does deliberately random access to pixels within an image -- all or most of an image fits in the i7's 8MB cache, while virtually every access is a miss for the Athlon XP's 256KB cache. Fortunately, most supercomputing application codes are written to work within a smaller cache.
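If you want to see that cache effect for yourself, a toy test like the one below (this is not the GIMP plug-in, just an illustration, and absolute times depend entirely on the machine) streams through an array sequentially and then visits the same elements in a random order; once the working set outgrows the cache, the random-order version falls off a cliff:

  # Toy illustration of why random access punishes a small cache.
  import time
  import numpy as np

  n = 1 << 24                        # 16M float32 values, about 64MB
  data = np.ones(n, dtype=np.float32)
  order = np.random.permutation(n)   # a random visiting order

  t0 = time.time(); sequential = data.sum();        t1 = time.time()
  t2 = time.time(); shuffled   = data[order].sum(); t3 = time.time()

  print("sequential: %.3fs   random-order: %.3fs" % (t1 - t0, t3 - t2))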
Of course they should. ;-) There are three memory slots, but we don't have enough PC2700 sticks to put even two in each node. In fact, we had to take the NAK nodes down from 1GB to 512MB per node to build HAK. Feel free to send us any PC2700 sticks you have lying around....
We need lots of nodes and have no power to spare... thus, local disks would be a bad idea. The same is true of video cards, etc.
It should be noted that HAK does have an administrative boot server (minimal "head node") with a disk, status display, keyboard, and mouse.
Both KASY0 and KFC4 used 100Mb/s Ethernet, so we had plenty of that hardware. We don't have as much GigE hardware, so we couldn't build as many interesting topologies. Further, GigE performance would be more limited by the PCI bus in the nodes, hiding the impact of topology choice -- and comparing topologies is why we built this system.
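The bus arithmetic behind that last point is worth spelling out (assuming the ordinary 32-bit/33MHz PCI slots of that era):

  # Why GigE would mostly measure the PCI bus instead of the topology:
  pci_mb_s  = 33e6 * 4 / 1e6     # 32-bit/33MHz PCI: ~132 MB/s shared by all slots
  gige_mb_s = 1e9 / 8 / 1e6      # one gigabit port: ~125 MB/s
  fe_mb_s   = 100e6 / 8 / 1e6    # one 100Mb/s port: 12.5 MB/s
  print("PCI %.0f MB/s, GigE %.0f MB/s, Fast Ethernet %.1f MB/s"
        % (pci_mb_s, gige_mb_s, fe_mb_s))

One GigE port alone nearly saturates the shared bus, so with several NICs per node the bus, not the wiring pattern, would set the ceiling; several 100Mb/s NICs fit comfortably.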
Elsewhere.
Good question. Yes, there are three power switches per node pair.
The one on the back of the power supply has to be "on" for either node to be able to turn on. The front-panel power switches are really soft switches, so there is on the order of 10W being drawn even when both nodes are turned off.
Turning on the rear power switch and both front switches is obviously necessary to turn on both nodes, but it is not always sufficient. The issue is power-on sensing. The second node in a pair might not sense that it should boot until its front reset button is pressed. Of course, this is not a serious problem unless you're sitting there thinking the node is dead because it doesn't boot when you hit the power switch. ;-)
HAK is powered using three ceiling drops with two 20A circuits each, for a total of 120A of 120VAC. Actually, the cluster power in 672 FPAT is roughly equally divided between NAK and HAK... which basically reflects the fact that adding GPUs multiplies the power per node by about 1.5X. There is a 5-ton air conditioner dedicated to this machine room.
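For the record, the circuit arithmetic, plus what the ~10W-per-pair soft-off draw mentioned above adds up to across the whole machine:

  # 3 ceiling drops x 2 circuits x 20A of 120VAC, split roughly evenly
  # between NAK and HAK; the soft-off figure uses the ~10W/pair noted above.
  amps      = 3 * 2 * 20            # 120 A total
  total_w   = amps * 120            # 14,400 W total
  hak_w     = total_w // 2          # roughly HAK's share
  standby_w = (96 // 2) * 10        # ~480 W even with every node soft-off
  print("total %d W, HAK ~%d W, soft-off floor ~%d W" % (total_w, hak_w, standby_w))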
You're not serious, are you? Unlike our other builds, HAK is something that people would only undertake out of necessity, as an educational experience, or because they are "true believers" in the project (or perhaps if they are Freegans?). The old equipment triage process is slow and painful... slow because lots of stuff doesn't work and everything must be broken down to components to repackage as we need it, painful because there are surprisingly many computer components that have thin sheet metal pieces with sharp edges. Seriously, this is not a job you could pay people to do. That said, students volunteered to do it and stuck with it longer than we asked. Go figure.
Incidentally, we do feed helpers well. In addition to pizza, we provided a variety of nicer goodies from a wholesale club (e.g., mini cheesecakes). Despite that, I had to keep reminding helpers to take some food. Clearly, they came to be involved in the build, not because we bribed them with food.
Although HAK has a physical layout very similar to that of KASY0, it doesn't photograph well because it is on three independently positionable pairs of racks crammed into the back of 672 FPAT. The overall shape is deformed compared to KASY0 mostly because of where the power drops come from the ceiling and connect to the rack pair power strip panels... HAK uses only 3 drops in a row, whereas KASY0 used 5 that were in more of an "L" shape. That said, the node pair structure is probably what you really want to see, and that's shown in this photo.
To build a node pair, we start with two nearly identical nodes. The only differences between them are that one has a 2500+ processor and a power supply whereas the other has a 2600+ processor and a hole where the power supply should be.
The access side panels of both cases are removed and the 2600+ case is flipped upside-down, so that the front panels are on the same side and the open sides are next to each other. The power connector is unplugged from the 2500+ motherboard, plugged into a Y cable, and connected to both the 2500+ and 2600+ motherboards. The boxes are pushed together and a half-width strip of metallic duct tape is run down the seam between them in the back, making them behave like one hinged case -- as seen on the left side of the photo.
In isolation, it might be fine to leave a hole where the missing power supply would have been, but we are racking these systems in a configuration where nodes blow hot air at each other (which is the right way to do it; you don't want one node sucking in air pre-heated by another). Thus, leaving the hole open would create a highly undesirable airflow path... so, we tape over the hole with a couple of short lengths of the same metallic duct tape. That leaves the back of a node pair looking as shown on the right side of the photo.
Ok, that description of the construction process doesn't bother mentioning all the collection and triage of parts from old nodes (over 200 old nodes were used as part sources), nor the hardware/BIOS configuration, testing, diagnosis, and repair processes. Let's just say this was the slowest build we've ever done. However, the slowness was not caused by the funky sharing of power supplies... in fact, handling the boxes sharing power supplies is actually easier than for individual nodes. Why? Once they're taped together, there is half as much external power wiring and the NICs are spaced out vertically (because one case is inverted) giving better wiring access.