Small-board computers
Think Small
The Raspberry Pi [1], a simple, small, but complete system for about $35, has caught the attention of the world. Some people think it's just a cute system not suitable for serious applications, whereas others think the Raspberry Pi, or at least systems like it, could be the next wave of HPC.
The Raspberry Pi was designed to spark children's interest in computer science and electronics (see the "Rasp Pi Specs" box). Like any single-board computer (SBC) [2], the credit-card-sized Pi has the basic components of a server, typically all on a single circuit board with few or no built-in expansion slots. Today's SBCs usually, but not always, have the CPU, memory, and other components soldered onto the board.
Although SBCs have been around for a while, it was the Raspberry Pi that really got people excited, and many new SBCs have come out since then. This renewed interest created a market that ARM processors [4] quickly dominated. It has also helped push Intel to develop lower-power versions of x86 processors. Intel now has the single-core Quark processor [5], which is used on the Intel Galileo board [6]. Under load it uses about 15W, which is still pretty low. It costs a bit more than the Rasp Pi, about $60-$65 (EUR67-70), but it is still pretty close.
Intel has also been developing the Intel Atom processor for a range of systems, including SoCs (systems on a chip) and a small-form-factor family of systems called Intel NUC (Next Unit of Computing) [7]. Many of these processors use less than 10W and come in dual-core and quad-core versions. Intel even has an eight-core server SoC called Avoton [8] that uses only 20W under load. These systems are more expensive than a Raspberry Pi, but they run faster and use only a little more power.
Raspberry Pis have been used in all sorts of projects, from simple web servers [9] to robotics [10], underwater ROVs [11], and, yes, even clusters [12]. The appeal of the Rasp Pi is that it is cheap, uses almost no power, is easy to program, and is really small.
Wonderful World of SBCs
A plethora of SBCs now cover a wide range of capabilities and prices. Two good articles summarize SBCs that run Linux [13] and compare various systems [14].
For the sake of completeness, I've listed in Table 1 a range of SBC systems that might be of interest. Most are 32-bit, particularly the ARM-based boards, but some are x86 compatible. Both AMD and Intel have 64-bit SBCs.
Table 1: Single-Board Computer Specifications
Name | OS | Processor | Cores | GPU | Memory | Ports | Power | Price
---|---|---|---|---|---|---|---|---
A10-OLinuxXino-Lime | Linux | Allwinner A10 | Single ARM Cortex-A8 @1GHz | Mali-400 | 512MB DDR3 | SATA connector, 2 USB, Fast Ethernet, USB OTG, HDMI | 1.9W | $44/EUR30
A20-OLinuxXino-Micro | Linux | Allwinner A20 | Dual ARM Cortex-A7 @1GHz | Mali-400 | 1GB DDR3 | SATA connector, USB, USB OTG, Fast Ethernet, HDMI, VGA | 3W | $67/EUR55
Arndale Octa Board | Android 4.3 Jelly Bean | Samsung Exynos 5420 Octa | Quad ARM Cortex-A15 (32KB instruction/32KB data/2MB L2) @1.8GHz, Quad ARM Cortex-A7 (32KB/32KB/512KB) @1.3GHz | Mali-T628 MP6 | 3GB LPDDR3e (14.9GBps bandwidth) | Fast Ethernet, USB 2.0, USB 3.0, HDMI | 3-4W | $199
Creator CI20 | Android 4.4 KitKat, Linux | Ingenic JZ4780 | Dual XBurst MIPS32 @1.2GHz (32KB/32KB/512KB) | PowerVR SGX540 | 1GB DDR3, 4GB flash | Fast Ethernet, 2 USB, USB OTG, HDMI | 4W | $65/EUR50
Cubieboard2 | Android 4.2, Cubieez Linux | Allwinner A20 | Dual ARM Cortex-A7 @1GHz (512KB L2) | Mali-400 | 1GB DDR3 @480MHz, 4GB NAND flash | Fast Ethernet, 2 USB, 1 SATA, HDMI, IR | 5-6W | $59
CuBox-i4Pro | Android 4.3/4.4, Linux | Freescale i.MX6 ARM | Quad ARM Cortex-A9 @1GHz | Vivante GC2000 | 2GB DDR3 | GigE Ethernet, eSATA II, 2 USB, MicroUSB, HDMI, IR | 3W | $139.99
Nvidia Jetson TK1 | Linux 3.10.40 | Nvidia Tegra K1 | 4-Plus-1 Quad ARM Cortex-A15 @2.3GHz | 192-core Nvidia Kepler GK20A @950MHz (128KB L2) for 365GFLOPS with FP16 and FP32 | 2GB DDR3L (930MHz memory clock, 14.9GBps bandwidth), 16GB NAND flash (eMMC) | SATA, half-mini-PCIe, USB 2.0, USB 3.0, GigE Ethernet, HDMI, RS-232 | 7-10W | $192
ODROID-XU3 | Android 4.4, Linux | Samsung Exynos 5422 | Quad ARM Cortex-A15 @2.0GHz (32KB/32KB/2MB), Quad ARM Cortex-A7 @1.4GHz (32KB/32KB/512KB) | Mali-T628 MP6 | 2GB LPDDR3 (14.9GBps bandwidth) | eMMC 5.0 HS400 flash, Fast Ethernet (optional USB 3.0-to-GigE adapter), 4 USB 2.0, USB 3.0, USB 3.0 OTG, micro-HDMI, DisplayPort, MicroSD | 10-20W | $179/EUR119
Gizmo 2 | Linux, Windows Embedded 8 | AMD G-Series GX-210HA | Dual x86 @1GHz (1MB shared L2) for 85GFLOPS | AMD Radeon HD 8210E discrete-class graphics (300MHz) | 1GB DDR3 | GigE Ethernet, 2 USB 2.0, 2 USB 3.0, HDMI | 9W | $199/EUR160
Intel Galileo | Yocto Linux, VxWorks (RTOS), Windows | Intel Quark X1000 | Single 32-bit Pentium-class x86 @400MHz | Integrated Intel GPU | 256MB DDR3, 512KB embedded SRAM, 8MB NOR flash | Fast Ethernet, mPCIe, USB 2.0, MicroUSB 2.0, MicroSD, other ports provided by add-on shields | 2.5-4W | $60/EUR57
MinnowBoard Max | Linux, Windows 8.1 | Intel Atom E3825 | Dual x86 Atom, 64-bit @1.33GHz (1MB L2) | Intel Graphics @533MHz | 2GB DDR3L | GigE Ethernet, USB 2.0, USB 3.0, SATA2, MicroSD | 6W+ | $145/EUR149
ODROID-C1 | Android 4.4 KitKat, Linux | Amlogic S805 | Quad ARM Cortex-A5 @1.5GHz | Mali-450 MP2 @600MHz | 1GB DDR3 | GigE Ethernet, 4 USB 2.0, USB OTG, micro-HDMI, IR | 10W | $35/EUR44
Parallella | Linux | Xilinx Zynq-7020 or -7010 | Dual ARM Cortex-A9 @667MHz plus FPGA, 16-core Epiphany RISC coprocessor (32-bit) |  | 1GB DDR3 | GigE Ethernet, USB 2.0, micro-HDMI | 1.9W + 2W | $126/EUR119
pcDuino3 Nano | Android 4.2, Linux | Allwinner A20 | Dual ARM Cortex-A7 @1GHz | Mali-400 | 1GB DRAM, 4GB flash | GigE Ethernet, SATA, 2 USB 2.0, USB OTG, HDMI, IR | 10W | $40
Udoo Quad | Android, Linux | Freescale i.MX6Quad | Quad ARM Cortex-A9 @1GHz | Vivante GC2000 + Vivante GC355 + Vivante GC320 | 1GB DDR3 | GigE Ethernet, SATA, 2 USB 2.0, MicroUSB serial, USB OTG, HDMI | 3.7W idle | $135/EUR99
Processors now range from single-core to dual-core, quad-core, and even octo-core designs; the octo-cores combine two different quad-core clusters in the big.LITTLE architecture [15]. Nearly all of the SBCs have GPUs for graphics, and Nvidia's Jetson TK1 has 192 CUDA cores, so you can run HPC applications on the GPU as well as on the CPUs. The Parallella board currently has a 16-core Epiphany coprocessor whose cores are connected by an on-chip mesh network and can be used to run parallel applications.
Ethernet is found throughout, including Gigabit Ethernet on some boards. All of the SBCs boot from some sort of flash storage that serves as the system's root disk, such as eMMC or an SD card, and all have USB ports of some type, so you can plug in an input device or add external storage. Some SBCs have SATA ports, and some have a mini-PCIe (mPCIe) slot, so you can add Gigabit Ethernet cards or even small SSDs.
The amount of memory per core varies widely, from about 256MB up to 1GB. The low memory capacity is attributable in part to the 32-bit processors, whose flat address space is limited to 4GB. Some of the systems also have pretty good memory bandwidth, about 14.9GBps.
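To see where the per-core range comes from, the short Python sketch below divides each board's memory by its core count, using a handful of rows transcribed from Table 1. The `BOARDS` dictionary and `mb_per_core` function are illustrative names only, and counting all eight cores on big.LITTLE boards is an assumption about how to count them. The last lines compute the 4GB ceiling of a flat 32-bit address space.

```python
# (memory in MB, core count) transcribed from Table 1 for a few boards.
# Assumption: big.LITTLE boards are counted with all eight cores.
BOARDS = {
    "Intel Galileo": (256, 1),
    "A10-OLinuxXino-Lime": (512, 1),
    "ODROID-C1": (1024, 4),
    "Arndale Octa Board": (3072, 8),
    "Gizmo 2": (1024, 2),
    "MinnowBoard Max": (2048, 2),
}


def mb_per_core(memory_mb: int, cores: int) -> float:
    """Return the memory available per core, in MB."""
    return memory_mb / cores


# Print the boards from least to most memory per core.
for name, (mem, cores) in sorted(BOARDS.items(), key=lambda kv: mb_per_core(*kv[1])):
    print(f"{name:22s} {mb_per_core(mem, cores):7.1f} MB/core")

# A flat 32-bit address space tops out at 2**32 bytes, i.e., 4GB.
MAX_32BIT_GB = 2**32 / 2**30
print(f"32-bit address limit: {MAX_32BIT_GB:.0f} GB")
```

The 64-bit MinnowBoard Max sits at the top of the range and the 32-bit Galileo and ODROID-C1 at the bottom, although the 64-bit Gizmo 2 shows that cost, not just address width, keeps memory small.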
Two factors common to all of these SBCs are important: the low cost ($35-$199 per SBC in this list) and the low power usage, with the hungriest board drawing at most about 20W.
Low Cost
Although price isn't always the most important objective in designing and building clusters, it's not to be ignored, especially when you are building your own system for personal use, or for education or research. When you also factor in the cost of learning how to build and use clusters, buying conventional new hardware, or even older used hardware, often isn't an option. Consequently, inexpensive SBC-class hardware could be your best option. Moreover, a large number of inexpensive systems might turn out to be faster than a single more expensive system.
An SBC setup starting with one system, a simple network switch, and a boot flash card is fairly inexpensive (assuming you have a monitor and keyboard/mouse). For example, you can get a quad-core 32-bit ARM system with all of this for around $90 ($35 for an ODROID-C1, $20 for a simple GigE switch, and the remaining $35 or so for an SD card, powered USB hub, power supply, etc.). Such a setup would allow you to start learning parallel programming with threads (OpenMP) and with MPI. After writing new code or porting existing code, you could then add more SBCs inexpensively over time and continue working on the code.
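A typical first MPI exercise on such a cluster is to split a data set into blocks, let each process (rank) work on its own block, and combine the results with a reduction; an OpenMP parallel loop with a reduction clause follows a similar idea with threads. The Python sketch below shows only the decomposition logic. It runs the ranks one after another in a single process, and `block_range` and `partial_sum` are hypothetical helper names, not part of any MPI library. In a real MPI program, each rank would be a separate process, possibly on its own SBC, and the final sum would be an `MPI_Reduce` call.

```python
def block_range(n: int, nprocs: int, rank: int) -> range:
    """Indices owned by `rank` when n items are split across nprocs ranks.

    The first n % nprocs ranks each get one extra item, so block sizes
    differ by at most one.
    """
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def partial_sum(data, nprocs, rank):
    # Each rank touches only its own block, as an MPI process would.
    return sum(data[i] for i in block_range(len(data), nprocs, rank))


data = list(range(1, 101))  # the values 1..100
nodes = 4                   # e.g., four SBCs
partials = [partial_sum(data, nodes, r) for r in range(nodes)]
total = sum(partials)       # stands in for the reduction step
print(partials, total)      # [325, 950, 1575, 2200] 5050
```

Keeping the blocks within one item of each other matters on a cluster of identical, slow nodes: the whole job finishes only when the most heavily loaded rank does.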
Although SBCs offer low performance at low cost, and their limited memory restricts the size of problems that can be addressed, an SBC cluster can minimize the money spent on hardware until a concept or theory is proven or at least demonstrated.
Low Power
Power usage also can be a roadblock in building a cluster. When processors use 80W+ and you need to add memory and possibly a network card, you start worrying about power consumption. Most of these SBCs use only 10W or less under full load. I think I have Christmas tree ornaments that use more power than that. If I had 10 of the SBCs running under full load, the cluster would use only about 100W in total, the equivalent of one "low-power" system.
Lower power consumption can also be important in various cluster scenarios. For example, you might not have much spare power for running higher powered systems and no way to add more circuits. Schools, in particular, typically don't have extra circuits for building clusters, especially if the building was constructed before the 1980s.
Power issues are also magnified in other parts of the world. Not everyone has access to stable, inexpensive power. Being able to build a cluster with a total power draw of less than 50W is a real advantage. You can get a 50W solar panel for around $100. If you have enough sun, you could run a four-node cluster (you might need a bit more with an older CRT monitor). The total price for four quad-core nodes, a GigE switch, the solar panel, cables, and various other bits would come to about $450.
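The power arithmetic in this section is easy to check with a small helper. The sketch below assumes about 10W per node (the ODROID-C1 figure from Table 1) and a hypothetical 5W for the switch; it also treats the panel's 50W rating as usable output, which real panels rarely deliver all day, so read the result as an upper bound. The `nodes_on_budget` name is illustrative only.

```python
import math


def nodes_on_budget(budget_w: float, node_w: float, overhead_w: float = 0.0) -> int:
    """Number of nodes drawing node_w each that fit in budget_w
    after subtracting a fixed overhead (switch, USB hub, etc.)."""
    if node_w <= 0:
        raise ValueError("node_w must be positive")
    return max(0, math.floor((budget_w - overhead_w) / node_w))


# Ten SBCs at roughly 10W each under full load:
print(10 * 10, "W")                           # 100 W
# A 50W panel, ~10W per node, and an assumed 5W for the switch:
print(nodes_on_budget(50, 10, overhead_w=5))  # 4
```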
Just a Toy
One likely argument against using SBCs in HPC is that these low-power, low-cost, low-performance units are toys and, from a price-performance perspective, are less cost effective than higher powered systems. However, the best price-performance is not the only measure of success. For example, each node of the SBC cluster shown in Figure 1 was built by a high school student, and the nodes were then assembled into a cluster so that the students could learn about clusters.
Right now, the appeal of SBCs is that they have all of the components of production systems at a very low cost in money and power. They are not really designed for performance, although some applications run very well on them. In fact, some people say that SBCs are the future of HPC, or at least a component of that future. Given the lower performance of these systems, this might not seem likely. However, Nvidia's announcement of a new 64-bit chip might change that perception.
64-Bit SBCs Are Coming
The Tegra X1 chip has eight ARM cores, all 64-bit, in a big.LITTLE architecture, along with a very powerful GPU that can be used for computation (see the "Tegra X1 Specs" box). Memory bandwidth has been raised to 25.6GBps (from 14.9GBps on the Tegra K1) to "feed the beast." All of this uses only about 10W of power, so the performance per watt of this chip is pretty remarkable.
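The Tegra X1 figures are in the "Tegra X1 Specs" box rather than in the text, so the sketch below illustrates the performance-per-watt metric with the two boards in Table 1 that list a peak GFLOPS figure. These are peak numbers divided by whole-board power, so sustained application performance per watt will typically be lower. The `PEAK` table and `gflops_per_watt` function are illustrative names.

```python
# (peak GFLOPS, minimum W, maximum W) as listed in Table 1.
PEAK = {
    "Nvidia Jetson TK1": (365.0, 7.0, 10.0),
    "Gizmo 2": (85.0, 9.0, 9.0),
}


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Peak floating-point throughput per watt of board power."""
    return gflops / watts


for name, (gf, w_min, w_max) in PEAK.items():
    worst = gflops_per_watt(gf, w_max)  # at the highest listed power
    best = gflops_per_watt(gf, w_min)   # at the lowest listed power
    print(f"{name}: {worst:.1f}-{best:.1f} GFLOPS/W")
```

By this measure, the TK1 lands at roughly 36.5 to 52 GFLOPS/W, several times the Gizmo 2, largely because of its GPU.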
Using the Jetson TK1 as a guide ($192 for 192 GPU cores, or $1 per GPU core), an X1 SBC might cost $256. Although an SBC based on the X1 would be amazing, a price of $256 would put it out of reach for many people (me included), so I hope it's cheaper.
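The $256 figure is a linear extrapolation by GPU core count: the Tegra X1 has 256 GPU cores, and the TK1 works out to $1 per core. As a quick sketch (the function name is illustrative), the estimate is:

```python
def price_at_same_rate(ref_price: float, ref_cores: int, new_cores: int) -> float:
    """Scale a reference board's price linearly by GPU core count."""
    return ref_price / ref_cores * new_cores


TK1_PRICE, TK1_GPU_CORES = 192, 192  # from Table 1
X1_GPU_CORES = 256
print(price_at_same_rate(TK1_PRICE, TK1_GPU_CORES, X1_GPU_CORES))  # 256.0
```

Real pricing depends on much more than core count, so this is only a rough guide.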
Summary
The Raspberry Pi has reinvigorated the SBC market, and now a whole range of SBCs is available. Because they are inexpensive, use very little power, and are very small, some groups, such as the Mont-Blanc project [17] at the Barcelona Supercomputing Center, are trying to use them to build very large systems.
For small clusters, clusters for education, or clusters that are seriously constrained by price or power, you can make a good case for using one of the many SBCs on the market. Moreover, Nvidia just announced a very cool 64-bit chip that could form the basis of a very nice SBC.
I recommend you keep your eye on these SBCs. Although they might not meet your HPC needs now, they could in the future. Plus, tinkering with these small computers is lots of fun.