Tools SBCs for HPC
Lead Image © Author, 123RF.com
 

Small-board computers

Think Small

Single-board computers, such as the Raspberry Pi, are very low cost and low power, yet are complete systems suitable for personal and educational projects. But are they HPC-worthy? By Jeff Layton

The Raspberry Pi [1], a simple, small, but complete system for about $35, has caught the attention of the world. Some people think it's just a cute system not suitable for serious applications, whereas others think the Raspberry Pi, or at least the same type of system, could be the next wave of HPC.

The Raspberry Pi was designed to excite the imagination of children in the field of computer science and electronics (see the "Rasp Pi Specs" box). The credit-card sized single-board computer (SBC) [2] has the basic components of any server, typically with everything on a single circuit board and few to no expansion slots built-in. Today's SBCs typically (but not always) come with the CPU, as well as the memory and other additions, soldered onto the board.

Although SBCs have been around awhile, it was the Raspberry Pi that really got people excited, and many new SBCs have come out since then. This renewed interest created a market that was quickly dominated by ARM processors [4]. It has also helped push Intel to develop lower power versions of x86 processors. Intel now has the single-core Quark processor [5], which has been incorporated into the Intel Galileo board [6]. Under load it uses about 15W, which is still pretty low. The price is a bit more than the Rasp Pi, at about $60-$65 (EUR67-70), but it is pretty close.

Intel has also been developing the Intel Atom processor for a range of systems, including SoCs (systems on a chip) and a family of small form factor systems called Intel NUC (Next Unit of Computing) [7]. Many of these processors use less than 10W and come in dual-core and quad-core versions. Intel even has an eight-core server SoC called Avoton [8] that uses only 20W under load. These systems are more expensive than a Raspberry Pi, but they run faster and use just a little more power.

Raspberry Pis have been used in all sorts of projects from simple web servers [9], to robotics [10], to underwater ROVs [11], and yes, even clusters [12]. The appeal of the Rasp Pi is that it is cheap, uses almost no power, is easy to program, and is really small.

Wonderful World of SBCs

A plethora of SBCs cover a wide range of systems and price ranges. A couple of good articles give a summary of SBCs running Linux [13] and compare various systems [14].

For the sake of completeness, I've listed a range of SBC systems in Table 1 that might be of interest. The majority are 32 bit, particularly if they are ARM based, but some are x86 compatible. Both AMD and Intel have 64-bit SBCs.

Table 1: Single-Board Computer Specifications

| Name | OS | Processor | Cores | GPU | Memory | Ports | Power | Price | URL |
|---|---|---|---|---|---|---|---|---|---|
| A10-OLinuXino-LIME | Linux | Allwinner A10 | Single ARM Cortex-A8 @1GHz | Mali-400 | 512MB DDR3 | SATA connector, 2 USB, Fast Ethernet, USB OTG, HDMI | 1.9W | $44/EUR30 | https://www.olimex.com/wiki/A10-OLinuXino-LIME |
| A20-OLinuXino-MICRO | Linux | Allwinner A20 | Dual ARM Cortex-A7 @1GHz | Mali-400 | 1GB DDR3 | SATA connector, USB, USB OTG, Fast Ethernet, HDMI, VGA | 3W | $67/EUR55 | https://www.olimex.com/wiki/A20-OLinuXino-MICRO |
| Arndale Octa Board | Android 4.3 Jelly Bean | Samsung Exynos 5420 Octa | Quad ARM Cortex-A15 (32KB instruction/32KB data/2MB L2) @1.8GHz, Quad ARM Cortex-A7 (32KB/32KB/512KB) @1.3GHz | Mali-T628 MP6 | 3GB LPDDR3e RAM (14.9GBps memory bandwidth) | Fast Ethernet, USB 2.0, USB 3.0, HDMI | 3-4W | $199 | http://www.arndaleboard.org/wiki/ |
| Creator CI20 | Android 4.4 KitKat, Linux | Ingenic JZ4780 | Dual XBurst MIPS32 @1.2GHz (32KB/32KB/512KB) | PowerVR SGX540 | 1GB DDR3, 4GB flash | Fast Ethernet, 2 USB, USB OTG, HDMI | 4W | $65/EUR50 | http://www.elinux.org/MIPS_Creator_CI20 |
| Cubieboard2 | Android 4.2, Cubieez Linux | Allwinner A20 | Dual ARM Cortex-A7 @1GHz (512KB L2) | Mali-400 | 1GB DDR3 @480MHz, 4GB NAND flash | Fast Ethernet, 2 USB, 1 SATA, HDMI, IR | 5-6W | $59 | http://cubieboard.org/2013/06/19/cubieboard2-is-here/ |
| CuBox-i4Pro | Android 4.3/4.4, Linux | Freescale i.MX6 ARM | Quad ARM Cortex-A9 @1GHz | Vivante GC2000 | 2GB DDR3 | GigE Ethernet, eSATA II, 2 USB, MicroUSB, HDMI, IR | 3W | $139.99 | http://www.solid-run.com/products/ |
| Nvidia Jetson TK1 | Linux 3.10.40 | Nvidia Tegra K1 | 4-Plus-1 Quad ARM Cortex-A15 @2.3GHz | 192-core Nvidia Kepler GK20A @950MHz (128KB L2) for 365GFLOPS with FP16 and FP32 | 2GB DDR3L (930MHz memory clock, 14.9GBps bandwidth), 16GB NAND flash (eMMC) | SATA, half mini-PCIe, USB 2.0, USB 3.0, GigE Ethernet, HDMI, RS-232 | 7-10W | $192 | https://developer.nvidia.com/jetson-tk1 |
| ODROID-XU3 | Android 4.4, Linux | Samsung Exynos5422 | Quad ARM Cortex-A15 @2.0GHz (32KB/32KB/2MB), Quad ARM Cortex-A7 @1.4GHz (32KB/32KB/512KB) | Mali-T628 MP6 | 2GB LPDDR3 RAM (14.9GBps bandwidth) | eMMC5.0 HS400 flash, Fast Ethernet (optional USB3.0 to GigE adapter), 4 USB 2.0, USB 3.0, USB 3.0 OTG, micro-HDMI, DisplayPort, MicroSD | 10-20W | $179.00/EUR119 | http://www.hardkernel.com/ |
| Gizmo 2 | Linux, Windows Embedded 8 | AMD G-series GX210HA | Dual x86 @1GHz (1MB shared L2) for 85GFLOPS | AMD Radeon HD 8210E discrete-class graphics (300MHz) | 1GB DDR3 | GigE Ethernet, 2 USB 2.0, 2 USB 3.0, HDMI | 9W | $199/EUR160 | http://www.gizmosphere.org/products/gizmo-2/ |
| Intel Galileo | Yocto Linux, VxWorks (RTOS), Windows | Intel Quark X1000 | Single 32-bit Intel Pentium (x86) @400MHz | Integrated Intel GPU | 256MB DDR3, 512KB embedded SRAM, 8MB NOR flash | Fast Ethernet, mPCIe, USB 2.0, MicroUSB 2.0, MicroSD, other ports provided by add-on shields | 2.5-4W | $60/EUR57 | https://www.sparkfun.com/products/12720 |
| MinnowBoard Max | Linux, Windows 8.1 | Intel E3825 | Dual x86 Atom, 64-bit @1.33GHz (1MB L2) | Intel Graphics @533MHz | 2GB DDR3L | GigE Ethernet, USB 2.0, USB 3.0, SATA2, MicroSD | 6W+ | $145/EUR149 | http://www.minnowboard.org/meet-minnowboard-max/ |
| ODROID-C1 | Android 4.4 KitKat, Linux | Amlogic S805 | Quad ARM Cortex-A5 @1.5GHz | Mali-450 MP2 @600MHz | 1GB DDR3 | GigE Ethernet, 4 USB 2.0, USB OTG, micro-HDMI, IR | 10W | $35/EUR44 | http://www.hardkernel.com/ |
| Parallella | Linux | Xilinx Zynq-7020 or -7010 | Dual ARM Cortex-A9 @667MHz plus FPGA, 16-core Epiphany RISC coprocessor (32-bit) | | 1GB DDR3 | GigE Ethernet, USB 2.0, micro-HDMI | 1.9W + 2W | $126/EUR119 | https://www.parallella.org/board/ |
| pcDuino3 Nano | Android 4.2, Linux | Allwinner A20 | Dual ARM Cortex-A7 @1GHz | Mali-400 | 1GB DRAM, 4GB flash | GigE Ethernet, SATA, 2 USB 2.0, USB OTG, HDMI, IR | 10W | $40 | http://store.linksprite.com/pcduino3-nano/ |
| Udoo Quad | Android, Linux | Freescale i.MX6Quad | Quad ARM Cortex-A9 @1GHz | Vivante GC 2000 + Vivante GC 355 + Vivante GC 320 | 1GB DDR3 | GigE Ethernet, SATA, 2 USB 2.0, MicroUSB serial, USB OTG, HDMI | 3.7W idle | $135/EUR99 | http://shop.udoo.org/usa/product/udoo-quad.html |

Processors now range from single-core, to dual-core, to quad-core, and even to eight-core parts that combine two different quad-core clusters (the big.LITTLE architecture [15]). All of the SBCs have GPUs for graphics, but Nvidia's Jetson TK1 system has 192 CUDA cores, so you can run HPC applications on the GPU as well as on the CPUs. The Parallella board currently has a 16-core coprocessor whose cores are connected in a mesh topology and can be used to run parallel applications.

Ethernet is found throughout, including some Gigabit Ethernet. All of the SBCs boot from some sort of flash storage, such as an eMMC module or an SD card, which serves as the root disk of the system, and all have USB ports of some type, so you can plug in an input device or add external storage. Some SBCs have ports for SATA devices, and some have a mini-PCIe (mPCIe) slot, so you can add Gigabit Ethernet cards or even small SSDs.

The amount of memory varies widely, from a low of about 256MB per core up to 1GB per core. The low memory capacity is attributable in part to the 32-bit processors, which can address at most 4GB in a flat address space. Some of the systems also have pretty good memory bandwidth of about 14.9GBps.

Two factors common across all of these SBCs are important: the low cost ($35-$199 per SBC in this list) and the low power usage, with no board in the list drawing more than 20W.
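To put rough numbers on the cost and power argument, the following Python sketch computes GFLOPS per watt and GFLOPS per dollar for the two boards in Table 1 that quote a peak GFLOPS figure (the Jetson TK1 and the Gizmo 2). It is only an illustration: it assumes the upper end of each quoted power range, treats the vendor peak numbers as directly comparable, and uses a dictionary layout and function name invented for this example.

```python
# Peak GFLOPS, power draw, and price taken from Table 1.
# Assumption: where the table gives a power range, the upper end is used.
# The data layout and the function name are illustrative only.
BOARDS = {
    "Nvidia Jetson TK1": {"gflops": 365.0, "watts": 10.0, "usd": 192.0},
    "Gizmo 2": {"gflops": 85.0, "watts": 9.0, "usd": 199.0},
}


def efficiency(board):
    """Return (GFLOPS per watt, GFLOPS per dollar) for one table entry."""
    return board["gflops"] / board["watts"], board["gflops"] / board["usd"]


for name, spec in BOARDS.items():
    per_watt, per_dollar = efficiency(spec)
    print(f"{name}: {per_watt:.1f} GFLOPS/W, {per_dollar:.2f} GFLOPS/$")
```

Peak figures say little about sustained application performance, so treat the output as a rough ranking rather than a benchmark.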

Low Cost

Although price isn't always the most important objective in designing and building clusters, it's not to be ignored, especially when you are building your own system for personal use, or for education or research. When you factor in having to learn how to build and use clusters, buying conventional new or older used hardware often isn't an option. Consequently, inexpensive SBC-class hardware could be your best option. Moreover, a large number of inexpensive systems might turn out to be faster than a single more expensive system.

An SBC setup starting with one system, a simple network switch, and a boot flash card is fairly inexpensive (assuming you have a monitor and keyboard/mouse). For example, you can get a quad-core 32-bit ARM system with all of this for around $90 ($35 for an ODROID-C1, $20 for a simple GigE switch, and a few more dollars for an SD card, powered USB hub, power supply, etc.). Such a setup would allow you to start learning about parallel applications using threads (OpenMP) and MPI programming. After writing some new code or porting existing code, you could then over time inexpensively add more SBCs and continue working on the code.
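As a taste of the kind of exercise such a starter system supports, here is a minimal Python sketch of the split, compute, and combine pattern that threaded and message-passing programs commonly follow: Each worker integrates a slice of 4/(1+x^2) over the interval [0,1], and the partial sums add up to an estimate of pi. It is a stand-in for illustration, not OpenMP or MPI code. It uses the thread pool from Python's standard library, the function names are made up for this example, and in standard CPython the global interpreter lock means it shows the structure of the computation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(start, stop, steps):
    """Midpoint-rule sum of 4/(1+x^2) over intervals start..stop-1 out of `steps`."""
    width = 1.0 / steps
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * width
        total += 4.0 / (1.0 + x * x)
    return total * width


def estimate_pi(steps=100_000, workers=4):
    """Give each worker one contiguous chunk, then add up the partial results."""
    chunk = steps // workers
    # The last worker also takes any intervals left over by integer division.
    bounds = [(w * chunk, steps if w == workers - 1 else (w + 1) * chunk)
              for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: partial_sum(b[0], b[1], steps), bounds)
        return sum(parts)


print(f"pi is approximately {estimate_pi():.6f}")
```

The same decomposition carries over to an SBC cluster: Each node would compute its own chunk, and a reduction step would collect the partial sums.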

Although SBCs might offer low performance at low cost with limited memory, which can limit the size of problems that can be addressed, an SBC cluster can minimize the money spent on hardware until a proof of concept or theory is proven or at least demonstrated.
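To see how the hardware bill grows as nodes are added, the short Python sketch below extends the $90 starter example. The $35 board price and $20 switch price come from that example; the $35 per node for the SD card, power supply, and other accessories is an assumption implied by the $90 total, and the function name and parameters are hypothetical. It also assumes that one switch has enough ports for every node.

```python
def cluster_cost(nodes, node_usd=35.0, switch_usd=20.0, extras_per_node_usd=35.0):
    """Estimated hardware cost in USD for `nodes` SBCs sharing a single switch.

    extras_per_node_usd is an assumed figure covering the SD card, power
    supply, and similar accessories for each node.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes * (node_usd + extras_per_node_usd) + switch_usd


for n in (1, 2, 4, 8):
    print(f"{n} node(s): about ${cluster_cost(n):.0f}")
```

Because the switch is the only shared cost in this model, each additional node adds the same amount, which is what makes growing the cluster a little at a time practical.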

Low Power

Power usage also can be a roadblock in building a cluster. When processors use 80W+ and you need to add memory and possibly a network card, you start worrying about power consumption. Most of these SBCs use 10W or less under full load. I think I have Christmas tree ornaments that use more power than that. If I had 10 of the SBCs running under full load, the cluster would only use about 100W of total power, the equivalent of one "low-power" system.
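The arithmetic behind that estimate is simple enough to sketch in Python. The 10W per node figure comes from the text; the function names are illustrative, and running the cluster at full load around the clock is an assumption made only to show the energy calculation.

```python
def cluster_power_watts(nodes, watts_per_node=10.0):
    """Total draw in watts if every node runs at the given per-node load."""
    return nodes * watts_per_node


def daily_energy_kwh(watts, hours=24.0):
    """Energy in kilowatt-hours consumed at a constant draw over `hours`."""
    return watts * hours / 1000.0


total = cluster_power_watts(10)
print(f"10 nodes: {total:.0f}W, {daily_energy_kwh(total):.1f}kWh per day")
```

Swapping in the measured draw of a particular board and any switch or storage overhead gives a more realistic figure.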

Lower power consumption can also be important for various scenarios when using a cluster. For example, you might not have a great deal of extra power at your disposal for running higher powered systems, with no capacity to add more circuits. School environments, for example, typically don't have extra circuits for building clusters, especially if the building was constructed pre-1980s.

Power issues are also magnified in other areas of the world. Not everyone has access to stable, inexpensive power. Being able to build a cluster with a total power draw that is less than 50W is pretty advantageous. You can get a 50W solar panel for around $100. If you have enough sun, you could run a four-node cluster (you might need a bit more with an older CRT monitor). The total price for four nodes with quad cores, a GigE switch, the solar panel, cables, and various other bits needed, would come to about $450.
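A quick way to check whether a cluster fits a given panel is to subtract the shared overhead and divide by the per-node draw, as in the Python sketch below. The 50W panel and roughly 10W per node follow the text; the 5W allowance for the switch and the derating factor are assumptions introduced for illustration, as is the function name.

```python
def max_nodes(panel_watts=50.0, watts_per_node=10.0, overhead_watts=5.0, derate=1.0):
    """Number of nodes a panel can power after subtracting shared overhead.

    overhead_watts (switch and similar) and derate (fraction of rated panel
    output actually delivered) are assumed values, not measurements.
    """
    usable = panel_watts * derate - overhead_watts
    return max(0, int(usable // watts_per_node))


print(max_nodes())            # panel delivering its full rated output
print(max_nodes(derate=0.8))  # panel delivering 80 percent of rated output
```

With the optimistic defaults, the panel supports four nodes; at 80 percent of rated output, it supports three, which shows how little margin a single 50W panel leaves.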

Just a Toy

One probable argument against using SBCs in HPC is that these low-power, low-cost, low-performance units are toys that, from a price-performance perspective, lose out to higher powered systems. However, price-performance is not the sole measure of success. For example, each node of the SBC cluster shown in Figure 1 was built by a high school student; then, the nodes were assembled so that the students could learn about clusters.

Figure 1: Raspberry Pi cluster at Louisiana State University.

Right now, the appeal of SBCs is that they have all of the components of production systems at a very low cost in money and power. They are not really designed for performance, although some applications run very well on the systems. In fact, some people say that SBCs are the future of HPC, or at least a component in this future. Given the lower performance of the systems, this might not seem likely. However, the announcement by Nvidia of a new 64-bit chip, the Tegra X1, might change that perception.

64-Bit SBCs Are Coming

The Tegra X1 chip has eight ARM cores, all 64-bit, in a big.LITTLE architecture along with a very powerful GPU that can be used for computation (see the "Tegra X1 Specs" box). Memory bandwidth has been raised to 25.6GBps to "feed the beast." All of this only uses about 10W of power, so the performance per watt metric for this system is pretty remarkable.

Using the pricing of the Jetson TK1 SBC as a guide, an X1 SBC might cost $256 ($1/GPU core). Although an SBC based on the X1 would be amazing, a price of $256 would put it out of the reach of many people (me included), so I hope it's cheaper.
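The estimate is a straight extrapolation of price per GPU core, which the Python sketch below spells out. The Jetson TK1 price and core count come from Table 1; the 256-core figure for the X1 is the count implied by the $256 estimate at $1 per core, and the function name is illustrative.

```python
TK1_PRICE_USD = 192.0  # Jetson TK1 price from Table 1
TK1_GPU_CORES = 192    # CUDA cores on the Jetson TK1
X1_GPU_CORES = 256     # assumed: implied by the $256 estimate at $1 per core


def extrapolated_price(gpu_cores, ref_price=TK1_PRICE_USD, ref_cores=TK1_GPU_CORES):
    """Scale a reference board's price linearly with the number of GPU cores."""
    return gpu_cores * (ref_price / ref_cores)


print(f"Estimated X1 SBC price: ${extrapolated_price(X1_GPU_CORES):.0f}")
```

Linear scaling ignores everything else that goes into a board's price, so this is a guess at the order of magnitude, not a prediction.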

Summary

The Raspberry Pi has reinvigorated the SBC market, and now a whole range of SBCs is available. Because they are inexpensive, use very little power, and are very small, some groups, such as the Mont-Blanc project [17] at the Barcelona Supercomputing Center, are trying to use them to build very large systems.

For small clusters, clusters for education, or clusters that are seriously constrained by price or power, you can make a case for the use of the wide variety of SBCs on the market. Moreover, Nvidia just announced a very cool 64-bit chip that could form the basis of a very nice SBC.

I recommend you keep your eye on these SBCs. Although they might not meet your HPC needs now, they could in the future. Plus, tinkering with these small computers is lots of fun.