Success in high-performance computing (HPC) is often difficult to measure. Ultimately, it depends on your goals and budget. Many casual practitioners assume that a good HPL (High Performance Linpack) benchmark result and a few aisles of servers demonstrate a successful HPC installation. However, this notion could not be further from the truth, unless your goal is to build a multimillion-dollar HPL machine. In reality, a successful and productive HPC effort requires serious planning and design before purchasing hardware. Indeed, the entire process requires integration skills that often extend far beyond those of a typical data center administrator.
In this article, I outline several common pitfalls and how they might be avoided. One way to navigate them is to consider a partnership with an experienced third-party HPC integrator or consultant. An experienced HPC integrator/consultant has a practical understanding of current technologies (i.e., what works and what does not), has the ability both to listen and to execute, and, most importantly, has experience delivering production HPC (i.e., HPC that produces actionable results for something other than HPC research or education). Besides covering the five most common pitfalls, I also discuss other aspects of HPC stewardship, including costs, consultants, and relationship intangibles.
The current state of HPC is such that customers are now responsible for decisions and tasks previously handled by large supercomputer companies. Although you can still buy a turnkey system from a few large vendors, the majority of the market is now based on multisourced commodity hardware and openly available software. Transferring the responsibility for vendor selection and integration to the customer has significantly reduced the cost of HPC systems, but it has also introduced potential pitfalls that could result in extra or hidden costs, reduced productivity, and aggravation.
In today's market, purchasing an HPC cluster is similar to buying low-cost self-assembled furniture from several different companies. The pile of flat-pack boxes that arrives at your home is often a far cry from the professionally assembled models in the various showrooms. You have saved money, but you will be spending time deciphering instructions, hunting down tools, and undoing missteps before you can enjoy your new furniture. Indeed, should you not have the mechanical inclination to put bolt A16 into offset hole 22, your new furniture might never live up to your expectations. Additionally, integrating the furniture into an existing room could be a challenge. Forgetting that the new surround sound system requires a large number of cables to be routed throughout your new media center means it is time for some customization.
A typical HPC cluster procurement is a similar experience. Multiple components from different vendors can introduce hidden integration and support costs (and time) that are not part of the raw hardware purchase price. These additional costs (and time) are often due to the pitfalls described here and frequently come as a surprise to the customer.
Five HPC Pitfalls To Avoid
This list is by no means an exhaustive collection of potential HPC pitfalls, nor is it intended to imply that all customers have similar experiences. Experience and ability vary widely among the many commodity hardware and solution providers. In a post-single-vendor supercomputer market, the largest issue facing customers is how to manage multiple vendor relationships successfully. In addition to warning about potential issues, the following scenarios should help set reasonable customer expectations when working within the commodity HPC market.
Pitfall 1: Popular Benchmarks Tell All
Benchmarking is at the heart of the HPC purchasing process because price-to-performance metrics drive many sales decisions. But the devil, as they say, is in the details when it comes to benchmarking. Many popular and well-understood benchmarks exist for HPC, but unless a popular benchmark is similar to (or the same as) the applications you intend to run on your cluster, its actual usefulness can vary widely.
Perhaps the best known benchmark is HPL (High Performance Linpack), because it is used to compile the twice-yearly list of the 500 fastest computers in the world (known popularly as the "Top500"). These rankings are useful in many contexts but offer little practical guidance for most users. Unless a user plans on running applications that perform the same type of numerical algorithm used in the HPL benchmark (dense linear algebra), using HPL as the final arbiter of price to performance could lead to disappointment. Additionally, International Data Corporation (IDC) has reported that 57% of all HPC applications and users surveyed use 32 processors (cores) or fewer. Most of the Top500 HPL runs involve thousands of cores, so using the Top500 results as a yardstick for applications requiring 16 or 32 cores makes little sense. Other popular benchmarks present the same problems. In reality, benchmarking your own applications is the best way to evaluate hardware. Indeed, if you run popular codes, many vendors might already provide benchmark results that can give you a baseline.
Nevertheless, running the HPL benchmark on a newly delivered machine, or working with a vendor that can deliver machines able to run it, is definitely an advantage. HPL stresses the entire system in a way that can uncover hidden issues. Additionally, approximate HPL results for standard HPC hardware are well known and publicly available (Top500.org is a good place to look). So, if a newly installed cluster is not producing an HPL result in the ballpark of similar machines, it is a good indication that something is amiss.
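A rough way to judge "in the ballpark" is to compare the measured HPL result (Rmax) with the machine's theoretical peak (Rpeak) and with the efficiency of comparable systems. The sketch below assumes Rpeak = nodes × cores per node × clock (GHz) × double-precision FLOPs per cycle; the cluster size, FLOPs-per-cycle figure, measured result, 80% reference efficiency, and 10-point tolerance are all hypothetical placeholders, not values from any real system.

```python
def theoretical_peak_gflops(nodes: int, cores_per_node: int,
                            clock_ghz: float, flops_per_cycle: int) -> float:
    """Rpeak in GFLOPS: every core completing its maximum FLOPs every cycle."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle


def hpl_efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """Fraction of theoretical peak achieved by the measured HPL run."""
    return rmax_gflops / rpeak_gflops


def looks_suspicious(efficiency: float, reference_efficiency: float,
                     tolerance: float = 0.10) -> bool:
    """True if efficiency is more than `tolerance` below comparable systems."""
    return efficiency < reference_efficiency - tolerance


# Hypothetical cluster: 64 nodes, 8 cores/node, 2.5 GHz, 4 DP FLOPs/cycle.
rpeak = theoretical_peak_gflops(64, 8, 2.5, 4)
eff = hpl_efficiency(3500.0, rpeak)  # 3500 GFLOPS is a made-up Rmax
print(f"Rpeak = {rpeak:.0f} GFLOPS, HPL efficiency = {eff:.1%}")
# 0.80 is a placeholder for the efficiency of similar systems on Top500.org
print("investigate" if looks_suspicious(eff, 0.80) else "in the ballpark")
```

The reference efficiency should come from published results for hardware like yours; the tolerance simply keeps normal run-to-run variation from triggering an investigation.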
Another benchmark measure to consider is that of GP-GPUs (general-purpose graphics processing units). The bulk of these devices are sold in the consumer market as high-end video cards, but their general-purpose programmability makes them a very powerful parallel processing platform. GP-GPU benchmark numbers are quoted in terms of speed-up (e.g., a 25x speed-up). Unfortunately, no standard baseline exists against which to measure the speed-up, and these results are typically for single-precision (lower-precision) arithmetic. Also, applications must be reprogrammed to use these devices, and that effort carries its own cost.
As with HPL, if a specific application speed-up is similar to your own requirements, benchmarking with a GP-GPU can be a big win, but be careful with assumptions. This market is changing rapidly, and attention to detail will allow you to assess the hardware and software properly.
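To see why an unspecified baseline matters, consider the same hypothetical GPU run compared against two different CPU baselines. The timings below are invented, and the full-node baseline assumes perfect scaling across 8 cores, which real codes seldom achieve.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Speed-up = baseline wall-clock time / accelerated wall-clock time."""
    return baseline_seconds / accelerated_seconds


# Hypothetical timings (seconds) for one job.
gpu_time = 40.0
single_core_time = 1000.0
# Assumes perfect scaling on an 8-core node; real codes rarely scale this well.
full_node_time = single_core_time / 8

print(f"vs. one CPU core:   {speedup(single_core_time, gpu_time):.1f}x")
print(f"vs. an 8-core node: {speedup(full_node_time, gpu_time):.1f}x")
```

The same GPU run is a "25x" result against one core but only about 3x against a full node, so a quoted speed-up means little until you know its baseline (and its precision).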
Other design issues include the type of interconnect (typically InfiniBand or Ethernet) and the processor family. The choice of interconnect should be determined by your performance needs (your application benchmarks) and budget, not by low-level benchmarks.
Factors besides performance and cost, such as local integration issues, might also determine the best interconnect for your cluster. In terms of processors, both Intel and AMD offer x86 software compatibility, but their processor and memory architectures are very different and could affect performance. Benchmarks are really the only way to determine which of these design choices is best for your needs and budget.
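One way to let application benchmarks drive the decision is to normalize each candidate configuration's price by the throughput it achieves on your own workload. This is a simplified sketch: the configuration names, prices, and runtimes are hypothetical, and it assumes the cluster runs one benchmark job at a time, back to back, ignoring scheduling, mixed workloads, and operating costs.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    price_usd: float      # quoted hardware price (hypothetical)
    app_runtime_s: float  # wall-clock time of YOUR benchmark job on this config

    def jobs_per_day(self) -> float:
        return 86_400 / self.app_runtime_s

    def usd_per_daily_job(self) -> float:
        """Price divided by daily throughput: lower is better."""
        return self.price_usd / self.jobs_per_day()


candidates = [
    Candidate("GigE, processor A", 180_000, 960.0),
    Candidate("InfiniBand, processor A", 230_000, 600.0),
    Candidate("InfiniBand, processor B", 250_000, 640.0),
]

for c in sorted(candidates, key=Candidate.usd_per_daily_job):
    print(f"{c.name:26s} {c.jobs_per_day():6.1f} jobs/day  "
          f"${c.usd_per_daily_job():,.0f} per daily job")
```

With these made-up numbers, the cheapest configuration is not the best value once real application runtimes are factored in, which is the point of benchmarking your own codes.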
Another important issue to consider when evaluating benchmarks is the overemphasis on price to performance. (Price-to-performance numbers are commonly reported as dollars per GFLOPS, that is, per billion floating-point operations per second.) When calculating this number, many practitioners use the raw hardware cost and the HPL benchmark result. As a snapshot of current technology trends and costs, this is a valid number. However, in terms of real costs, this approach can be misleading. A better metric to consider is the total cost of ownership (TCO), which covers a multiyear period rather than just the one-time acquisition cost. TCO is more difficult to calculate than price to performance because it requires more data. Surprisingly, in many TCO estimates over a three-year term, the TCO can substantially exceed the initial cost of hardware once integration, infrastructure (power and cooling), maintenance, and personnel costs are included. These costs are described in more detail in the "Understanding Costs" section.
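A minimal sketch of the difference between the two metrics follows. Every figure in it (hardware price, HPL result, power draw, electricity rate, cooling overhead, maintenance, and personnel costs) is a hypothetical placeholder, and the cost model is deliberately simplified: it assumes constant power draw and flat annual costs, with cooling modeled as a fixed fraction on top of the cluster's own power.

```python
HOURS_PER_YEAR = 365 * 24


def usd_per_gflops(hardware_usd: float, hpl_gflops: float) -> float:
    """Classic price-to-performance: raw hardware cost over HPL result."""
    return hardware_usd / hpl_gflops


def total_cost_of_ownership(hardware_usd: float, integration_usd: float,
                            avg_power_kw: float, usd_per_kwh: float,
                            cooling_overhead: float,
                            annual_maintenance_usd: float,
                            annual_personnel_usd: float,
                            years: int = 3) -> float:
    """Simplified multiyear TCO: one-time costs plus energy and recurring costs.

    cooling_overhead is cooling energy as a fraction of cluster power
    (e.g., 0.6 means cooling adds 60% on top of the cluster's own draw).
    """
    energy_kwh = avg_power_kw * HOURS_PER_YEAR * years * (1 + cooling_overhead)
    energy_usd = energy_kwh * usd_per_kwh
    recurring_usd = years * (annual_maintenance_usd + annual_personnel_usd)
    return hardware_usd + integration_usd + energy_usd + recurring_usd


# All inputs below are hypothetical placeholders.
hardware, hpl = 250_000.0, 3_500.0
tco = total_cost_of_ownership(hardware, integration_usd=20_000.0,
                              avg_power_kw=25.0, usd_per_kwh=0.10,
                              cooling_overhead=0.6,
                              annual_maintenance_usd=15_000.0,
                              annual_personnel_usd=60_000.0)
print(f"price/performance:  ${usd_per_gflops(hardware, hpl):,.2f} per GFLOPS")
print(f"3-year TCO:         ${tco:,.0f} (${tco / hpl:,.2f} per GFLOPS)")
print(f"non-hardware costs: ${tco - hardware:,.0f}")
```

With these placeholder inputs, the non-hardware costs over three years exceed the hardware price itself; your own figures will differ, but the structure of the calculation is the same.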
Pitfall 2: All Commodity Hardware Is the Same
At some level, all commodity hardware is the same; otherwise, it would not be labeled "commodity." This assumption also entices customers to build clusters from specification sheets, seek low-ball bidders, and skip testing the actual hardware, all of which invites serious and costly problems. HPC pushes hardware more than any other industry segment. Subtle differences that do not matter in other industries can create issues in terms of Reliability, Availability, and Serviceability (RAS) in the HPC sector.
Because clusters are created with multisourced hardware components, the long-term RAS requirement can be difficult to estimate. Choosing the wrong component can result in poor performance, increased downtime, and, in the worst case, an unfixable problem. This situation can also result from buying technology that is "too new" and immature. In some cases, new motherboards have demonstrated stability issues, but the vendor did not address the problem on a particular revision, presumably because it did not affect a large population of customers. In other situations, case designs did not supply adequate cooling for 24/7 HPC use.
Buying all the hardware from the same vendor might avoid some of these issues because vendors usually perform interoperability tests, but a single hardware vendor might limit your choices in terms of available options. Additionally, no system vendor manufactures every component, so it cannot provide the vertical depth of support that the Original Equipment Manufacturers (OEMs) can. A good example is InfiniBand (IB). Some cluster nodes might use an IB host channel adapter (HCA) on a PCIe card, whereas others might have IB on the motherboard. Either way, deep support issues must ultimately be directed back to the InfiniBand OEM.
Support is yet another highly variable issue in the commodity realm. Most companies will support their own hardware, but when it comes to integration with other systems, companies find it difficult to offer any kind of support because they have no control over foreign hardware.
Standards help this process, and in theory, they allow interoperability, but they do not guarantee performance. For instance, many vendors offer a Network File System (NFS) server in the form of a Network Attached Storage (NAS) appliance. Each appliance will adhere to the NFS protocol, but a poorly performing one can greatly limit the true potential of a cluster. A similar situation exists for network/interconnect components as well.
As is typical in many software/hardware situations, vendors often play the "blame game" when trying to support other vendors' hardware and software. Open software exacerbates the problem because the actual software stack typically varies from cluster to cluster. Thus, a vendor will often point to other vendors' software and hardware as the root of a problem, and those vendors will often return the favor. The user is then stuck in the middle. Getting a clear delineation of vendor responsibilities is important. If you suspect a problem, try to reproduce it outside of the cluster or, at a minimum, collect clear data that points to the problem.
Pitfall 3: Free Software Has No Cost
Another misconception that extends far beyond HPC clusters is the notion that openly available software is free and therefore adds no cost to a cluster. Although the initial acquisition cost of open software might be nonexistent, software support and integration most certainly have associated costs. This time and effort has to come from either the user or a vendor and does not vanish because the software was freely available. In the case of HPC clusters, these costs can be quite substantial and are often the responsibility of the customer. If the customer takes the "learn as you go" approach to managing an open software stack, additional time and cost should definitely be expected.
It is possible to purchase a complete Linux-based cluster distribution or to download one of several freely available options. In general, the software is very similar and often based on a commercially available Linux distribution (e.g., RHEL from Red Hat, SLES from Novell, or Red Hat rebuilds like CentOS), but the support options can vary from vendor to vendor. A small number of hardware vendors will support an entire software stack on their hardware (that is, they have the internal expertise to support a deep issue as it pertains to their hardware), but most vendors merely sell third-party cluster distributions or leave the choice to the user.
A professionally supported cluster software distribution has a definite advantage. Having an expert manage software upgrades, security updates, and bug reports is important to production-level sites, but as with vendor-supplied hardware, there are limits to what a vendor will provide. If a user requires a new version of a package (or a package that did not exist in the original distribution), he will be left to install and support these packages on his own. This situation is similar to those who prefer to "roll their own" cluster software for use on top of standard Linux distributions.
One of the biggest issues facing cluster administrators is upgrading software. Commonly, cluster users simply load a standard Linux release on each node and add some message-passing middleware (i.e., MPI) and a batch scheduler. This arrangement offers a quick victory for the administrator, but could cause serious upgrade issues and downtime in the future. For instance, upgrading to a new distribution of Linux might require rebuilding MPI libraries and other middleware. User applications might also need to be rebuilt with a third-party optimizing compiler that does not yet support the new distribution upgrade.
Administrators and users are then required to find workarounds or fixes that allow users to run the new software. Other packages can suffer a similar fate, resulting in frustration and lost productivity. In summary, free software does not imply free support or easy integration. The open nature of Linux-based software does allow optimal flexibility and choice within a local user environment, but it can also place extra responsibility on the administrator or user.
Pitfall 4: Integration Is Racking and Stacking
Installing cluster hardware is an important job, often requiring someone with the experience and skills to integrate the hardware into a complete system. This process includes component placement, network wiring, and testing. Most large clusters are built on-site and, as such, often result in unforeseen issues that can mean delays or even additional expense. Preferably, your vendor will stage the cluster, perform acceptance testing, and then install it at your site.
Although pre-staged clusters are usually more expensive than site-built systems, you are assured that the cluster will be available for use within a day or two of delivery. When the cluster is extremely large, it might not be possible to pre-stage the hardware because of space or power constraints at the vendor facility. These systems require a professional installation team as well.
The difference between connecting hardware and delivering a workable cluster is notable. Customers should always have an acceptance testing plan under which their actual applications must be demonstrated to run optimally on a cluster. An acceptance testing plan is a far cry from installing a Linux distribution on each node and testing network connectivity. A vendor's ability to deliver a usable cluster is called the "stand-up rate" and is, in effect, how long it takes to provide a fully functioning cluster from the day of delivery. A good stand-up rate should be measured in days, not weeks.
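An acceptance plan ultimately reduces to pass/fail checks on your real applications. The sketch below assumes you have agreed target runtimes with the vendor; the job names, targets, and 5% slack are hypothetical, and `timed_run` simply wall-clocks whatever command it is given (here a trivial Python command stands in for launching a real application).

```python
import subprocess
import sys
import time


def timed_run(cmd: list[str]) -> float:
    """Run one benchmark command to completion and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return time.perf_counter() - start


def evaluate(measured: dict[str, float], targets: dict[str, float],
             slack: float = 0.05) -> dict[str, bool]:
    """Pass a job only if it ran and finished within `slack` of its target."""
    return {name: name in measured and measured[name] <= target * (1 + slack)
            for name, target in targets.items()}


# Stand-in command; in practice this would launch your real application.
measured = {"smoke_test": timed_run([sys.executable, "-c", "pass"])}
# Hypothetical agreed targets and measured results (seconds).
targets = {"smoke_test": 60.0, "cfd_case_a": 1800.0, "md_case_b": 900.0}
measured.update({"cfd_case_a": 1750.0, "md_case_b": 1020.0})
print(evaluate(measured, targets))
```

A job that fails outright raises an exception from `timed_run`, and a job with no result at all is reported as a failure; an acceptance plan should treat both as unmet.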
One shaky assumption users often make is that, once the cluster is operational, the integration is finished. In some respects, the installation has been successful, but in almost all cluster deployments, a period of several months ensues during which local integration and system "shake-out" take place.
Local integration can vary by site, but it often involves user rights management, storage issues, resource scheduling policies, system tuning, user/administrator education, and basic administration issues. This task is perhaps one of the most difficult aspects of cluster acquisition and often the most overlooked. Indeed, this is when support from the vendor can be most critical. In many cases, however, the large tier-one vendors are not organized to provide the "high-touch" level of support required at this juncture, and many are glad to part company once the hardware is installed. Smaller vendors and integrators have a distinct advantage because they typically provide direct and knowledgeable support during this process (i.e., you can talk to the guy who actually built or configured your system).
Pitfall 5: NFS Is Enough
Storage is often the forgotten aspect of initial HPC designs. During the specification stage, customers often assume that storage is simply the number of hard disks to be placed in one of the administrative nodes. The successful use of NFS as a cluster-wide filesystem invites the assumption that all storage needs can be addressed in this fashion. In reality, the amount and type of cluster-accessible storage you need depends largely on application requirements rather than on the total size of a storage system.
NFS works fine for many clusters, but issues tend to develop in clusters with more than 100 nodes; in fact, many people are surprised to learn that NFS was not originally designed for a cluster environment. The upcoming parallel NFS (pNFS) standard is intended to help solve this problem, but most clusters simply run NFS "out of the box" without any optimization. Additionally, because cluster nodes are multicore, each node usually generates more I/O traffic. Poor file I/O performance can lead to poor utilization of your compute nodes and diminish the expected performance of your cluster.
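Some simple arithmetic shows why multicore nodes strain a single NFS server. The sketch assumes every core performs I/O at the same time through one server network link and that 90% of the link's nominal rate is usable; both assumptions are illustrative, not measurements.

```python
def per_core_bandwidth_mb_s(link_gbit_s: float, nodes: int, cores_per_node: int,
                            link_efficiency: float = 0.9) -> float:
    """Share of one NFS server link each core gets when all cores do I/O at once.

    Uses decimal units: 1 Gb/s = 1000 Mb/s = 125 MB/s before efficiency losses.
    """
    usable_mb_s = link_gbit_s * 1000 / 8 * link_efficiency
    return usable_mb_s / (nodes * cores_per_node)


for cores in (2, 8):
    share = per_core_bandwidth_mb_s(1.0, nodes=100, cores_per_node=cores)
    print(f"1 Gb/s link, 100 nodes x {cores} cores: {share:.3f} MB/s per core")
```

With a 1 Gb/s link and 100 eight-core nodes, each core gets roughly 0.14 MB/s when all are active; the same link serving 100 two-core nodes gives each core four times as much.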
Parallel filesystems might be needed if you have a large and fast I/O requirement. Some of the more common parallel filesystem solutions include Lustre, Panasas, GlusterFS, and IBM GPFS. The choice of filesystem should be based solely on your needs because no one parallel filesystem fits all. As mentioned previously, benchmarking might be the only way to make this determination. New storage network technologies like Fibre Channel over Ethernet (FCoE), Fibre Channel over InfiniBand (FCoIB), or NFS/RDMA (NFS using Remote Direct Memory Access) might need to be considered as part of a filesystem solution. These options should be evaluated carefully because cost and performance can vary widely.
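As a first-pass check of a candidate filesystem, a simple streaming-write test can be run from a client node. This is a minimal single-client sketch, not a substitute for a proper parallel I/O benchmark run from many nodes at once; the default sizes are arbitrary assumptions, and `os.fsync` is called so that the timing includes pushing the data out of the client's page cache rather than just into it.

```python
import os
import tempfile
import time


def write_throughput_mib_s(directory: str, total_mib: int = 256,
                           block_kib: int = 1024) -> float:
    """Stream total_mib of random data into a temp file in `directory`.

    Returns MiB/s, including the final fsync so buffered data must be written
    out before the clock stops. The temp file is always removed afterwards.
    """
    block = os.urandom(block_kib * 1024)
    n_blocks = (total_mib * 1024) // block_kib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return n_blocks * block_kib / 1024 / elapsed


# Point this at the filesystem under test (e.g., a hypothetical /scratch mount);
# the local temp directory is used here only so the example runs anywhere.
print(f"{write_throughput_mib_s(tempfile.gettempdir(), total_mib=32):.0f} MiB/s")
```

Running the same test simultaneously from many nodes, and comparing the aggregate against a single client, says far more about a parallel filesystem than any one run does.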
A common theme running through all of these pitfalls is the need to understand fully the additional (or hidden) costs associated with clusters. Failure to account for these costs will result in lost productivity, additional expenses, and, in the worst case, a non-functioning cluster.
In general, the pitfalls mentioned in this article can be placed in the following categories:
- validation/optimization/specification costs,
- software integration/maintenance/upgrade costs, and
- infrastructure costs.
This list is not meant to be all-inclusive; other costs could certainly arise, and a careful review of your requirements should help focus your plans.
One way to help minimize these unexpected costs is to prepare a detailed specification and Request for Proposal (RFP) to use when contacting vendors.
A detailed article on how to write a technical cluster RFP can be found online at ClusterMonkey.net. It is imperative that you include some form of acceptance testing for the cluster to ensure that it is working properly. Qualified vendors should be able to answer pointed HPC questions; if they cannot, a pitfall could lie ahead.
Consider Enrolling an Ally
The realities of cluster acquisition can be quite sobering. As mentioned here, customers are now required to make many decisions that did not previously exist when purchasing a fully integrated supercomputer from a single vendor. The multivendor nature of commodity clusters has provided the double-edged sword of choice and responsibility for most HPC users.
One approach that has proven very successful for many organizations has been to employ a third-party consultant or integrator to assist with the specification, acquisition, and integration of the cluster.
At first glance, some might consider this an additional and unnecessary expense, but in light of the possible pitfalls I've mentioned, enlisting an experienced ally can actually save money in the long run. An integrator or consultant can lower your overall project cost by helping you purchase only what you need and by making sure it functions properly. Indeed, savvy customers sometimes keep a knowledgeable integrator/consultant under an ongoing support contract in case future issues arise with the cluster. Even if you have existing technology personnel, a consulting contractor or integrator can help keep you on schedule through the extra work that a successful acquisition requires. It is important to include the integrator/consultant at the very beginning of the process, before you start talking to hardware vendors. Some integrators also sell hardware, but it is best to find a "hardware-neutral" partner that can recommend what is best for your requirements.
The value of an integrator/consultant is the knowledge and relationships that they already have within the HPC market. Often a good integrator can tell you what works well and what doesn't before you commit to any specific hardware – or, which vendors fully understand and support the HPC market and which are recent additions to the market and are still learning the ropes themselves. In general, a good integrator/consultant can help you navigate the potential pitfalls within the cluster acquisition and integration process.
If you choose to bring an integrator/contractor in as a partner for your project, be sure to check out the relationship intangibles. These are the aspects of your partnership that do not appear on the statement of work or contract. For example, some questions to consider are:
- How well does the integrator/consultant listen?
- Are phone and email messages returned promptly?
- How knowledgeable are the integrator/consultant team members?
- How well does the integrator's team work with your team?
These and other issues in your relationship with the integrator/consultant can be every bit as important as the work they will perform.
Good and honest communication is essential. Problems will inevitably arise, and how they are handled is what separates a good partner from a bad partner. Perhaps the best way to evaluate the intangibles is to get references from other customers. Any experienced integrator/contractor will have no problem providing references from past and present customers.
Before you make a decision, you might even want to interview the members of the integrator/contractor team. These people are going to be the "boots on the ground" when it comes time to make things work.
Finally, you should trust your gut instincts. If you don't get a good feeling from one integrator/consultant, look for another. Although good HPC integrators are somewhat rare, they are available. Your styles should complement each other and support your HPC mission.
Specifying, procuring, and managing HPC resources can be a challenging task. As discussed, a few common pitfalls can cause major problems and incur extra costs. In particular, understanding the nuances of public benchmarks, commodity hardware, free and open software, integration, and storage will allow you to make better decisions.
Beyond the initial hardware purchase, you have many costs to consider, and understanding these will help create better expectations and minimize problems during both your purchase and installation phases. Using an HPC integrator/contractor can lower project costs and help you navigate the pitfalls as you travel the path to successful and productive HPC in your organization.