Moving HPC to the cloud
Scale Up
In the coming years, the cloud computing market is expected to easily exceed US$ 100 billion. Many organizations have found that using the cloud has clear advantages over in-house hardware. Indeed, shifting what was a capital expense to an operational expense has many benefits, including instant availability and the ability to scale up (and down) rapidly.
Pay-as-you-go computing has been an industry goal for many years. In HPC, the Globus project [1] has shown the power of grid-based computing and has spawned many successful production computation grids. Cloud computing takes grid computing to a whole new level by using virtualization to encapsulate an operating system (OS) instance. Users can construct their own operating system instance and run it in a cloud whenever they need computational resources. In addition to cloud computing, cloud storage can also be used, either independently or combined with OS instances.
Cloud computing would seem to be an HPC user's dream, offering almost unlimited storage and instantly available, scalable computing resources, all at a reasonable metered cost. For all but the most basic HPC applications, however, the use of a typical cloud requires a bit of due diligence, because remote HPC services can range from shared HPC clusters to fully virtualized cloud environments.
Not All Clouds Look the Same
A "traditional" cloud offers features that are attractive to the general public. These services comprise single, loosely coupled instances (an instance of an OS running in a virtual environment) and storage systems backed by service level agreements (SLAs) that provide the end user guaranteed levels of service. These clouds offer the following features:
- Instant availability – Cloud offers almost instant availability of resources.
- Large capacity – Users can instantly scale the number of applications within the cloud.
- Software choice – Users can design instances to suit their needs from the OS up.
- Virtualized – Instances can be easily moved to and from similar clouds.
- Service level performance – Users are guaranteed a certain minimal level of performance.
Although these features serve much of the market, HPC users generally have a different set of requirements:
- Close to the "metal" – Many man-years have been invested in optimizing HPC libraries and applications to work closely with the hardware, thus requiring specific OS drivers and hardware support.
- Userspace communication – HPC user applications often need to bypass the OS kernel and communicate directly with remote user processes.
- Tuned hardware – HPC hardware is often selected on the basis of communication, memory, and processor speed for a given application set.
- Tuned storage – HPC storage is often designed for a specific application set and user base.
- Batch scheduling – All HPC systems use a batch scheduler to share limited resources.
Depending on the user's application domain, these two feature sets can make a big difference in performance. For example, applications that require a single node (or threaded applications) can work in a cloud. In this case, the user might have a single program that must be run with a wide range of input parameters (often called parametric processing), or they might have dataflow jobs, such as the Galaxy suite [2] used in biomedical research. These types of applications can benefit from most cloud computing resources.
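For example, a parametric sweep can be scripted in a few lines. The following sketch assumes a Torque/PBS-style batch scheduler with a qsub command available on the login node; the application name, parameter range, and resource requests are placeholders:

```python
# Parametric sweep sketch: submit one batch job per input parameter.
# Assumes a Torque/PBS-style scheduler ("qsub") on the login node; the
# application ("./model"), parameter values, and queue name are placeholders.
import subprocess

JOB_TEMPLATE = """#!/bin/bash
#PBS -N sweep_{param}
#PBS -l nodes=1:ppn=1,walltime=02:00:00
#PBS -q batch
cd $PBS_O_WORKDIR
./model --input {param} > result_{param}.txt
"""

for param in range(1, 101):                      # 100 independent runs
    script = JOB_TEMPLATE.format(param=param)
    # qsub reads the job script from stdin and prints the job ID
    job_id = subprocess.run(["qsub"], input=script, text=True,
                            capture_output=True, check=True).stdout.strip()
    print(f"parameter {param} -> job {job_id}")
```

Because each run is independent, the same pattern maps directly onto cloud instances: the scheduler (or the cloud's own queue service) simply fans the jobs out across whatever nodes are available.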
Some applications can utilize highly parallel systems but do not require a high-performance interconnect or fast storage. One often-cited example is digital rendering, in which many non-interacting jobs can be spawned across a large number of nodes with almost perfect scalability. These applications often work well with standard Ethernet and do not require a specialized interconnect for high performance.
Moving up the HPC tree, you'll find interconnect-sensitive applications that require the low-latency, high-throughput interconnects not found in the traditional cloud. Indeed, most of these interconnects (e.g., InfiniBand and High-Performance Ethernet) require "userspace" communication pathways that do not involve the OS kernel. This approach makes the use of cloud virtualization very difficult, because most virtualization schemes cannot manage "kernel bypass" applications (i.e., these are "on the wire" data transfers that are hard to virtualize). If high-performance networks are not available, many HPC applications run slowly and suffer from poor scalability (i.e., they see no performance gain when adding nodes).
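To get a feel for what the interconnect (and any virtualization layer in front of it) is doing to latency, a simple ping-pong test helps. The sketch below assumes mpi4py is installed and is launched under mpirun with two ranks; it is a rough indicator only, not a substitute for a full benchmark suite such as the OSU micro-benchmarks:

```python
# Ping-pong latency sketch (run as: mpirun -np 2 python pingpong.py).
# Assumes mpi4py is installed; gives a rough indication of interconnect
# latency, not a replacement for a proper benchmark suite.
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
msg = bytearray(8)          # small message exposes latency, not bandwidth
reps = 10000

comm.Barrier()
start = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(msg, dest=1)
        comm.Recv(msg, source=1)
    else:
        comm.Recv(msg, source=0)
        comm.Send(msg, dest=0)
elapsed = time.perf_counter() - start

if rank == 0:
    # Each repetition is a round trip; half of it is the one-way latency.
    print(f"average one-way latency: {elapsed / reps / 2 * 1e6:.1f} microseconds")
```

On a nonvirtualized InfiniBand fabric, this number is typically in the low single-digit microseconds; over a virtualized Ethernet path it can be one to two orders of magnitude higher.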
Also in the tree are many I/O-sensitive applications that, without a very fast I/O subsystem, will run slowly because of storage bottlenecks. To open up these bottlenecks, most HPC systems employ parallel filesystems that drastically increase the I/O bandwidth of computing nodes.
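A quick way to spot such a bottleneck is to measure effective write bandwidth on the target filesystem. The sketch below is a crude stand-in for real I/O benchmarks such as IOR; the scratch path and file size are assumptions:

```python
# Crude streaming-write test: a stand-in for real I/O benchmarks such as IOR.
# The target path (e.g., a scratch or parallel filesystem mount) and the
# 4GB file size are placeholders; results also depend on caching behavior.
import os, time

path = "/scratch/iotest.dat"          # hypothetical parallel filesystem mount
block = b"\0" * (4 * 1024 * 1024)     # 4MB blocks
blocks = 1024                          # 4GB total

start = time.perf_counter()
with open(path, "wb") as f:
    for _ in range(blocks):
        f.write(block)
    f.flush()
    os.fsync(f.fileno())               # force data to storage, not just cache
elapsed = time.perf_counter() - start

size_gb = len(block) * blocks / 1024**3
print(f"wrote {size_gb:.1f}GB in {elapsed:.1f}s -> {size_gb / elapsed:.2f}GB/s")
os.remove(path)
```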
Another growing branch includes performance accelerators or SIMD units (parallel computing Single-Instruction Multiple Data processors) from NVidia and AMD/ATI. This type of hardware is very specific to HPC systems and therefore is not found on typical cloud hardware.
At the top of the tree are applications that push on all levels of performance (compute, interconnect, and storage). These applications require fast computation (possibly a SIMD unit), fast interconnects, and high-performance storage. Clearly, this computing environment is not found in a typical cloud and is unique to the HPC market. Attempting to run this level of application on a typical cloud will deliver sub-par performance. A deeper treatment of these issues can be found in a recent article called "Will HPC Work in the Cloud?" [3].
Finally, any remote computation scheme needs to address the "moving big data" problem. Many HPC applications require large amounts of data. Many clouds, even those that offer HPC features, cannot solve this problem easily. In particular, if the time to move large datasets to the cloud outweighs the computation time, then the cloud solution becomes the slow solution. Interestingly, the fastest way to move data in these cases is with a hard disk and an overnight courier. (It seems the station wagon full of tapes is still the fastest way to transport data.)
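A back-of-the-envelope calculation makes the trade-off concrete. The sketch below compares upload time against a 24-hour courier window; the dataset sizes, usable link speeds, and link efficiency are assumptions:

```python
# Back-of-the-envelope comparison: upload a dataset vs. ship a disk overnight.
# Dataset size, usable link speed, and the 24-hour courier window are assumptions.
def transfer_hours(dataset_tb, link_mbps, efficiency=0.8):
    """Hours needed to push dataset_tb terabytes over a link_mbps link."""
    bits = dataset_tb * 8e12                     # TB -> bits (decimal units)
    return bits / (link_mbps * 1e6 * efficiency) / 3600

for tb in (1, 10, 50):
    for mbps in (100, 1000):
        hours = transfer_hours(tb, mbps)
        verdict = "courier wins" if hours > 24 else "network wins"
        print(f"{tb:>3}TB over {mbps:>4}Mbps: {hours:7.1f} hours ({verdict})")
```

Under these assumptions, even a 1TB dataset takes more than a day over a 100Mbps link, which is why several vendors below offer physical shipping of disks as a data-ingest option.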
The HPC Cloud Is Out There
With all the differences between the traditional cloud and HPC applications, users will be interested to know that HPC clouds and cloud-like resources are available. A number of companies, including Penguin, R-HPC, Amazon, Univa, SGI, Sabalcore, and Gompute, offer specialized HPC clouds. Notably absent is IBM, which, at this time, does not offer a public HPC cloud. The company does, however, provide many options for constructing internal or private clouds.
As with many performance-based systems, the devil is often in the details and the word "cloud" (like the word "grid" in the past) can mean almost anything when it comes to performance. To provide a sense of the current HPC cloud capabilities, the previously mentioned companies were contacted and asked to answer some very specific HPC questions about their cloud offerings. The questions covered several categories including:
- System setup
- Interconnects
- Storage
- Workflow management
- Approximate cost
Note that the survey did not include academic clouds, such as those in the Future Grid project. The number of these clouds is growing, but access, cost, and capabilities can vary and might require research proposals. As with any new HPC project, due diligence will produce the best results. The following survey of HPC cloud vendors will help get you started.
Penguin Computing
One of the first vendors to introduce a true HPC cloud was Penguin Computing: its Penguin On Demand [4], or POD, cloud was among the earliest remote HPC services. From the beginning, POD has been a bare-metal compute model similar to an in-house cluster. Each user is given a virtualized login node that does not play a role in code execution. The standard compute node has a range of options, including dual four-core Xeon, dual six-core Xeon, or quad 12-core AMD processors, ranging in speed from 2.2 to 2.9GHz, with 24 to 128GB of RAM per server and up to 1TB of local scratch storage per node.
Getting applications running on the POD HPC cloud can be quite simple, because Penguin has more than 150 commercial and open source applications installed and ready to run on the system. Installing other applications is straightforward and available to users. GPU nodes with two NVidia Tesla C2075 computing processors are also available.
In terms of network, POD nodes are connected via nonvirtualized, low-latency 10 Gigabit Ethernet (10GigE) or QDR InfiniBand networks. The network topology is local to ensure maximum bandwidth and minimum latency between nodes. Storage systems are made available via 10GigE to the local compute cluster. Additionally, POD has redundant high-speed Internet connections, with remote connectivity ranging from 50Mbps to 1Gbps.
Several storage options are also available, starting with high-speed NFS using 10GigE-attached storage. List pricing for storage on POD is US$ 0.10/GB per month. Beyond NFS, there are parallel filesystem options attached via multiple 10GigE links and InfiniBand; Lustre and Panasas high-performance storage systems can also be provided. Finally, dedicated storage servers are available. These systems can isolate data and support the encryption/decryption of large volumes of data moved by physical shipping rather than Internet transfer.
POD offers a suite of tools to help manage your computation. Aptly called PODTools, Penguin offers a collection of command-line utilities for interacting with their HPC cloud. Beyond the standard SSH login, PODTools provide the ability to submit jobs, transfer data, and generate reports. Additionally, Penguin POD can be seamlessly integrated into existing on-site clusters to outsource excess workloads – often known as "cloud bursting." All these capabilities are encrypted and offer a high level of security.
In terms of cost, Penguin has an online pricing estimator [5]. I proposed two usage cases. The first was a small case consisting of 80 cores with 4GB of RAM per core and 500GB of basic storage. POD pricing is based on core-hours and would work out to US$ 6,098.00/month, or US$ 0.101/core per hour. A larger example of 256 cores with 4GB of RAM per core and 1TB of parallel storage would cost US$ 18,245.00/month at the same US$ 0.101/core per hour.
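For rough planning, the arithmetic behind such estimates is simple. The sketch below uses the list rates quoted above and assumes a 720-hour month while ignoring data transfer and support charges, so its totals only approximate the quoted figures:

```python
# Rough monthly-cost estimator in the spirit of Penguin's online calculator [5].
# Rates are the list prices quoted above; the 720-hour month and the omission
# of data-transfer or support charges are simplifying assumptions, so the
# totals only approximate the quoted prices.
CORE_HOUR_RATE = 0.101      # US$ per core per hour (POD list price above)
STORAGE_RATE = 0.10         # US$ per GB per month (NFS storage above)
HOURS_PER_MONTH = 720       # assumed 24 x 30

def pod_monthly_cost(cores, storage_gb, utilization=1.0):
    compute = cores * CORE_HOUR_RATE * HOURS_PER_MONTH * utilization
    storage = storage_gb * STORAGE_RATE
    return compute + storage

print(f"small case (80 cores, 500GB): US$ {pod_monthly_cost(80, 500):,.2f}/month")
print(f"large case (256 cores, 1TB):  US$ {pod_monthly_cost(256, 1024):,.2f}/month")
```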
Perhaps Penguin's biggest asset is a long history of delivering on-site HPC solutions. This experience has allowed them to develop a staff of industry domain experts. They also have a long list of additional services that supplement their POD offering, including on-premises provisioning of cloud bursting to the POD cloud, remote management of on-premises HPC systems, cloud migration services, private remote HPC-as-a-service environments, and private internal clouds.
R-HPC
R-HPC [6] offers R-Cloud, wherein clients can "rent" HPC resources. R-Cloud offers two distinct computing environments. The first is a Shared Cluster, which offers a login to shared nodes and a work queue. This is a classic cluster environment, essentially a "shared cluster in the sky." Users are billed by the job, creating a pay-as-you-go HPC service. No support or administration services are provided. The second environment comprises virtual private clusters that are carved out of a shared configuration. Use can be on demand with VLAN access. These systems are billed on a 24/7 basis.
R-HPC can provide new 3.4GHz quad-core Sandy Bridge-based systems with 16GB of RAM/node (4GB/core), DDR 2:1 blocking InfiniBand, and 1TB of local disk. Additionally, they have dual-socket 2.6GHz eight-core Sandy Bridge systems with 128GB of RAM/node (8GB/core), QDR non-blocking InfiniBand, 1TB of local storage, and 1TB of global storage. These offerings are rounded out by Magny-Cours, Nehalem, and Harpertown systems. GPU-based systems, currently in beta test, are provided for dedicated users.
Most applications can be set up and running within one day (although R-HPC notes that licensing issues can delay the process for some users). Similar to Penguin's products, all the interconnects, which include DDR, QDR, FDR, and GigE, are run on the wire with no OS virtualization layer. Storage options include 10GigE-attached NFS/SMB, with Lustre over InfiniBand as a possible upgrade. If ultimate storage performance is needed, R-HPC also offers the Kove RAM disk storage array [7]. All dedicated systems have block storage for security, whereas the shared clusters use shared NFS (no private mounts).
R-HPC will make service level agreements on a case-by-case basis depending on the customer's needs. In terms of workflow management, Torque/OpenPBS is the most common scheduler; however, Maui and Grid Engine (and derivatives) can be provided as needed. Interestingly, cloud bursting, although possible with R-HPC systems, is almost never requested by customers. Another interesting aspect of the R-HPC offerings is the availability of Windows HPC environments.
In terms of pricing, R-HPC pointed to several factors that make it difficult to offer a standard price for their HPC services. Generally, the length of commitment can factor heavily into the price. The basic "no commitment" starting price is competitive with other cloud providers offering shared systems. Pricing does include NFS over 10GigE. Dedicated systems can be substantially less (i.e., volume reduces cost). Dedicated storage nodes for parallel filesystems are generally twice the hourly rate of dedicated system nodes. R-HPC offers performance tuning and remote administration services as well. They have extensive experience in HPC and can provide "tuned" application-specific private clusters for clients.
Amazon EC2 HPC
Perhaps the most well-known cloud provider is Amazon. Inquiries to Amazon were not returned, so information was gleaned from their web page. Originally, the EC2 service was found to be unsuitable for many HPC applications. Amazon has since created dedicated "cluster instances" that offer better performance to HPC users. Several virtualized HPC instances are available on the basis of users' needs. The first offering comprises two Cluster Compute instances that provide a very large amount of CPU coupled with increased network performance (10GigE). These instances come in two sizes: a Nehalem-based "Quadruple Extra Large Instance" (eight cores/node, 23GB of RAM, 1.7TB of local storage) and a Sandy Bridge-based "Eight Extra Large Instance" (16 cores/node, 60.5GB of RAM, 3.4TB of local storage).
Additionally, Amazon offers two other specialized instances. The first is a Cluster GPU instance that provides two NVidia Tesla Fermi M2050 GPUs with proportionally high CPU and 10GigE network performance. The second is a high-I/O instance that provides two SSD-based volumes, each with 1024GB of storage.
Pricing can vary depending on on-demand, scheduled, or spot purchase of resources. Generally, the costs for on-demand EC2 instances are as follows: the Quadruple Extra Large Instance is US$ 1.30/hour (US$ 0.33/core per hour), the Eight Extra Large Instance is US$ 2.40/hour (US$ 0.15/core per hour), the Cluster GPU instance is US$ 2.10/hour, and the High I/O instance is US$ 3.10/hour.
Thus, using the small usage case (80 cores, 4GB of RAM per core, and basic storage of 500GB) would cost US$ 24.00/hour (10 Eight Extra Large Instances). The larger usage case (256 cores, 4GB of RAM per core, and 1TB of fast global storage) would cost US$ 38.40/hour (16 Eight Extra Large Instances).
Amazon does not charge for data transferred into EC2 but has a varying rate schedule for transfer out of the cloud; additionally, there are EC2 storage costs. Therefore, the total cost depends on compute time, total data storage, and transfer. Once created, the instances must be provisioned and configured to work as a cluster by the user.
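As an illustration of that last step, the sketch below launches a set of cluster instances into a placement group using the boto3 AWS SDK; the AMI ID, key pair, and placement group name are placeholders, and building the raw instances into a working cluster (MPI, scheduler, shared storage) remains the user's job:

```python
# Sketch: launch EC2 cluster instances into a placement group with boto3.
# The AMI ID, key pair, and placement group name are placeholders; turning
# the raw instances into a working cluster (MPI, scheduler, shared storage)
# is still up to the user, as noted above.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Cluster instances should share a placement group for full bisection bandwidth.
ec2.create_placement_group(GroupName="hpc-demo", Strategy="cluster")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",            # placeholder HVM AMI
    InstanceType="cc2.8xlarge",        # the "Eight Extra Large" cluster instance
    MinCount=16, MaxCount=16,          # e.g., the 256-core usage case above
    KeyName="my-keypair",              # placeholder key pair
    Placement={"GroupName": "hpc-demo"},
)
print([i["InstanceId"] for i in response["Instances"]])
```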
Besides the Amazon cluster instance, users may select from one of several preconfigured clusters [8]. Several studies also compare Amazon Cluster instances to real clusters [9].
Univa
Univa [10] is known as the place Sun Grid Engine landed after the Oracle purchase. Not only does Univa continue to develop and support Grid Engine, they also offer "one-click" HPC computing with their partner RightScale [11]. The companies offer two distinct ways to build clouds. The first is an internal cloud infrastructure system that provides "raw" virtual machines using one of several methods, including VMware, KVM, Xen, or something more sophisticated like OpenStack or VMware vSphere.
Their UniCloud product uses an Infrastructure-as-a-Service (IaaS) layer to create the machines and then directs the IaaS to provision a raw machine on the fly. Once the machine is provisioned, UniCloud takes over as a Platform-as-a-Service (PaaS) layer and does additional software provisioning and configuration to turn the raw instances into nodes that can then be part of an HPC cluster.
The second offering is an external cloud infrastructure system, such as Amazon EC2 and AWS, and is used to provide the "raw" virtual machines for HPC use. Once those machines are provisioned by EC2, additional provisioning and configuration management is done by UniCloud (PaaS), which turns those raw instances into nodes that can operate as an HPC cluster. This process is automatic, and adding and removing nodes from this HPC cluster environment can be done on demand.
In addition to building HPC clouds, UniCloud can use a rules engine to dynamically change the HPC clouds – changing node roles and adding/removing nodes from the HPC cloud on the fly. This feature allows cloud administrators to tailor the cloud for specific users' needs. Virtualization is always used in Univa HPC clouds and can adapt to support any underlying type of IaaS layer.
Univa notes that launching an external cloud on Amazon EC2 is incredibly easy; you can just go to the Amazon Market Place [12], choose HPC from the list, and then launch your cluster. A similar feature is available from RightScale, where Univa Server template macros are available.
Once the cloud is built, which can take anywhere from 5 to 20 minutes (depending on how large you make your cluster), you can then install your application. Application installation works just like a regular local machine and requires a similar amount of time. Once your application is installed, Univa recommends taking a snapshot of the filesystem volume where the application is installed; otherwise, you might end up installing it again when you restart your cluster.
Univa currently offers CPU cloud virtualization and does not offer GPU-based clouds, although they can support GPUs with UniCloud on customer request. In terms of interconnect, Univa offers the standard Gigabit Ethernet or 10GigE networking that is available in many cloud deployments. Interconnects are virtualized, although they have shown some interest in using pass-through to improve performance. Storage options include NFS, and some customers have used other filesystems, such as Gluster.
As expected, Univa offers Grid Engine as part of their HPC cloud. Their UniCloud package provides prebuilt "kits" that include Univa Grid Engine automatically. Univa Grid Engine can easily request more cloud nodes from UniCloud; alternatively, UniCloud can monitor Univa Grid Engine and add/remove nodes as needed. UniCloud can burst from an internal cloud to an external public cloud or from one public cloud to another public cloud.
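Conceptually, this kind of rules-based bursting boils down to watching the queue and provisioning when work backs up. The schematic sketch below polls Grid Engine with the standard qstat command; the add_cloud_node() hook and the threshold policy are hypothetical stand-ins for whatever UniCloud or the underlying IaaS API actually provides:

```python
# Schematic of rules-based cloud bursting of the kind UniCloud automates:
# poll the Grid Engine queue and add nodes when pending jobs pile up.
# "qstat" is a standard Grid Engine command; add_cloud_node() is a purely
# hypothetical stand-in for the provisioning layer (UniCloud, EC2 API, ...).
import subprocess, time

PENDING_THRESHOLD = 20      # burst when this many jobs are waiting (assumed policy)

def pending_jobs():
    # "qstat -s p" lists pending jobs, one per line after a two-line header
    out = subprocess.run(["qstat", "-s", "p"], capture_output=True,
                         text=True, check=True).stdout.splitlines()
    return max(len(out) - 2, 0)

def add_cloud_node():
    print("requesting one more cloud node from the provisioning layer ...")

while True:
    if pending_jobs() > PENDING_THRESHOLD:
        add_cloud_node()
    time.sleep(60)          # re-evaluate the queue once a minute
```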
In terms of cost, Univa costs are additive to the AWS instance charges and range from US$ 0.02 up to a maximum of US$ 0.08/core per hour. Remember that these costs provide a "ready-to-run" HPC cloud, whereas buying AWS instances requires the user to configure the cluster. Univa proposed a Medium AWS instance for the small usage case (80 cores, 4GB of RAM per core, 500GB of basic storage), which would cost US$ 14.40/hour ((AWS @ US$ 0.16/hour + Univa @ US$ 0.02/hour) x 80 cores). The larger usage case (256 cores, 4GB of RAM per core, 1TB of fast storage) would cost US$ 58.88/hour.
Univa also offers remote or on-site consulting that is related to the deployment, support, training, and management of Univa Grid Engine, UniCloud, and UniSight products.
Sabalcore Computing
Sabalcore [13] offers HPC systems over the Internet for both on-demand and dedicated solutions. On-demand access is accomplished using an SSH client or secure remote desktop interface from any Windows or Linux laptop, desktop, or workstation. Users are provided with an NFS-exported home directory with persistent storage for results, code, applications, and data.
As part of their service, Sabalcore offers accounting tools that allow administrators to assign and track usage by user, project, or resource. One interesting feature offered by Sabalcore is called Dynamic Cluster Isolation (DCI), which places an optional firewall around the compute nodes of a particular job, logically isolating them while the job runs; according to Sabalcore, this should satisfy Federal Government International Traffic in Arms Regulations (ITAR) requirements.
In terms of hardware, the operating system is preinstalled on the metal with a fixed amount of RAM per core, which is variable depending on the system. To facilitate fast startup, Sabalcore has dozens of popular open source applications that are ready to run. They will also install customer software and allow the customer to build and install their own application software.
Available hardware consists of Intel Xeon and AMD Opteron CPUs in eight-core to 32-core nodes, running from 2.33 to 3.0GHz, with 32 to 128GB of RAM per node. They also have a handful of NVidia-based GPUs for testing, but these are not yet ready for production work. These resources are connected with non-virtualized Ethernet and InfiniBand networks, which means communication runs directly on the wire. The primary filesystem is based on NFS, with parallel filesystems available for specific applications.
The primary scheduler is Torque, and the company also offers a custom scheduler. Currently, they do not offer cloud bursting, but could implement it if needed by customers. In terms of cost, Sabalcore stated that price is volume dependent and ranges from US$ 0.16 to US$ 0.29/core per hour and includes RAM, storage, support, and bandwidth.
Additionally, Sabalcore pointed out that their service includes comprehensive technical support (i.e., installation and tuning of application, workflow optimization, scripting, general consulting, debugging, troubleshooting, etc.) and is available by phone and email. Optional 24/7 phone support is available. They also can provide private clusters, including Windows HPC systems, contracts, custom SLAs, and ITAR-compliant networks.
SGI
SGI's cloud computing service is called Cyclone [14] and is dedicated specifically to technical applications. Through Cyclone, SGI offers both a software stack and HPC hardware in a nonvirtualized environment, together with HPC applications. Pulling from both open source and commercial vendors, they have applications in Computational Biology, Computational Chemistry and Materials, Computational Fluid Dynamics, Finite Element Analysis, and Computational Electromagnetics.
Somewhat unique to Cyclone is the ability to use different computation environments, such as scale-up or scale-out nodes, choice of operating systems (SUSE or RHEL), and two interconnects (SGI NUMAlink or InfiniBand) that can be configured into multiple topologies (e.g., hypercube, all-to-all, fat tree, single-rail, or dual-rail).
The SGI Cyclone service offers either 2 or 3GB of memory per core. Using the SGI UV shared memory systems with the NUMAlink interconnect, large-memory configurations can range from 500GB to 2TB of shared memory. Setup time can vary by application; if users want to run one of the preinstalled applications, experienced customers can be up and running on Cyclone within a few hours. SGI notes that new customers can access a web portal GUI that provides assistance with running batch jobs on a cluster, as well as command-line SSH access. For some customers, SGI engineering staff will run their jobs as a service.
Within the cloud, SGI offers access to SGI ICE and SGI Rackable clusters running Intel Xeon processors as well as SGI UV shared memory systems, also running Intel Xeon processors. Interestingly, SGI notes that they have not seen any demand for GPU accelerators but can offer access to NVidia Tesla GPUs on request. Interconnects are either SGI NUMAlink or InfiniBand and are not virtualized. Customers can request a free trial access with which to test performance, as well as upload and download speeds.
In terms of storage, high-performance SGI Infinite Storage systems are available for scratch space and long-term archiving of customer data. No pricing information was given for storage. SGI provides a web-based user portal to run LS-DYNA and OpenFOAM that connects with Altair's PBS Pro job scheduler. For other technical applications, SGI offers PBS Pro as the batch scheduler. Currently, SGI is exploring an option to allow its customers to cloud burst onto Cyclone. SGI did not provide any cost information for remote computation. In addition to compute cycles, SGI offers operating system and/or application configuration, application tuning, application running, and remote data visualization.
Gompute
The final company in this comparison is Gompute [15]. However, they were not able to respond to my specific questions. The following information was found on their web page. Gompute provides CPU hours, storage, system administration, and support for HPC applications, including both open source and commercial. For commercial applications, users may either start a private license server at Gompute using the license key they get from an authorized reseller or allow the connection to their local license server. Gompute uses a pay-per-use model, whereby users get real-time information about their consumption for each job. This feature allows users to minimize their fixed costs, because the cost is proportional to the use of CPUs, storage, licenses, system support, and so on.
Gompute provides HPC services from computing farms located in Sweden and the United States. Both systems offer the same environment and are based on the Linux operating system. All clusters are based on x86-64 CPU architecture and InfiniBand interconnect. Storage alternatives include InfiniBand-enabled parallel filesystems. Shared and dedicated communication lines are available up to 1Gbps. The transport of data is also possible with the use of hardware-encrypted USB disks and a delivery service.
Resource usage depends on users' needs. Alternatives include running jobs in a ready-to-use batch queue, buying a package of hours to be consumed within a predetermined time interval, reserving a fixed number of CPU cores for a fixed period of time, or running on a pure batch queue. Users access Gompute resources through an encrypted connection and identify themselves with digital certificates. Users have access only to their own data. SSH and SSL protocols are used to access Gompute.
Future Cloud Formations
Given the varied requirements for HPC in the cloud, it is understandable that the range of options is broad. Solutions range from shared remote clusters to fully virtualized systems in the cloud. Each method brings its own feature set that must be matched to the users' needs.
Finally, the above items are not intended to be an exhaustive list of HPC cloud providers. Others exist, and given that the market is new and growing, more vendors will be coming online in the near future. Many other factors should also be considered besides the brief analysis offered here. Your best results will come from doing due diligence and testing your assumptions. Perhaps the most important aspect of cloud HPC is the ability to work with your vendor, because having a good working safety net under your cloud might be the best strategy of all.