Exploring Apache CloudStack
Stack Check
Apache CloudStack [1] is an open source software platform that pools computing resources to build public, private, and hybrid Infrastructure-as-a-Service (IaaS) clouds. The CloudStack toolset manages the network, storage, and compute nodes that make up a Cloud infrastructure.
CloudStack started life as VMOps, a company founded in 2008 with product development spearheaded by Sheng Liang, who developed the Java Virtual Machine at Sun. Although early versions were very much focused on the Xen Hypervisor, the team soon realized the benefits of multi-hypervisor support. In early 2010, the company achieved a massive marketing win when it acquired the domain name http://cloud.com and formally launched CloudStack, which was 98 percent open source. In July 2011, CloudStack was acquired by Citrix Systems [2], who released the remaining code as open source under GPLv3.
The big news came in April 2012, when Citrix donated CloudStack to the Apache Software Foundation, where it was accepted into the Apache Incubator. Apache CloudStack has now been promoted to a top-level project of the Apache Software Foundation, a measure of the maturity of the code and its community. Citrix continues to market the cloud management solution CloudPlatform [3], which is built on CloudStack.
Multiple Hypervisor Support
CloudStack works with a variety of hypervisors, and a single cloud deployment can contain multiple hypervisor implementations. The current release of CloudStack supports pre-packaged enterprise solutions like Citrix XenServer and VMware vSphere, as well as OVM and KVM or Xen running on Ubuntu or CentOS. Support for Hyper-V is currently in development and should be available in a future release.
CloudStack can manage tens of thousands of host servers installed in multiple geographically distributed data centers. The centralized management server scales linearly, eliminating the need for intermediate cluster-level management servers. No single component failure can cause a cloud-wide outage. Admins can perform periodic maintenance on the management server without affecting the functioning of virtual machines running in the cloud.
CloudStack offers an administrator's web interface for provisioning and managing cloud resources, as well as an end-user web interface for running VMs (Figure 1). CloudStack automatically configures each guest virtual machine's networking and storage settings and internally manages a pool of virtual appliances to support the cloud itself. These appliances offer services such as firewalling, routing, DHCP, VPN, console access, storage access, and storage replication. The extensive use of virtual appliances simplifies the installation, configuration, and ongoing management of a CloudStack deployment.
API and Extensibility
The CloudStack cloud environment provides an API that gives programmatic access to all the management features available in the UI. This API enables the creation of command-line tools and new user interfaces to suit particular needs. The CloudStack pluggable allocation architecture allows the creation of new types of allocators for the selection of storage and hosts.
CloudStack can translate Amazon Web Services (AWS) EC2 and S3 API calls to native CloudStack API calls so that users can continue using existing AWS-compatible tools.
CloudMonkey [4] is a Command-Line Interface (CLI) for CloudStack written in Python. CloudMonkey lets you easily create scripts to automate complex or repetitive admin and management tasks, from simply adding multiple users to deploying a complete CloudStack architecture.
Access to the API, either directly or by using CloudMonkey, is protected by a combination of API and secret keys and a signature hash. Users can re-generate new random API and secret keys (as well as their UI password) at any time, providing maximum security and peace of mind.
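As a rough illustration of how the keys and the signature hash fit together, the following Python sketch builds a signed request URL. It assumes the commonly documented signing scheme (sort the parameters by name, lowercase the query string, compute an HMAC-SHA1 with the secret key, and Base64-encode the result). The endpoint, the keys, and the function name build_signed_url are placeholders, so check the API documentation for your CloudStack release before relying on it.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def build_signed_url(endpoint, params, api_key, secret_key):
    """Return a request URL with apiKey and signature appended (sketch)."""
    query = dict(params, apiKey=api_key, response="json")
    # URL-encode each value; spaces become %20 rather than '+'.
    pairs = [(key, quote(str(value), safe="")) for key, value in query.items()]
    # The string to sign is sorted by field name and fully lowercased.
    ordered = sorted(pairs, key=lambda pair: pair[0].lower())
    to_sign = "&".join(f"{key}={value}" for key, value in ordered).lower()
    digest = hmac.new(secret_key.encode(), to_sign.encode(), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    request = "&".join(f"{key}={value}" for key, value in pairs)
    return f"{endpoint}?{request}&signature={signature}"


# Placeholder endpoint and keys, for illustration only.
print(build_signed_url(
    "http://localhost:8080/client/api",
    {"command": "listUsers"},
    "EXAMPLE-API-KEY",
    "EXAMPLE-SECRET-KEY",
))
```

Because the signature is computed over the sorted, lowercased string, the order in which a client supplies its parameters does not change the result.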
CloudStack Deployment Architecture
The CloudStack server itself may be deployed in a multi-node installation, where the servers are load balanced across data centers (Figure 2). MySQL may be configured to use replication for failover in the event of database loss. For the hosts, CloudStack supports NIC bonding and the use of separate networks for storage, as well as iSCSI Multipath.
CloudStack infrastructure has six key building blocks. Regions are very similar to AWS Regions and are the first and largest unit of scale for a CloudStack cloud. A region consists of multiple availability zones, which are the second largest unit of scale. Typically there is one zone per data center. A zone contains PODs, clusters, hosts, and storage.
One cloud can contain multiple regions, and even if one region should go offline, VMs in other regions are still accessible because each region has dedicated management servers located in one or more of its zones.
PODs, the third unit of scale, are often a single rack, which houses networking, compute, and storage. PODs also have logical, as well as physical, properties. Components such as IP addressing and VM allocations are influenced by the PODs within a zone.
Clusters are the fourth unit of scale and are simply groups of homogeneous compute hardware combined with primary storage. Each cluster will run a common hypervisor, but a zone can consist of combinations of all of the supported hypervisors.
Hosts are the fifth unit of scale and provide the actual compute layer on which virtual machines will run.
Storage is the final building block. Two key types of storage exist within CloudStack: primary and secondary. Primary storage is where virtual machines reside, which could be local storage within a compute host or shared-file/block storage using NFS, iSCSI, or Fibre Channel.
Secondary storage is where virtual machine templates, ISO images, and snapshots reside and is currently always presented over NFS. You can also use OpenStack's Swift component to replicate secondary storage between zones, ensuring users always have access to their snapshots even if a zone is offline.
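One way to picture how the six building blocks nest is as a simple data model. The Python sketch below is purely illustrative: the class and field names are invented for this example and do not correspond to CloudStack's internal objects or API. It attaches primary storage to the cluster and secondary storage to the zone, as described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str


@dataclass
class Cluster:
    name: str
    hypervisor: str        # every host in a cluster runs the same hypervisor
    primary_storage: str   # e.g. "NFS", "iSCSI", "Fibre Channel", or "local"
    hosts: List[Host] = field(default_factory=list)


@dataclass
class Pod:
    name: str
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Zone:
    name: str
    secondary_storage: str  # holds templates, ISO images, and snapshots
    pods: List[Pod] = field(default_factory=list)


@dataclass
class Region:
    name: str
    zones: List[Zone] = field(default_factory=list)


def hypervisors_in_zone(zone):
    """A zone may mix hypervisors, as long as each cluster is homogeneous."""
    return {cluster.hypervisor for pod in zone.pods for cluster in pod.clusters}


def host_count(region):
    """Count every host in every cluster, POD, and zone of a region."""
    return sum(
        len(cluster.hosts)
        for zone in region.zones
        for pod in zone.pods
        for cluster in pod.clusters
    )


kvm = Cluster("c1", "KVM", "NFS", [Host("h1"), Host("h2")])
xen = Cluster("c2", "XenServer", "iSCSI", [Host("h3")])
zone = Zone("zone1", "NFS", [Pod("rack1", [kvm]), Pod("rack2", [xen])])
region = Region("europe", [zone])
print(host_count(region), sorted(hypervisors_in_zone(zone)))
```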
A lot of development work is currently underway with storage, and some great new features are coming in the next release of CloudStack, thanks to a new storage subsystem.
Networking
The glue that brings all of the building blocks together is the network layer. CloudStack has two principal models for Networking, referred to as Basic and Advanced.
Basic networking is very similar to the model used by AWS, and it can be deployed in three slightly different ways, with each adding to the features of the previous. Security Groups, which use Layer 3 IP address filtering, isolate VMs from one another. A Citrix NetScaler provides public IP and load balancing functionality.
You can scale the zone horizontally by simply adding more PODs, consisting of clusters of hosts and their associated top-of-rack switching and primary storage.
The Advanced networking model brings a raft of features that place power into the hands of the end users. VLANs are the standard method of isolation, but Software-Defined Networking (SDN) offerings from Nicira, Big Switch, and soon Midokura bring the possibility of massive scale by overcoming any VLAN limitations.
CloudStack makes excellent use of system virtual machines to provide control and automation of storage and networking. One such system VM is the CloudStack virtual router. The key difference between Advanced and Basic networking is that, in the Advanced mode, users can create CloudStack guest networks, with each network having a dedicated virtual router. A virtual router provides DNS and DHCP, firewall, client IPsec VPN, load balancing, source/static NAT, and port forwarding, and all of these features are configurable by end users through either the GUI (Figure 3) or the CloudStack API.
When a user creates a new guest network, then deploys guest VMs onto that network, the VMs are attached to a dedicated L2 broadcast domain, isolated by VLANs, and fronted by a virtual router. The user has full control of all traffic entering and leaving the network, with a direct connection to the public Internet.
Firewall and port forwarding rules enable the mapping of live IPs to any number of internal VMs. Load balancing with round-robin, least connections, and source-based algorithms is available out of the box, along with source-based, App Cookie, or LB cookie stickiness policies.
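To make the three balancing algorithms concrete, here is a small conceptual sketch in Python. It is not how the virtual router implements them; the Balancer class, its method names, and the use of a SHA-256 hash for source-based selection are assumptions made for illustration.

```python
import hashlib
from itertools import cycle


class Balancer:
    """Toy load balancer showing three ways to pick a back-end VM."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._rotation = cycle(self.backends)
        # Number of open connections per back end, maintained by the caller.
        self.active = {backend: 0 for backend in self.backends}

    def round_robin(self):
        """Hand out back ends in a fixed rotation."""
        return next(self._rotation)

    def least_connections(self):
        """Pick the back end with the fewest open connections."""
        return min(self.backends, key=lambda backend: self.active[backend])

    def source_based(self, client_ip):
        """Map a client address to the same back end on every request."""
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.backends)
        return self.backends[index]


lb = Balancer(["vm1", "vm2", "vm3"])
print([lb.round_robin() for _ in range(4)])
lb.active.update({"vm1": 5, "vm2": 1, "vm3": 3})
print(lb.least_connections())
print(lb.source_based("198.51.100.7"))
```

Round-robin spreads requests evenly regardless of load, least connections favors the quietest VM, and source-based selection keeps a given client on one VM, which is also the idea behind the stickiness policies.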
Another powerful feature of the Advanced network model is the Virtual Private Cloud (VPC). A VPC enables the user to create a multi-tiered network configuration, placing VMs within their own VLAN. ACLs let users control the flow of traffic between each network tier and also the Internet. A typical VPC might contain three network tiers (Web, App, and DB), with only the Web tier having Internet access.
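The three-tier example can be sketched as a set of ACL rules. The following Python snippet is a simplified model, not CloudStack code: the tier address ranges, the port numbers, the AclRule class, and the default-deny behavior are all assumptions chosen to mirror the Web, App, and DB layout described above.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    source_cidr: str
    port: int
    action: str  # "allow" or "deny"


# Hypothetical address ranges for the three tiers.
TIER_CIDRS = {"web": "10.1.1.0/24", "app": "10.1.2.0/24", "db": "10.1.3.0/24"}

# Only the Web tier accepts traffic from the Internet; each lower tier
# accepts traffic only from the tier directly in front of it.
ACLS = {
    "web": [AclRule("0.0.0.0/0", 443, "allow")],
    "app": [AclRule(TIER_CIDRS["web"], 8080, "allow")],
    "db": [AclRule(TIER_CIDRS["app"], 3306, "allow")],
}


def is_allowed(tier, source_ip, port):
    """Return True if the first matching rule for the tier allows the traffic."""
    source = ipaddress.ip_address(source_ip)
    for rule in ACLS[tier]:
        if rule.port == port and source in ipaddress.ip_network(rule.source_cidr):
            return rule.action == "allow"
    return False  # assumption: anything not explicitly allowed is dropped


print(is_allowed("web", "203.0.113.9", 443))   # Internet client to Web tier
print(is_allowed("db", "203.0.113.9", 3306))   # Internet client to DB tier
print(is_allowed("db", "10.1.2.15", 3306))     # App tier VM to DB tier
```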
VPCs also provide additional features, such as site-to-site VPN, enabling a persistent connection with infrastructure running in alternate locations. A VPC private gateway is a feature that cloud admins can leverage to provide a second gateway out of the VPC virtual router.
CloudStack optimizes the underlying network architecture within a data center by enabling the Cloud admins to split up the various types of network traffic and map them to different sets of bonded NICs within each compute host.
CloudStack supports four types of physical network, and you can configure them to use a single NIC or multiple bonds, depending on how many NICs are available in the host server. (See the box titled "Network Types.")
Network Service Providers
In addition to the virtual router and VPC virtual router, CloudStack can also leverage the power of real hardware, bringing even more functionality and greater scale. Currently supported devices are Citrix NetScaler, F5 Big-IP, and Juniper SRX, with many more on the way.
Once a device has been integrated by Cloud Admins, the users have control of the features via the standard GUI or API. For example, if a Juniper SRX is deployed, when a user configures a firewall rule within the CloudStack UI, CloudStack uses the Juniper API to apply that configuration on the physical SRX.
When a Citrix NetScaler is deployed, in addition to load balancing, NAT, and port forwarding, it also enables AutoScaling. AutoScaling is a method for monitoring the performance of your existing Guest VMs and then automatically deploying new VMs as the load increases. After the load has dropped off, the extra VMs can be destroyed, bringing your usage and costs back down to a base level. This level of flexibility and scalability is a key driving force in the adoption of cloud computing.
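The scaling decision itself boils down to comparing a load metric against thresholds. The Python sketch below shows that logic in its simplest form. The function name, the choice of average CPU as the metric, and all threshold and limit values are hypothetical; in a real deployment the monitoring and the scaling policy are configured through CloudStack and the NetScaler rather than written by hand.

```python
def desired_vm_count(current, avg_cpu, *, scale_up_at=80.0, scale_down_at=30.0,
                     min_vms=2, max_vms=10):
    """Return how many guest VMs should be running for the observed load."""
    if avg_cpu > scale_up_at:
        # Load is high: add one VM, but never exceed the upper limit.
        return min(current + 1, max_vms)
    if avg_cpu < scale_down_at:
        # Load has dropped off: remove one VM, but keep the base level.
        return max(current - 1, min_vms)
    return current


print(desired_vm_count(3, 92.0))  # busy: grow by one
print(desired_vm_count(3, 12.0))  # idle: shrink by one
print(desired_vm_count(2, 12.0))  # already at the base level
```

Changing the count by one VM per evaluation, with a gap between the two thresholds, keeps the group from flapping when the load hovers around a single value.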
Management
CloudStack is quite easy to set up and administer thanks to its great graphical user interface, API, and CLI tools such as CloudMonkey. A wizard takes you through the configuration and deployment of your first zone, networking, POD, cluster, host, and storage, meaning you can be up and running within a matter of hours.
A simple Role-Based Access Control (RBAC) system presents different levels of users with the features to which they are entitled, and the standard allocations can be fine tuned as required. The authentication can also be passed off to LDAP, enabling integration with enterprise systems, including OpenLDAP and MS Active Directory.
Admins set up new user accounts, which are grouped together into domains, allowing a hierarchical structure. By grouping users into domains, Admins can make certain subsets of the infrastructure available to a particular group of users.
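The effect of the domain hierarchy can be sketched with path-like domain names. This Python example assumes, for illustration, that a resource dedicated to a domain is also usable by that domain's subdomains. The domain names and the can_use helper are hypothetical.

```python
def can_use(user_domain, resource_domain):
    """Return True if the user's domain is the resource's domain or a subdomain of it."""
    return (
        user_domain == resource_domain
        or user_domain.startswith(resource_domain + "/")
    )


# A cluster dedicated to the Engineering domain.
print(can_use("ROOT/Engineering/QA", "ROOT/Engineering"))
print(can_use("ROOT/Sales", "ROOT/Engineering"))
```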
A set of system parameters called global settings allows admins to control all of the features and set up controls such as limits, SMTP alerts, and a host of other settings.
Service offerings enable Admins to set up the parameters that control the end-user environment, such as the number of vCPUs, RAM, network bandwidth, and preferred hardware.
Admins have full control over the infrastructure and can initiate the live migration of any VM between hosts in the same cluster. You can migrate stopped VMs across different clusters by moving their associated volumes to different storage. Storage devices and hosts can be taken offline for maintenance and upgrades, and admins can steer VMs to a particular set of hosts using either the API or tags.
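The migration rules described here can be summarized in a few lines of Python. This is only a decision sketch based on the paragraph above; the Vm class and the returned labels are invented for the example and are not CloudStack API calls.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    cluster: str
    running: bool


def migration_plan(vm, target_cluster):
    """Choose a migration approach for moving a VM to a host in target_cluster."""
    if vm.cluster == target_cluster:
        # Within one cluster, a running VM can be moved live between hosts.
        return "live-migrate" if vm.running else "no-migration-needed"
    if vm.running:
        # Crossing clusters follows the stopped-VM path, so stop it first.
        return "stop-then-migrate-volumes"
    return "migrate-volumes"


print(migration_plan(Vm("web01", "cluster-a", True), "cluster-a"))
print(migration_plan(Vm("web01", "cluster-a", False), "cluster-b"))
```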
User Experience
Most CloudStack features available to end users are available via the GUI, with just a few of the more advanced, newer features accessible only through the API. CloudStack's easy-to-learn GUI lets new users get their first VMs up and running within a matter of minutes.
The process for creating a new VM is handled by a very intuitive graphical wizard, which steps you through the process in six easy steps. Choose a zone, select a pre-built ISO template, and choose a compute offering (a bundle of properties that defines the amount of CPU, RAM, network bandwidth, and storage tier). Then, add an additional data volume, configure the network, choose a hostname, and launch the VM.
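The same choices map onto the parameters of an API request. The Python sketch below assembles the parameters for a deployVirtualMachine call. The helper name and every ID are placeholders, and the resulting dictionary would still need to be signed and sent to the API endpoint.

```python
def deploy_vm_params(zone_id, template_id, offering_id, name,
                     disk_offering_id=None, network_ids=()):
    """Collect the wizard's choices as deployVirtualMachine parameters."""
    params = {
        "command": "deployVirtualMachine",
        "zoneid": zone_id,                  # step 1: zone
        "templateid": template_id,          # step 2: template or ISO
        "serviceofferingid": offering_id,   # step 3: compute offering
        "name": name,                       # step 6: hostname
    }
    if disk_offering_id:
        params["diskofferingid"] = disk_offering_id      # step 4: data volume
    if network_ids:
        params["networkids"] = ",".join(network_ids)     # step 5: networks
    return params


# Placeholder IDs; real values come from the corresponding list API calls.
print(deploy_vm_params("ZONE-ID", "TEMPLATE-ID", "OFFERING-ID", "web01",
                       network_ids=["NETWORK-ID"]))
```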
Once users have their VMs up and running, they can start to explore the other features available to them (see the box titled "Additional Features").
Why Choose CloudStack?
CloudStack has a proven track record in both the enterprise and service provider space with some of the world's largest clouds. I have personally been involved in a large number of CloudStack implementations on three different continents, and while any large IT project will hit a few bumps along the road, all the implementations came in on time.
Unlike some open source cloud technologies, CloudStack is truly a single project, with a common set of objectives and goals, driven by a very active and passionate community. See the box titled "In Process" for a list of some new features currently in development.