Virtualization: Azure Site Recovery

Lead Image © Andrea Danti, 123RF.com

Business continuity management

Continuity Guaranteed

We take a look at the new business continuity service from Azure and show how to use it. By Michel Lüscher, Florian Frommherz

Azure Site Recovery is designed to help enterprises protect critical applications by coordinating the replication and restore process for physical or virtual computers. The service gives administrators the ability to use their own data center, a hosting service provider, or Azure as the replication location. This saves costs and overhead for setting up and managing a secondary site. Environments can be protected by policy-based replication of virtual machines.

Azure Site Recovery coordinates and manages ongoing replication of data by integrating existing technologies such as Hyper-V Replica, System Center, and SQL Server AlwaysOn. In this article, we show how the new business continuity service works and how to deploy it.

Creating good business continuity involves a fair amount of complexity; after all, the availability of the endpoints and maintaining productivity in case of failure are at stake. Business Continuity Management (BCM) can be simplified in various ways through the use of virtual machines. Whether this means creating and backing up different snapshots of a virtual machine or its operating system through the use of checkpoint technology or simply moving a virtual machine between virtualization hosts, all of these different functions add their own level of complexity. For example, how can you ensure that the virtual servers wake up in the right order on a different host in a different data center? Moreover, how will the underlying network configuration, IP addresses, and DNS cope with this?

Simplifications in the form of system monitoring and intelligence, in combination with automatic mechanisms, can be a relief. Since the release of Windows Server 2012 R2, Microsoft has offered a cloud service for BCM under the name of "Hyper-V Recovery Manager." The first version covered the enterprise-to-enterprise (E2E) scenario by orchestrating a failover of virtual machines from one Hyper-V host or cluster to another. In the current version, Microsoft has added recovery to the cloud and to service providers, along with the ability to integrate third-party systems. This explains why the cloud service was renamed Azure Site Recovery (ASR).

Failover Scenarios with Azure Site Recovery

ASR supports multiple failover and recovery scenarios. It can initiate recovery from the primary, physical data center to Azure Infrastructure as a Service (IaaS) or to a secondary data center that you either operate yourself or that resides with a hosting service provider (Figure 1). This means that failover is possible between data centers (E2E) and between a data center and Microsoft Azure (E2A). If you rely on a cloud storage provider who can also host virtual machines, you can also recover your virtual disks via that provider (E2SP). If you are only interested in SQL Server, you can replicate your databases to Azure, where the failover then occurs; SQL Server AlwaysOn handles the replication.

Figure 1: Azure Site Recovery manages a failover from the cloud either to a secondary site or to Azure.

If you use Azure as the failover location for your virtual machines, the service creates the synchronized disks on inexpensive Azure blob storage and even offers geo-redundant storage on request. Optionally, the virtual machine disks can be encrypted for storage with a key defined by the administrator. Even if the failover never happens, you still have the certainty that your data is secure in the cloud thanks to a key of your choice. Virtual disks are kept up to date by cyclical updates.

Hyper-V Replica is used as the underlying technology here; it is included in the hypervisor feature scope of Windows Server and also supports operations between hosts within your own data center. You can choose a synchronization interval of 30 seconds, 5 minutes, or 15 minutes, after which the remote replica receives the required delta updates. Administrators can thus decide for each virtual machine individually how quickly changes are replicated and how much data could be lost in a worst-case scenario. Shorter intervals tend to increase the amount of data you need to replicate, which can become a question of bandwidth.
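To get a feel for this trade-off, the following Python sketch estimates the size of each delta and whether a link can keep up. Only the three intervals come from Hyper-V Replica; the function name, the change rate, and the link speed are hypothetical values chosen for illustration. The model assumes a constant change rate and ignores compression and protocol overhead, so treat the results as rough orientation rather than a sizing tool.

```python
# Rough model of interval-based replication; not part of Hyper-V Replica or ASR.
INTERVALS_S = (30, 300, 900)  # 30 seconds, 5 minutes, 15 minutes


def estimate_cycle(change_mb_per_min: float, interval_s: int, link_mbit_s: float) -> dict:
    """Estimate one replication cycle under a constant change rate (assumption)."""
    delta_mb = change_mb_per_min * interval_s / 60  # data accumulated per cycle
    transfer_s = delta_mb * 8 / link_mbit_s         # time needed to ship the delta
    return {
        "interval_s": interval_s,
        "delta_mb": delta_mb,
        "transfer_s": transfer_s,
        # The replica falls behind if a delta takes longer to send than the interval.
        "keeps_up": transfer_s <= interval_s,
        # Worst case: the site fails just before the next delta has fully arrived.
        "worst_case_loss_s": interval_s + transfer_s,
    }


if __name__ == "__main__":
    # Hypothetical VM: 60MB of changed data per minute over a 10Mbit/s link.
    for interval in INTERVALS_S:
        print(estimate_cycle(60, interval, 10))
```

In this simplified model, the link has to sustain the change rate regardless of the interval; the interval mainly determines how large each delta is and how far the replica can lag behind.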

These features give admins several options. ASR does not actually distinguish between a "primary" and a "secondary" data center, which means that you can also create failover scenarios in the form of one or more hub-and-spoke setups and, if necessary, determine the failover target for each virtual machine individually. Smaller data centers can thus manage their workloads themselves and use a larger hub location somewhere in the region as a failover location if something goes wrong. This can be an Azure IaaS region or another, larger data center.

Azure Site Recovery Function

ASR is mainly controlled from the cloud. You handle all the failover and service configuration functions in the Azure management portal. The portal uses agents to communicate with the individual virtual machines and hypervisors. SSL certificates are used for identification. All communication is routed through HTTPS on port 443, which avoids many firewall problems.

Data protection officers and security officers should note that physical data is only stored on Azure in the case of an E2A scenario – that is, replication of virtual machines to Azure. In this case, the virtual hard disks are replicated to Azure and brought to life in case of failure. In all other scenarios, metadata handles failover control. Virtual machines or payload data from the virtual machines do not need to reside on Azure.

The installed agents regularly communicate the health and configuration status to the cloud-based service. This communication between the agents and Azure allows instructions for changing the synchronization configuration or the failover command to be transferred. In this way, Azure Site Recovery becomes a command center for recovery: The hypervisors, including the virtual machines, listen for commands from the cloud and can thus be restored at the push of a button at a different location through the Azure portal. This helps you create more than just a centralized management site for recovery or go through recovery steps for test purposes; in fact, you can offer self-service for recovery, which is great in the enterprise: If several departments are accustomed to managing their own workloads at the data center, you can also assign them the right to restore independent of other services.

In most cases, the agent is only installed on the System Center Virtual Machine Manager (SCVMM) management server along with the required certificate. For branch offices, the best approach is to install the agent directly on the Hyper-V host. There is no need to configure anything on the virtual machine – at guest operating system level, that is.

Clear-Cut Initial Configuration

Although protecting your first VMs with ASR obviously involves a modicum of work, the configuration is amazingly simple and clear cut. Your task list will include enabling a vault, creating a certificate for secure communication, installing the agent, and then creating a virtual network in Azure.

To do this, you need to change to the Recovery Services in the Azure management portal navigation. If you have not already created a vault, this is your first step. In the wizard, assign a name for the Site Recovery Vault and the desired target region – Azure creates a vault within a few seconds.

The vault is now available below the Recovery Services menu item and can be selected for configuration there. This is where you choose the scenario, as described previously:

Between an on-premises VMM site and Azure
Between two on-premises VMM sites
Between an on-premises Hyper-V site and Azure
Between two on-premises VMware sites
Between two on-premises VMM sites with SAN array replication

In this example, we use "Between an on-premises VMM site and Azure" (Figure 2). You can download the agent required for VMM from the dashboard; at 7MB, the download is unlikely to be an obstacle.

Figure 2: Scenarios supported by Azure Site Recovery – An overview.

The agent installer asks for the required certificate and the vault name. You can generate and download a registration key directly in the Azure management portal. During the installation, the VMM service is briefly stopped, but it is back up again after a few minutes. After successfully completing the installation, you have a vault in Azure that is linked to the local SCVMM.

The next step is to protect a local cloud. To do this, open the SCVMM cloud properties in the SCVMM management console and select the cloud for protection by ASR. This gives you a granular approach to choosing which information to replicate to Azure at any time. You need to assign ASR an Azure storage account for the failover to Microsoft Azure; this is where the virtual machines will be stored later on. In the Azure management portal, ASR points out that components are missing, but you can integrate them with a single click on the message.

You need to create a virtual network for later communication in Azure – this allows the virtual machines to continue communicating with the local site in the case of disaster. This step is not integrated with SCVMM; you will need to create the network via the Azure management portal. To do so, go to the Networks menu item and create a new virtual network (VNet). A simple virtual network is fine for the time being.

The configuration includes, for example, setting up a site-to-site VPN connection between Azure and a local network. The VPN ensures a transparent connection to the virtual machine, even though it runs at a totally different location after the failover. Later, you can map the Azure virtual networks to the virtual networks from SCVMM so that the right networks are assigned to the virtual machines that have failed over.

The last step is to create the recovery plan (Figure 3). This is where you define when to start which virtual machine and what scripts or other input are required. This function is very powerful and is best handled in the scope of a brief brainstorming session. For example, in what order do the virtual machines start? Which virtual machines? Do you want to start a virtual machine in the cloud with a different configuration than in your own data center? You will find a more comprehensive step-by-step guide online [1].
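Before entering a plan in the portal, it can help to work out the start order on paper or in a few lines of code. The following Python sketch is one way to do that, assuming you can state for each virtual machine which others must be running first. The VM names and the dependency table are hypothetical, and the script does not talk to ASR at all; it only derives start-up stages with the standard library's topological sorter.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical VMs mapped to the VMs that must already be running.
DEPENDS_ON = {
    "dc01": set(),
    "sql01": {"dc01"},
    "app01": {"sql01"},
    "web01": {"app01"},
    "web02": {"app01"},
}


def boot_groups(depends_on: dict) -> list:
    """Group VMs into start-up stages; VMs within a stage can start in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every VM whose prerequisites are done
        groups.append(ready)
        sorter.done(*ready)
    return groups


if __name__ == "__main__":
    for stage, vms in enumerate(boot_groups(DEPENDS_ON), start=1):
        print(f"Group {stage}: {', '.join(vms)}")
```

A circular dependency, such as two servers that each wait for the other, surfaces here as an error instead of during an actual failover.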

Disaster Recovery Scenario for SMEs

The solution described above is primarily designed for enterprises that already operate one or two data centers and rely on Hyper-V (and System Center) for their management. As you can see from the architecture described here, the installed agent communicates via the System Center Virtual Machine Manager with the Hyper-V host.

It often does not make business sense for small to medium-sized enterprises to run a second data center. This is where a new scenario offered by Microsoft comes into its own; it is designed for customers who do not have a second data center and who do not operate System Center. It supports operations with Azure Site Recovery without purchasing and running Virtual Machine Manager.

Of course, Microsoft expects you to pay for operating the hub in the cloud; after all, this is an enterprise-level service with corresponding Service Level Agreements. Microsoft differentiates pricing by where the virtual machines are brought online in case of failure, whereas monitoring and supplying a virtual machine with agents costs the same for all models. Virtual machines that are resuscitated in the secondary data center in case of an outage cost slightly more than EUR10 per virtual machine per month.

If you choose Azure IaaS as the failover location, the costs are higher, according to the catalog, but this includes geo-redundancy. According to the price list, this means that 20 virtual machines, of which 10 are restored to a secondary data center and another 10 are recovered in Azure, will set you back slightly more than EUR500 per month (of course, prices may vary depending on your contractual agreement with Microsoft). Compared with a solution that you program and maintain in-house, this can be an attractive option, especially if only the critical workloads are configured for fast failover and you do not have your own data center available for that function.
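The arithmetic behind this example can be sketched in a few lines of Python. The function names are hypothetical, and the rates are the rounded figures from the text, not current list prices; the per-VM Azure rate is not stated anywhere and is merely inferred from the quoted total.

```python
def monthly_cost(vms_own_site: int, rate_own_site: float,
                 vms_azure: int, rate_azure: float) -> float:
    """Monthly protection cost in EUR for a mix of failover targets."""
    return vms_own_site * rate_own_site + vms_azure * rate_azure


def implied_azure_rate(total: float, vms_own_site: int,
                       rate_own_site: float, vms_azure: int) -> float:
    """Per-VM Azure rate implied by a known total (an inference, not a list price)."""
    return (total - vms_own_site * rate_own_site) / vms_azure


if __name__ == "__main__":
    # Rounded figures from the text: 10 VMs at about EUR10 each, about EUR500 in total.
    rate = implied_azure_rate(total=500, vms_own_site=10, rate_own_site=10, vms_azure=10)
    print(f"Implied Azure rate: about EUR{rate:.0f} per VM per month")
    print(f"Check: EUR{monthly_cost(10, 10, 10, rate):.0f} per month")
```

Taking the rounded numbers at face value, the 10 virtual machines recovered in Azure account for most of the bill, so it is worth checking the current price list before deciding which workloads fail over to which location.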

Integrating Storage

ASR lets you integrate existing configurations for storage replication at the hardware level into your failover plan. This means that you can design even better and more competitive failover plans. To integrate your storage infrastructure into Azure Site Recovery, you need SCVMM, which relies on SMI-S (Storage Management Initiative Specification) to communicate with your storage hardware. In this context, Microsoft has sought support from its partners, including EMC, Hitachi, HP, and NetApp, so that this third-party hardware, including recovery, can be integrated in ASR as an SMI-S-based "Storage Replication" scenario.

The hardware failover configuration is also managed via SCVMM, which means you can integrate your enterprise storage directly into your failover plan – thus rounding off the solution.

Hypervisors and Physical Hardware

Microsoft also supports physical hardware with its site recovery offerings. Physical servers are monitored and replicated into the cloud by agents. In case of failure, the physical server continues to run in the cloud. This is not the typical ASR agent used with SCVMM or the Hyper-V host; instead, it comes from the InMage software suite, whose components have gradually been added to the Azure portfolio since Microsoft acquired the company.

To use the service, download InMage via the Azure Site Recovery portal by pressing the Setup Recovery button on the Quick Start page and then selecting Between two on-premises VMware sites from the list. The software package lets you manage the recovery and replication of two VMware implementations beyond the boundaries of your data center, and it encrypts and compresses the virtual machine data in transit.

The InMage Scout component of the suite helps you replicate and provide failover protection for physical servers. The software runs on supported operating systems and replicates changes to a target system. The cloud, the underlying hardware, and the hypervisor do not matter here, which means that the system can run on physical hardware, VMware, Hyper-V, Azure, or AWS. Scout is only interested in changes to the operating system, which it replicates to the target system, and this target can reside in Azure. In addition to VMware-to-VMware, InMage Scout also supports Anything-to-Azure.

Conclusions

ASR can be a huge help in automating the process of failover and virtual machine replication at a second site. The backup site can be your own data center based on Hyper-V or VMware, or it can be Azure, which Microsoft positions as the preferred scenario. ASR already offers some attractive features, and Microsoft will continue to integrate InMage into its overall ASR offerings over the course of time.