Logical Design: the second of a four-part series on automation and orchestration architecture, looking at DR test as a service.
In part 1, Problem Definition and Conceptual Design, we began the design process an automation architect would follow when delivering a new service. The design centers on improving the application DR testing process by automating the steps involved and presenting them to developers as a service. Now that we have covered the problem statement and conceptual design, let’s dig into the logical design.
Prerequisites and Current State Analysis
Currently, Ahead Aviation uses VMware vCenter Site Recovery Manager (SRM) for disaster recovery and DR testing. SRM automates recovery of an application by grouping the relevant virtual machines into a protection group and predefining a recovery plan: a series of steps to take during a recovery. This order-of-operations plan allows an engineer to define the sequence in which virtual machines are restored so that dependencies based on infrastructure needs or application tiers are met. Additionally, separate steps may be taken for a DR test versus a true disaster event.
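The ordering idea behind a recovery plan can be illustrated with a small sketch. This is not how SRM stores or evaluates a plan; it only shows, under the assumption that each virtual machine lists the machines it depends on, how a dependency-respecting start order can be derived. The VM names and the `recovery_order` function are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def recovery_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return a start order in which every VM comes after the VMs it needs.

    `dependencies` maps a VM name to the set of VMs that must be running
    before it starts (hypothetical input, not an SRM data structure).
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical three-tier application: web needs app, app needs db.
tiers = {"web": {"app"}, "app": {"db"}, "db": set()}
order = recovery_order(tiers)
```

For the three-tier chain above, the database comes first, then the application tier, then the web tier.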
One of the largest constraints faced in using SRM is that all participating servers must be virtual. SRM does not support disaster recovery of physical servers at this time. We will therefore require any application tested through this methodology to be 100% virtual.
To reduce the cost and complexity of teaching staff new skills, we will continue to leverage VMware vCenter SRM for DR and DR testing needs. To build automation and self-service into this process, we will need tools that integrate with the SRM API. VMware recently added support to vRealize Orchestrator for making API calls to vCenter SRM. In turn, vRealize Automation (formerly vCloud Automation Center) enables “Anything as a Service (XaaS)” through the use of vRealize Orchestrator. Therefore the following tools will enable this self-service DR testing design:
- Service Portal: vRealize Automation
- Orchestration Engine: vRealize Orchestrator
- Disaster Recovery and DR Testing: vCenter Site Recovery Manager (SRM)
Request Form Inputs
As seen in the logical diagram, the catalog item requires only a few inputs from the requestor. The only strictly necessary ones are which application to test and how long the test should last, and the test length can be pre-populated with a default such as 5 days. The user’s contact details can be gathered from session variables; however, we may want to include a field for additional test/dev team e-mail addresses. The output of the workflow can provide these users with instructions for accessing their test and any other pertinent details.
Note that the user is not encumbered with having to enter details about which hardware to use, where the test should be located, which individual servers need copying, or other such details. The simplicity of requesting a DR test is another powerful advantage of automating the testing process as a service.
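As a rough illustration of how small the request really is, the form inputs could be modeled as follows. This is a sketch only: the `DrTestRequest` class and its field names are hypothetical and do not correspond to any vRealize Automation object; the 5-day default is the example value mentioned above.

```python
from dataclasses import dataclass, field
from typing import List

DEFAULT_TEST_DAYS = 5  # example default test length from the text


@dataclass
class DrTestRequest:
    """Inputs gathered from the catalog request form (hypothetical model)."""
    application: str                        # which application to test
    requester_email: str                    # taken from the session, not typed in
    duration_days: int = DEFAULT_TEST_DAYS  # how long the test should last
    additional_emails: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Reject requests that are missing the two required inputs."""
        if not self.application.strip():
            raise ValueError("application is required")
        if self.duration_days < 1:
            raise ValueError("duration_days must be at least 1")

    def recipients(self) -> List[str]:
        """Everyone who should receive the access instructions, de-duplicated."""
        seen: List[str] = []
        for addr in [self.requester_email, *self.additional_emails]:
            if addr not in seen:
                seen.append(addr)
        return seen


# Example request: only the application name is supplied by the user.
request = DrTestRequest(application="Booking",
                        requester_email="dev@example.com",
                        additional_emails=["qa@example.com"])
request.validate()
```

Nothing about hardware, location, or individual servers appears in the model, which mirrors the point made above.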
Once the user requests a DR test and enters the application and duration, a request is sent to the vRealize Orchestrator server. In reality the workflow itself is exposed to vRealize Automation via the Advanced Service Designer, so vRealize Orchestrator receives the request directly. (The specific design of this workflow will be covered in the next part of this series.) As a bonus, we can populate the list of potential test applications dynamically. We assume architects will build a recovery plan for each application ahead of time, so querying the list of recovery plans directly from SRM gives us the list of possible test candidates.
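The candidate list can be sketched as a simple filter over the recovery plans returned by SRM. The `RecoveryPlan` class and the state labels below are assumptions made for illustration; in the real design the plans would come from SRM through vRealize Orchestrator, and the actual object model may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RecoveryPlan:
    """Hypothetical stand-in for a recovery plan returned by SRM."""
    name: str   # one plan per application, built ahead of time by architects
    state: str  # illustrative labels: "ready" or "testing"


def list_test_candidates(plans: Iterable[RecoveryPlan]) -> List[str]:
    """Return the names of plans that could be offered in the request form.

    Assumption: only plans that are not already under test are offered.
    """
    return sorted(p.name for p in plans if p.state == "ready")


# Hypothetical plans as they might be returned by a query.
plans = [
    RecoveryPlan("Payroll", "testing"),
    RecoveryPlan("Booking", "ready"),
    RecoveryPlan("Baggage Tracking", "ready"),
]
candidates = list_test_candidates(plans)
```

Here the drop-down would offer Baggage Tracking and Booking, while Payroll is hidden because a test is already running against it.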
After any applicable error checking, the workflow will instruct the vCenter SRM server to run the given recovery plan in test mode. From this point SRM will create the logical test segment using the predefined network and compute resources. The environment will be fenced off from production, so the test cannot interfere with production systems.
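A minimal sketch of the "check, then start the test" step might look like the following. `FakeSrm` is an in-memory stand-in, not the SRM API, and the `max_days` limit is an assumed policy that the article does not specify.

```python
from typing import Dict


class FakeSrm:
    """In-memory stand-in for the SRM server (hypothetical, for illustration)."""

    def __init__(self, plans: Dict[str, str]) -> None:
        self.plans = dict(plans)  # plan name -> state ("ready" or "testing")

    def start_test(self, plan_name: str) -> None:
        self.plans[plan_name] = "testing"


def start_dr_test(srm: FakeSrm, application: str, duration_days: int,
                  max_days: int = 14) -> str:
    """Validate the request, then ask SRM to run the plan in test mode."""
    if application not in srm.plans:
        raise LookupError(f"no recovery plan found for {application!r}")
    if srm.plans[application] != "ready":
        raise RuntimeError(f"{application!r} is not ready for a new test")
    if not 1 <= duration_days <= max_days:  # assumed policy limit
        raise ValueError(f"duration must be between 1 and {max_days} days")
    srm.start_test(application)
    return f"DR test of {application} started for {duration_days} day(s)"


srm = FakeSrm({"Booking": "ready", "Payroll": "testing"})
message = start_dr_test(srm, "Booking", 5)
```

Running the checks before calling SRM means a bad request fails fast with a clear message instead of surfacing as a partially built test environment.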
A similar workflow will exist to tear the environment down once the test duration has elapsed. It will be up to the user to request an extension if required.
Coming Up Next
In this post we looked at the logical design for our DR Test as a Service workflow in more detail. The existing tool used for DR and DR Testing is VMware vCenter Site Recovery Manager. To add the automation functionality we will leverage vRealize Orchestrator as the orchestration engine and vRealize Automation as the service catalog and workflow manager. In addition, vRealize Automation provides access control and governance.
In part 3 of this series we will look in detail at each workflow component in the design. We will explain design considerations for the request form and catalog item in vRealize Automation. Moving down the stack we will dissect each component of the vRealize Orchestrator workflow involved. Finally, we will show how vCenter SRM is built to respond to requests from vRealize Orchestrator. In the final post of the series we will do a live lab demo of Ahead Aviation’s DR Test as a Service workflow.