How to Achieve High Availability with SRDF/Metro

As announced at EMC World 2015 and delivered in the HYPERMAX OS Q3 2015 service release, EMC has added some major functionality to its enterprise behemoth, including one I’d like to talk about: SRDF/Metro. Some of you may already know where this is going based on the product name, but for those of you who don’t, I’ll just dive right in.

SRDF/Metro is an enhancement to the Symmetrix’s rock-solid remote replication package, inspired by EMC’s block virtualization platform, VPLEX. In traditional SRDF configurations, the R1 (source) volumes are read/write, but the R2 (destination) volumes are write-disabled. This preserves the remote copy, so that data doesn’t get corrupted by writes landing on both sides. With the new SRDF/Metro feature, R2 devices on VMAX3 are now read/write accessible to hosts along with the R1 devices. This means you can now natively support an active/active data center pair with only VMAX3 (no VPLEX required)!

If you’re still following me, there are a couple of scenarios you could utilize this for:

Scenario One: Multi-site Clustering

First is a “stretched” VMware vSphere cluster. This is one of the selling points of VPLEX too. You could extend your VMware cluster across sites to stay highly available if a site goes black. You get the same data, accessible at two sites, on two arrays, fully synchronized. It’s a pretty compelling scenario. VMware is probably the most common use case, but you can use many clustering products like Microsoft Failover Clustering or Veritas Cluster Server the same way.

Scenario Two: Mission Critical Apps with High Availability

Another use case is for mission critical, tier zero apps that require the highest availability possible. You can set up your SRDF/Metro pairs on two VMAX3 arrays, then zone your host to both arrays and access the data in both places, like a regular multipath volume but spread across two different arrays. This is probably overkill in most situations, but works for those special cases.
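To give a feel for the setup step, here’s a hedged sketch using Solutions Enabler SYMCLI. The Symmetrix ID, RDF group number, and pair file below are placeholders I made up for illustration, and the exact flags (including -metro on createpair) may vary by Solutions Enabler release, so treat this as a sketch rather than a recipe:

```shell
# Hypothetical example: create SRDF/Metro pairs from a device-pair file
# and establish them. SID 000123456789 and RDF group 10 are placeholders.
# device_pairs.txt lists local/remote device pairs, one pair per line.
symrdf -sid 000123456789 -rdfg 10 -f device_pairs.txt \
       createpair -type R1 -metro -establish
```

After the pairs are established and synchronized, you would zone the host to the front-end ports on both arrays and let multipathing see the paths to each.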

But How Does it Work?

Writes to the R1 or R2 devices are synchronously copied to the device’s pair. If a write conflict occurs, SRDF/Metro software resolves it to maintain consistent images on each of the SRDF pairs. The R2 device will assume all the traits of the R1 device including its geometry, device WWN, etc. To the host, it will appear as a single virtualized device.

I bet you’re asking yourself, “Well, what if something happens to a site?” EMC has addressed this in a couple of ways. When the SRDF link becomes Not Ready due to either a link failure or a site failure, SRDF/Metro has to choose a side to keep production on, which is called the bias. When the CreatePair operation is performed, the R1 is the bias side. If the pair then becomes Not Ready, the R1 device remains accessible and the R2 side is made inaccessible to hosts. The bias can be changed after the pair transitions from the regular SRDF/S state to the SRDF/Metro active/active state. So if you want to favor the R2 side in the event of a link failure, you can do that, but the devices must first be in that new active/active pair state.
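Flipping the bias to the R2 side might look something like the SYMCLI sketch below. Again, the IDs and pair file are placeholder assumptions, and the set bias syntax should be checked against your Solutions Enabler release before you rely on it:

```shell
# Hypothetical example: once the pairs are in the active/active state,
# move the bias to the R2 side so R2 stays host-accessible if the link drops.
symrdf -sid 000123456789 -rdfg 10 -f device_pairs.txt set bias R2
```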

Another option to determine the winner in a failure scenario is to set up a third VMAX or VMAX3 as a remote quorum. This third array monitors SRDF and the links on each array. In the event of a failure, the quorum can determine which side failed and which side should remain accessible to hosts. This mimics the behavior of a VPLEX Metro configuration with its third-site witness component. Since I’m sure you all have spare VMAXs lying around for this, it should be the go-to configuration.

I personally think SRDF/Metro is a great addition to the SRDF family of replication. If you’re going to have a six-nines array, you don’t have to complicate your infrastructure with VPLEX just to be active/active. Native VMAX3 active/active SRDF is now just a code upgrade away.