As a technical architect at AHEAD, one of my responsibilities is to serve as a subject matter expert on converged infrastructure (CI) and hyperconverged infrastructure (HCI). Over the past few months, I’ve been evaluating various new and popular products in the marketplace and relaying my findings on the AHEAD blog. This post completes that series, which has included two products from Nutanix, Cisco HyperFlex, VCE Vblock, and Hitachi UCP. I’m also very excited for the next series to launch, where I’ll be working with the AHEAD team to run a series of tests against a subset of these solutions in our lab (including Nutanix, HyperFlex, and VxRail). This post discusses my experience with EMC, VCE, and VMware’s joint collaboration, VxRail.
Company Intro and CI History
EMC was founded in 1979 and has since grown to become a leader in storage, virtualization, information security, cloud computing, and analytics. Throughout the company’s history, it has grown through successful acquisitions, with some of the more recent ones including XtremIO, Data Domain, Avamar, and RSA Security.
Until 2015, EMC had not been directly involved in the server/compute market except through VCE (where they resold Cisco UCS through Vblocks and VxBlocks). In 2015, EMC released VSPEX Blue, which was the first EMC branded hyperconverged appliance that incorporated storage and compute elements in a single device. Unlike Vblocks, the appliance was solely branded as an EMC product, with the actual hardware being manufactured under the covers by Quanta.
VSPEX Blue was part of the EVO:RAIL program developed by VMware, in which VMware documented specifications for an HCI appliance powered by vSAN. VMware invited established manufacturers to build the appliances, allowing them to add their own features as long as they followed VMware’s primary specifications. Eight manufacturers joined, including EMC, Dell, NetApp, Hitachi, and others. The result was not great: although vSAN was fairly developed by then and the manufacturers were very experienced, the resulting products did not deliver the core benefits expected from HCI, namely simplified management, scale-by-node architecture, or significant compute/storage density.
The Evolution of VxRail
VMware and EMC went back to the drawing board, this time without the other manufacturers, and co-developed a new iteration of EVO:RAIL called VxRail. Unlike the prior iteration, this solution is a focused effort of EMC, VCE, and VMware and is not expected to include any other partners. Going forward, VMware will continue to develop vSAN while EMC provides the hardware platform and management software. One of the benefits of this new relationship is tighter integration between development teams, which should result in a faster release schedule for new features.
VxRail falls neatly in the category of HCI. For those who don’t know or don’t remember exactly what that means, AHEAD defines HCI as “A combination of server, storage, and hypervisor into a node-based architecture with node-based scalability leveraging software defined storage and providing simplified management with advanced infrastructure automation.”
VxRail is sized in terms of appliances and nodes. Through 2016, a minimum VxRail configuration will require 4 nodes sold as a single appliance. Each appliance is composed of a 2RU chassis with shared power supplies and 4 slots for nodes. Starting in May, customers will be able to grow VxRail clusters in increments of one node at a time (assuming they already own at least one appliance containing 4 nodes).
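To make the scaling rule concrete, here is a minimal sketch (a hypothetical helper, assuming the 64-node vSAN cluster maximum discussed later) of which cluster sizes are reachable: you start with one 4-node appliance and then grow one node at a time.

```python
def is_valid_cluster_size(nodes: int, max_nodes: int = 64) -> bool:
    """A VxRail cluster starts as one 4-node appliance and can then
    grow one node at a time, up to the vSAN cluster maximum (assumed 64)."""
    return 4 <= nodes <= max_nodes

print(is_valid_cluster_size(3))  # False: below the 4-node minimum
print(is_valid_cluster_size(5))  # True: one appliance plus one extra node
```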
Use cases for VxRail are similar to those for other HCI solutions. Beyond the expected fit with VDI and general virtual workloads, EMC and VMware did an excellent job at bundling licensing to make VxRail a very appealing product for remote locations. As you will read later in this post, you can arguably cover most of your bases for a remote office with a VxRail appliance without having to add third-party backup and replication software.
Architecture and Features
- VxRail offers two major node types: hybrid and all-flash. The hybrid nodes have one SSD for read/write cache and 3 to 5 SAS drives, while the all-flash nodes have one SSD for write cache along with 3 to 5 SSDs for the capacity tier.
- The product can scale up to several thousand VMs on a fully loaded cluster (64 nodes) with 640 TB of usable storage, 32 TB of RAM, and 1,280 compute cores (for a hybrid-node cluster), with the all-flash models supporting significantly more storage.
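As a quick sanity check of those maxima, here is a back-of-the-envelope sketch using hypothetical per-node figures derived from the 64-node totals (10 TB usable, 512 GB of RAM, and 20 cores per hybrid node, e.g. two 10-core E5-2660 v3 sockets):

```python
# Hypothetical per-node figures, derived from the 64-node cluster totals:
USABLE_TB_PER_NODE = 10   # 640 TB / 64 nodes
RAM_GB_PER_NODE = 512     # 32 TB / 64 nodes
CORES_PER_NODE = 20       # e.g. two 10-core E5-2660 v3 sockets

def cluster_maxima(nodes: int) -> dict:
    """Rough capacity totals for a hybrid-node cluster of a given size."""
    return {
        "usable_tb": nodes * USABLE_TB_PER_NODE,
        "ram_tb": nodes * RAM_GB_PER_NODE / 1024,
        "cores": nodes * CORES_PER_NODE,
    }

print(cluster_maxima(64))  # {'usable_tb': 640, 'ram_tb': 32.0, 'cores': 1280}
```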
- Workloads where this is not a good fit include those that require consistent sub-5ms latency, those that require more SSD capacity than the available cache in the system, data sets that are not dedup- or compression-friendly (once vSAN 6.2 / VxRail 3.5 brings those features to all-flash), and mission-critical applications (this is still a 1.0 product).
- The common argument against HCI is that you cannot scale storage and compute independently. Currently, Nutanix can actually do half of this by adding storage-only nodes, but this is not always a solution for IO heavy workloads. HyperFlex can do the other half by adding compute-only nodes, which does not add unnecessary storage costs.
- vSAN currently does not support storage-only nodes in the sense that all nodes participating in vSAN must run vSphere. vSAN does support compute-only nodes, so VxRail could arguably release a supported compute-only option in the future.
- VxRail will serve virtual workloads running on VMware vSphere. There are no plans for this product to run any other hypervisor or support bare metal.
- VxRail has (4) models for the hybrid type and (5) for the all-flash version. Each version represents a specific Intel processor and each option offers limited customization (limited RAM increments and 3-5 SAS drives of the same size). In the VxRail 3.5 release (shipping in June), you will be able to use 1.2 or 2 TB SAS drives.
- You will be able to mix different types of hybrid nodes or different types of all-flash nodes in a single cluster, as long as they are identical within each 4-node enclosure. For example, you could pair a VxRail 160 appliance (4 nodes) with 512 GB of RAM and 4 drives per node with a second VxRail 120 appliance with 256 GB and 5 drives per node. This isn’t recommended from a vSAN perspective, however, as it would result in storage and performance imbalance.
- VxRail currently does not include any native or third-party encryption tools. This feature is in the roadmap.
- VxRail model types define the type of Intel CPU that they contain, with the VxRail 60 being the only appliance that has single-socket nodes. The higher the VxRail model number, the more cores in the Intel E5 processor. The processor options are:
- Hybrid nodes: E5-2603 v3 (6 cores) to the E5-2660 v3 (10 cores) per socket
- AFA nodes: E5-2620 v3 (6 cores) to the E5-2683 v3 (14 cores) per socket
- Memory options range from 64 GB to 512 GB per node, with the 64GB option available only on the VxRail 60 model. There are currently no compute-only VxRail options, although technically nothing will stop you from adding compute-only nodes into the mix, except that might affect your support experience.
- Although there are currently no graphics acceleration card options for VDI, we expect them to be released in a future version later in 2017.
- This is where HCI products shine. There is no dedicated storage array. Instead, storage is clustered across nodes in a redundant manner and presented back to each node; in this case via VMware vSAN.
- VMware vSAN has been around since 2011 (previously known as VSA) when it had a reputation of not being a great product, especially for enterprise customers. Since then, vSAN has been nurtured by VMware to become a viable product.
- The current VxRail version (VxRail 3) runs on vSAN 6.1 and the soon-to-be-released VxRail 3.5 is expected to run vSAN 6.2.
- There is a significant amount of both official and unofficial documentation on vSAN available for you to check out, but in summary: local disks on each VxRail node are aggregated and clustered together through vSAN software that runs in the vSphere kernel. This clustered storage is then presented back to each VxRail node using proprietary protocols. The nodes gain the same benefits that you would expect from a traditional storage array (VMware vMotion, Storage vMotion, etc.), except that there isn’t actually an array or a SAN that needs to be managed.
- Although I have seen several customers invest in vSAN, alongside their preferred server vendor, to create vSphere clusters for small offices or specific workloads, I have not seen significant data centers powered by vSAN. I suspect that part of the reason is that, at that scale, vSAN hasn’t made economic sense, either from a capital investment or from a fuzzy operational efficiency perspective. I say “fuzzy” because it hasn’t been clear whether a large vSAN deployment is actually easier to manage than traditional compute + SAN + storage array.
- However, things change when vSAN is incorporated into an HCI product that can simplify operations and leverage economies of scale by focusing R&D, manufacturing, documentation, and a support team onto an appliance.
- Compared to other HCI solutions, vSAN does have the benefit of running in the vSphere kernel, where we expect potential performance benefits. More importantly, not having a virtual machine that runs a virtual storage controller means there is one less thing for someone to accidentally break.
- VxRail leverages a pair of 10 GbE ports per node, connected to 10 GbE switch ports using Twinax, fiber optic, or Cat6, depending on which node configuration you order.
- VxRail does not have any hard network requirements, but it is best practice to avoid core/aggregation (7K/5K/2K) designs and instead use top-of-rack (ToR) switches.
- Any major 10G-capable switch can be used, as explained earlier, and even 1G can be used for VxRail 60 nodes (4 ports per node).
Data Protection, Replication, High Availability
- VxRail uses failures to tolerate (FTT) in a similar fashion to Nutanix or HyperFlex’s replication factor (RF). An FTT of 1 is similar to RF2, where you can lose a single disk/node and still be up and running. vSAN 6.2 can support a maximum FTT setting of 3, equating to RF4 (four copies of the data), which doesn’t exist on Nutanix or HyperFlex. More importantly, vSAN allows you to use storage policies to set FTT on a per-VM basis if need be.
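To put a number on the capacity cost of FTT: with mirroring, FTT=n keeps n+1 copies of each object, so usable capacity is roughly raw capacity divided by n+1. This is a simplification of my own that ignores vSAN slack space and metadata overhead, so real-world numbers will be somewhat lower.

```python
def copies(ftt: int) -> int:
    """With vSAN mirroring, FTT=n keeps n+1 replicas of each object."""
    return ftt + 1

def usable_tb(raw_tb: float, ftt: int) -> float:
    """Raw-to-usable capacity under mirroring; ignores vSAN slack
    space and metadata overhead, so real numbers will be lower."""
    return raw_tb / copies(ftt)

print(copies(1))            # 2 -> FTT=1 behaves like RF2
print(usable_tb(100.0, 1))  # 50.0 TB usable from 100 TB raw
```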
- As mentioned above, FTT settings address data durability within a VxRail cluster. In contrast, the included VMware Data Protection (VDP) licenses for up to 8 TB of front-end capacity provide remote data durability. This license allows customers to back up their datasets locally, such as to storage inside VxRail, on a Data Domain, or on another external storage device, and then replicate them to a remote VDP appliance. It’s not a fully-fledged enterprise backup solution, but it could be sufficient for a remote or small office.
- VxRail provides replication options through RecoverPoint for VMs. Licensing to replicate up to 15 VMs is included with the appliance, enabling customers to replicate their VMs to any VMware-based infrastructure in a remote location (assuming the remote site is running the same or an older version of vSphere).
Highly Available Solutions
- vSAN stretched clusters allow organizations to create an active-active data center between VxRail appliances. This feature requires the expected high-bandwidth (10 Gbps), low-latency (<5ms) network links between sites, which are typically not seen between large data centers and small offices. With that said, it’s nice to have the option, especially if the AFA version is widely adopted within the data center.
- VxRail is expected to only support vSphere, since it is based on vSAN.
- VxRail Manager provides basic resource consumption and capacity data along with hardware health. It can replace vCenter for some functions, which is especially useful for individuals who aren’t used to working in vCenter.
- VMware vCenter works as expected; there are no VxRail-specific plugins added or customizations needed.
- VMware Log Insight aggregates detailed logs from vSphere hosts. It is a log aggregator that provides significant visibility into the performance and events in the environment.
- Although most of your time will be spent in vCenter, there are a few additional management interfaces that you have to log into.
- VxRail Manager – EVO:RAIL Manager has been replaced by VxRail Manager. This is your central point of management outside of vCenter.
- This provides basic health and capacity information.
- This allows you to perform a subset of vCenter tasks (provision, clone, open console).
- VxRail Extension – VSPEX Blue Manager has been replaced by VxRail Extension. This allows EMC support to interact with the appliance.
- This allows for chat with support.
- This allows for ESRS heartbeats (call-home heartbeats back to EMC support).
- This provides event history and overall system history.
- This provides a physical diagram of all components, including serial numbers, and walks you through swapping failed hardware.
- vRealize Log Insight
- This aggregates and categorizes events from the vSphere hosts.
- Customers can download RPM packages that include newer versions of vSphere, vSAN, vCenter, and VxRail. The package updates everything except the VxRail Extension, which is updated separately. The upgrade wizard uses the RPM package to automatically upgrade all of these components without downtime, assuming that you don’t have affinity rules in the way.
- All other packages, such as RecoverPoint for VMs, VMware Data Protection, and CloudArray, are updated separately.
Monitoring & Alerting
- VxRail Manager can be configured with call-home functionality provided by ESRS, which is automatically deployed via a wizard.
- This allows the tool to send alerts directly to EMC, which can expedite support visits.
- Beyond ESRS, other alerting would be configured through vCenter.
- There is no current integration between VxRail Manager and vCenter in the form of plugins.
- With that said, there don’t seem to be any major holes between the management tools.
- I appreciate that you can see detailed hardware information for each node through vCenter.
EMC has refocused much-needed attention on this product line, with the majority of benefits expected (but not yet seen) in VxRail 3.5 and beyond. Initial discussions with customers indicate very strong interest in the product, likely due to the improved configure-to-order options, more competitive pricing, and the fact that it is part of the greater EMC ecosystem. My initial experience with it in the AHEAD Lab and Briefing Center was very positive; I had no issues deploying the appliance, and the automated deployment wizards were flawless.
I’m excited about the product for the same reasons, but also because it has an aggressive and compelling roadmap. Nowadays, anyone can write up a beautiful roadmap; the key will be whether EMC and VMware are able to deliver on it. Many of the features that I am excited to see address things that I consider to be weaknesses, such as improved management tools, integration/consolidation, and additional data services that are not fully developed in the current version of VxRail.
The VxRail 3.5 release in June will be an important event for the product and for EMC. If successful, VxRail will establish itself as a strong contender in the HCI space and will formally clear much of the remaining uncertainty caused by its predecessor, VSPEX Blue.
Here are some follow-up points that I hope to address over the next few months:
- Overall manageability – VxRail Manager (EVO:RAIL Manager) provides a simplified GUI for customers, with the benefit being that it empowers staff with limited experience to manage the solution. This is useful for ROBOs and smaller IT shops, but it really isn’t that beneficial for larger companies that want to consider using the platform for larger workloads. I’m curious to see how VxRail snaps into other EMC management tools (Vision) for larger customers.
- Dedup ratios – This is specific to VxRail 3.5 and the all-flash nodes. I am looking forward to seeing performance and capacity numbers once we enable deduplication and compression. The AF version has amazing density potential on the storage front, with 38 TB of usable SSD capacity before dedup and compression in a 2U form factor.
- Flexibility – I’m unsure when, or if, we will be able to mix and match nodes in a fully supported fashion, at least within the AF or hybrid families. Until then, it will be difficult to adjust compute/memory/storage ratios as workloads change over time without building new clusters.