
Cloud Field Day 1: A Review of Druva Software

A couple of weeks ago, I was asked to participate as a delegate for Gestalt IT’s Cloud Field Day 1. While I was there, I had the opportunity to meet with representatives from Druva Software, as they were sponsoring the event (check out Druva’s presentation videos from Cloud Field Day 1 here). The timing couldn’t have been better, as Druva was on my list of companies to investigate as part of an ongoing initiative at AHEAD to help clients leverage AWS storage services. Beginning with Druva in this post, I’ll be doing a blog series on each of the four event sponsors (the others are Cisco, Scality, and Docker), introducing readers to their product lines and sharing my initial thoughts about these solutions.

About Druva Software


Druva Software is focused on backup, recovery, and DRaaS. Druva was founded in 2008 and is headquartered in Sunnyvale, California. The founders came from Veritas Technologies, so they understand this market and are working to disrupt it by offering non-traditional, cloud-based backup. Druva has recently obtained its fifth round of funding from investors like Sequoia Capital, Nexus Venture Partners, Indian Angel Network, and EMC Ventures.

Druva’s Architecture

There are two main products offered by Druva: inSync and Phoenix. inSync is made for endpoint data protection and Phoenix is for server and virtual machine backup. Both of these products share a lot of the same underlying architecture and functionality, but also have many differences.

Features

Druva’s backup offerings differ from traditional backup/recovery tools but may be more familiar to those who have used EMC Avamar. Both inSync and Phoenix are sold as SaaS solutions, and their target storage providers are AWS and Microsoft Azure. The primary features of Druva’s products include:

  • Separation of data and metadata
  • Native global deduplication (per client)
  • Native global search (per client)
  • Time-indexed atomic snapshots
  • Integrity checking of the storage
  • Fast reassembly of blocks for restore from any point

Functionality

While these products aren’t necessarily earth-shattering or revolutionary, they do provide a solid offering. As with other products in this space, deduplication is handled on the source side and is based on a hash of each block. Only changed data blocks are sent, which greatly reduces bandwidth and backup times, and you never do a full backup after the initial data ingest. Restores are built from the incrementals very quickly. The separation of data and metadata is important because it allows full indexing and search to run very quickly.
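To make the source-side deduplication idea concrete, here is a minimal sketch. It is not Druva’s implementation: the function names, the SHA-256 hash, and the tiny fixed block size are all assumptions chosen for illustration (Druva’s blocks are variable length). The client hashes each block and sends only the blocks the server has not seen.

```python
import hashlib

BLOCK_SIZE = 4  # tiny fixed block size, for illustration only


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data: bytes, server_hashes: set[str]) -> tuple[list[str], dict[str, bytes]]:
    """Return the ordered block-hash recipe and only the blocks the server lacks."""
    recipe: list[str] = []
    to_send: dict[str, bytes] = {}
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        recipe.append(digest)
        if digest not in server_hashes and digest not in to_send:
            to_send[digest] = block  # new block: must cross the network
    return recipe, to_send


server: set[str] = set()  # hashes the (hypothetical) backup service already holds

# Initial ingest: every block is new.
recipe1, sent1 = backup(b"AAAABBBBCCCC", server)
server.update(sent1)

# Later backup of a slightly changed file: only the changed block is sent.
recipe2, sent2 = backup(b"AAAABBBBDDDD", server)
server.update(sent2)
```

In this toy run the first backup sends three blocks and the second sends one, which is why there is never a second full backup after the initial ingest.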

Metadata is stored in Amazon DynamoDB or PolarisDB (Azure). It consists of file and block hashes along with the S3/Glacier/Azure Blob Storage references. Actual backup data is stored in S3, S3-IA, and Glacier on AWS, and in Azure Blob Storage on Azure. The blocks are variable length and encrypted with AES-256. Customers create their own keys, and Druva has no access to customer data. Because of this, it is important to protect those keys, as they cannot be recovered for you.
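The split between metadata and data can be sketched roughly as follows. This is an illustration under stated assumptions, not Druva’s schema: two in-memory dictionaries stand in for the metadata database and the object store, the function names are hypothetical, blocks are fixed size, and encryption is left out. Each snapshot stores only lists of block hashes, so a file can be reassembled from any point in time and file names can be searched without touching backup data.

```python
import hashlib

block_store: dict[str, bytes] = {}               # stands in for object storage
snapshots: dict[int, dict[str, list[str]]] = {}  # stands in for the metadata DB


def take_snapshot(timestamp: int, files: dict[str, bytes], block_size: int = 4) -> None:
    """Store unseen blocks once, then record each file as a list of block hashes."""
    recipes: dict[str, list[str]] = {}
    for path, data in files.items():
        hashes = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            block_store.setdefault(digest, block)  # each unique block is kept once
            hashes.append(digest)
        recipes[path] = hashes
    snapshots[timestamp] = recipes


def restore(timestamp: int, path: str) -> bytes:
    """Reassemble a file from the latest snapshot taken at or before timestamp."""
    eligible = [t for t in snapshots if t <= timestamp]
    if not eligible:
        raise KeyError("no snapshot at or before the requested time")
    recipe = snapshots[max(eligible)][path]
    return b"".join(block_store[h] for h in recipe)


def search(name_fragment: str) -> list[tuple[int, str]]:
    """Find files by name using metadata only; no backup data is read."""
    return sorted(
        (t, p) for t, files in snapshots.items() for p in files if name_fragment in p
    )


take_snapshot(1, {"report.txt": b"AAAABBBB"})
take_snapshot(2, {"report.txt": b"AAAACCCC"})
old_version = restore(1, "report.txt")
new_version = restore(2, "report.txt")
```

Both versions of the file stay restorable even though the unchanged first block is stored only once.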

One problem that cloud-based backup offerings have struggled with is performance, especially on a restore; bandwidth can quickly become a bottleneck. Druva works to resolve this by using an on-premises caching system called CloudCache.

CloudCache is a virtual machine you deploy at your site that provides a fast, local caching system. Given that most restores usually only go back a week or two, this provides a good compromise so that you aren’t regularly pulling data back down from AWS or Azure. The size of the cache is up to the user and the amount of storage they throw at the virtual machine.
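The caching idea can be sketched in a few lines. This is only a conceptual model and not how CloudCache is built: the class name, the block-count capacity, and the oldest-first eviction are assumptions. Recent backups are kept on-site, so a restore only reaches out to the cloud for blocks that have aged out of the cache.

```python
from collections import OrderedDict


class LocalCache:
    """Keeps the most recently written blocks on-site, up to a fixed capacity."""

    def __init__(self, capacity: int, cloud: dict[str, bytes]):
        self.capacity = capacity  # cache size is chosen by the user
        self.cloud = cloud        # stands in for cloud object storage
        self.blocks: OrderedDict[str, bytes] = OrderedDict()
        self.cloud_reads = 0

    def write(self, key: str, block: bytes) -> None:
        """Back up a block: keep a local copy and also send it to the cloud."""
        self.cloud[key] = block
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the oldest local copy

    def read(self, key: str) -> bytes:
        """Restore a block locally when possible, otherwise fetch it from the cloud."""
        if key in self.blocks:
            return self.blocks[key]
        self.cloud_reads += 1
        return self.cloud[key]


cloud: dict[str, bytes] = {}
cache = LocalCache(capacity=2, cloud=cloud)
cache.write("day1", b"old")
cache.write("day2", b"recent")
cache.write("day3", b"newest")

recent = cache.read("day3")  # served from the local cache
oldest = cache.read("day1")  # aged out locally, so fetched from the cloud
```

A larger capacity means more restores are served locally, which is the trade-off the user makes when sizing the virtual machine’s storage.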

Given that most restores happen within a week or two of data being backed up, it makes sense to tier where the backup sets are kept. Druva handles this on a set schedule within AWS.

Currently, users cannot change the dates on which data waterfalls to different tiers. While this might just be engineering work that is yet to be done, it is also likely due to cost. Druva is priced per user (inSync) or by post-deduplication capacity (Phoenix). If a customer decided to keep 180 days in S3, the cost to Druva would go up, but the customer’s cost would remain the same.
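Age-based tiering of this kind can be sketched as a simple lookup. The thresholds below are hypothetical, since Druva’s actual schedule is not stated here, and the function name is an assumption; only the tier names (S3, S3-IA, Glacier) come from the architecture described above.

```python
from datetime import date

# Hypothetical age thresholds in days; not Druva's real schedule.
TIERS = [(30, "S3"), (90, "S3-IA")]
COLDEST_TIER = "Glacier"


def tier_for(backup_date: date, today: date) -> str:
    """Pick a storage tier from the age of a backup set."""
    age_days = (today - backup_date).days
    for max_age, tier in TIERS:
        if age_days <= max_age:
            return tier
    return COLDEST_TIER


today = date(2016, 9, 15)
two_weeks_old = tier_for(date(2016, 9, 1), today)  # 14 days old
eleven_weeks_old = tier_for(date(2016, 7, 1), today)  # 76 days old
months_old = tier_for(date(2016, 1, 1), today)
```

Letting customers raise the first threshold to 180 days would keep more data in the most expensive tier, which is the cost concern raised above.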

inSync

As mentioned earlier, inSync is Druva’s endpoint backup solution. To be honest, backup isn’t the most exciting thing to talk about, even if it is to the cloud, but there are some interesting features within inSync. You can push agents out to the endpoints using tools such as GPOs or SCCM. Supported platforms include:

  • Windows
  • OS X
  • Linux
  • Android
  • iOS

On all of these platforms, users get the benefit of source-side deduplication, so once the initial backup is done, the incrementals are much smaller. Also, because deduplication is global for the customer, documents and data that users share are deduplicated once inSync has first seen them.

Users get as much or as little power as they set for changing the behavior of their device backups. They can choose to exclude directories or back up everything and they have the ability to do a remote wipe of devices. Users can also specify when their system should be backed up, or more importantly, when they shouldn’t. There are times you really don’t want your system to perform a backup, like when you’re on slow Wi-Fi or tethered to your phone with a data plan.

Cloud Services Backup

Along with endpoints, inSync allows you to back up many cloud services. These include:

  • Box
  • Office 365
  • SharePoint
  • Google applications
  • Salesforce.com

These backups are done using cloud-to-cloud APIs and will protect you from accidental or intentional deletion of data. This is becoming very important because many users, like me, rarely store data outside of these cloud services.

Data Preservation and Compliance

Often an area of pain for administrators is data preservation for legal holds and eDiscovery. This is one area where inSync might be helpful and go above and beyond what other backup solutions offer. A benefit of having your backup set all in one place and easily indexed and searched is the ability to gather and pull out what you need, when you need it.

Through the UI, administrators can manage legal holds on data and have full auditing and chain of custody reporting. They can also cull data so that they only get what they are looking for, reducing transfer size.

inSync provides a RESTful API for the legal hold functions. This lets third parties automate many of the functions. Two examples of these third-party partners are Exterro and Zapproved.

Compliance is another area where inSync is unique. It can leverage its full indexing and search to look for sensitive data within files and alert you if an endpoint has something that it shouldn’t. Sensitive data types could include:

  • SSN
  • Driver’s license number
  • Credit card numbers
  • Drug names
  • Medical records

You can build any sort of query that you might want or need. inSync comes with a set of templates to use and you can extend them with keyword or regular expression searches. These tools are not “magic bullets” but can be very useful. They just require the administrator to spend time building the right queries for the correct systems.
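As a rough idea of what a regular-expression query for sensitive data might look like, here is a small sketch. It does not use inSync’s template syntax, which is not covered here; the patterns and function names are assumptions. It also shows why such queries need tuning: the card pattern is paired with a Luhn checksum so that random 16-digit numbers are not flagged.

```python
import re

# Illustrative patterns only; real templates would need tuning per data set.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def luhn_ok(digits: str) -> bool:
    """Validate a digit string with the Luhn checksum used by payment cards."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def scan(text: str) -> list[tuple[str, str]]:
    """Return (label, value) pairs for likely SSNs and card numbers in text."""
    findings = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):
            findings.append(("card", digits))
    return findings


hits = scan("SSN 123-45-6789, card 4111 1111 1111 1111")
misses = scan("order number 4111 1111 1111 1112")
```

The first string yields one SSN hit and one card hit, while the second is ignored because its digits fail the checksum.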

Thoughts on inSync

Endpoint backup has always been a bit messy with agents, connectivity, and storing redundant data. For these reasons, inSync seems to be a good option, as it’s simple to deploy and there is virtually no infrastructure for administrators to manage. The legal hold and compliance tie-ins are very compelling for many people.

What interests me is the API-driven cloud backup for user data. Normally, endpoint backup tools are going to back up the synced copy that a user has on their notebook. But oftentimes, that may not be the full data set, which could lead to holes in the backup strategy.

inSync is licensed per user, which is easy. Even though the data is kept on AWS or Azure (this can be chosen when signing up), there is no billing or payment to the cloud provider; it’s all to Druva. And by leveraging the cloud, it really does simplify the remote user backup problem and reduce strain on the network between sites and VPN users.

(Note that I have not actually tested Druva to confirm the functionality or capabilities. All of my findings are coming from the presentation and demonstrations.)

Phoenix

Phoenix is Druva Software’s product for server backup. It uses the underlying architecture we described earlier and is similar to inSync, but focuses on servers and doesn’t have the legal hold and compliance pieces. The advantages are similar as well. Everything is stored using public cloud storage services and backups are source-side deduplicated and fully indexed.

Phoenix currently supports the following operating systems and platforms:

  • VMware vSphere
  • Windows Server
  • Red Hat Linux
  • CentOS Linux
  • Microsoft SQL Server

Virtual machine backups on vSphere are done using the built-in VMware APIs, whereas the others listed use agents. One thing to keep in mind is that Druva does not position Phoenix for everything. It is targeted at remote sites and small-to-mid data centers and is not targeting high-transaction applications right now. For these types of applications, you may want to consider another option and run it in parallel.

In my opinion, there isn’t much more to note about Phoenix. That isn’t because it’s not a robust product; it’s just that the rest is redundant with what you’ve already read about the architecture and inSync. Licensing is different with Phoenix: instead of being on a per-user or per-server basis, it is based on post-deduplication capacity. Therefore, the better the deduplication you get, the lower your monthly bill will be.
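The effect of deduplication on a capacity-based bill is simple arithmetic, sketched below. The price per terabyte and the function name are made up for illustration and are not Druva’s pricing.

```python
def monthly_cost(source_tb: float, dedup_ratio: float, price_per_tb: float) -> float:
    """Bill on stored (post-deduplication) capacity rather than source capacity."""
    stored_tb = source_tb / dedup_ratio
    return stored_tb * price_per_tb


# Hypothetical numbers: 100 TB of source data at an assumed 50 per stored TB.
modest_dedup = monthly_cost(100, dedup_ratio=4, price_per_tb=50)   # 25 TB stored
strong_dedup = monthly_cost(100, dedup_ratio=10, price_per_tb=50)  # 10 TB stored
```

With the same source data, moving from a 4:1 to a 10:1 deduplication ratio cuts the stored capacity, and therefore the bill, by 60 percent in this example.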

Phoenix does have a DRaaS option for AWS. This is still early and is something I really want to test out. Druva claims that it will convert and spin up backed-up VMs in EC2. There is a larger discussion here on how you recover back, configure network connectivity, and allow application data access, but it’s a good start and something I’m interested in learning more about.

Thoughts on Phoenix

Phoenix is very straightforward and is a viable option for those with remote sites. Traditionally, cloud backup has not been a great solution due to performance but, like inSync, Phoenix supports the use of caching systems. The licensing is easy to understand and I think many organizations would love to get out of the data backup and archive storage business.

But… can Druva compete with other new offerings such as Rubrik and Cohesity? I think this is the real question. These systems offer greatly simplified backup and restore using a scale-out node-based architecture. They allow users to tier to public clouds as well. Along with that, there are options for fast spin-up of systems for quick recovery as well as secondary storage options, neither of which Phoenix can offer.

Organizations looking to modernize their backup systems will need to really weigh their requirements for performance, physical space, public cloud adoption, RTO, and preferred consumption model as these can greatly affect the chosen platform. If you’re interested in learning more about Druva’s backup offerings or how AHEAD can help you weigh your options, schedule a time to meet with our experts in the AHEAD Lab and Briefing Center today!