Cloud Field Day 1: A Review of Scality – Scalable Enterprise Object Storage

This post continues the series on the technologies presented to the delegates at Gestalt IT’s Cloud Field Day 1. In my last post, we discussed Druva and its cloud-based backup applications.

About Scality

Scality is not a new player in the software-defined storage arena. It was founded in 2009 and is headquartered in San Francisco, California. Scality has received five rounds of funding for a total of $92M with its last round coming in August of 2015. Primary investors include Galileo Partners, Menlo Ventures, Iris Capital, and Idinvest Partners.

Scality ranked very well in the latest Gartner Magic Quadrant for Distributed File Systems and Object Storage. You can see the chart on Scality’s website here.

Scality RING

RING is Scality’s highly-scalable storage platform. It should be considered a software-defined storage system, though that term is being used very loosely these days. Scality does not provide any sort of appliance or hardware. That’s up to the user to choose. Scality has some OEM relationships with certain vendors, including HP and Dell, but Scality itself does not deliver or support the platform in an appliance-type package.

RING can be deployed on many common Linux distributions, including CentOS, Red Hat Enterprise Linux, Ubuntu, and Debian. One thing that I like is that it does not run in kernel space. There are no kernel dependencies or changes, so there is no complex compatibility list for them, or you, to worry about. The hardware can be spec’d out how you like with all HDDs, all flash, or a mix of both. Networking can be 1Gb or 10Gb, and the choice of vendor is up to the user.

What defines and differentiates RING is its linear scaling and resiliency. There are no single points of failure, nor is there a single database or system for metadata; indexing is distributed instead. Access is fully parallel, which lets performance scale as the system grows.

Architecture

You can think of RING as having three “layers” in the architecture.

The top layer is the scale-out access layer. This is implemented using a concept of connectors. Applications and users store and retrieve data through these connectors. The connectors currently supported include:

  • S3-Compatible
  • NFS v3
  • SMB 2.0
  • Linux FS (FUSE)
  • OpenStack Swift
  • OpenStack Cinder
  • OpenStack Glance
  • Scality REST API

The connectors send data directly to, and retrieve it directly from, the storage nodes. Data can be written in chunks of a maximum size, so that large objects get spread over multiple nodes.
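To picture what chunking does, here is a minimal Python sketch. It is not Scality's code: the function names, the `object_name/index` chunk key, and the hash-modulo placement are all assumptions I made up for illustration. The only idea taken from the text above is that an object is cut into pieces no larger than some maximum size, and the pieces land on different nodes.

```python
import hashlib


def split_into_chunks(payload: bytes, max_chunk_size: int) -> list[bytes]:
    """Cut a payload into consecutive chunks of at most max_chunk_size bytes."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    return [
        payload[offset:offset + max_chunk_size]
        for offset in range(0, len(payload), max_chunk_size)
    ]


def place_chunks(object_name: str, payload: bytes,
                 max_chunk_size: int, nodes: list[str]) -> list[tuple[str, str, int]]:
    """Return (chunk_key, node, chunk_length) for every chunk of the object.

    Placement here is a plain hash-modulo pick, chosen only to keep the
    sketch short (an assumption, not how RING places data).
    """
    placement = []
    for index, chunk in enumerate(split_into_chunks(payload, max_chunk_size)):
        chunk_key = f"{object_name}/{index}"  # hypothetical key format
        digest = hashlib.sha256(chunk_key.encode("utf-8")).digest()
        node = nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
        placement.append((chunk_key, node, len(chunk)))
    return placement


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical storage nodes
    for key, node, size in place_chunks("video.mov", b"x" * 10, 4, nodes):
        print(key, "->", node, f"({size} bytes)")
```

A 10-byte payload with a 4-byte maximum yields three chunks of 4, 4, and 2 bytes, each of which can be read or written independently.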

The second layer is responsible for local and geo-distributed file/object replicas, of which there can be up to five. The system also supports erasure coding for better capacity efficiency. This is where all self-healing occurs, should a disk or node fail.
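To make the capacity-efficiency point concrete, here is a rough back-of-the-envelope calculation in Python. The 3-copy and 9+3 layouts are illustrative assumptions on my part, not Scality's documented defaults, and the function names are my own.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes written per usable byte when storing `copies` full copies."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return float(copies)


def erasure_coding_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes written per usable byte for a k+m erasure-coded layout."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("invalid fragment counts")
    return (data_fragments + parity_fragments) / data_fragments


def raw_capacity_needed(usable_tb: float, overhead: float) -> float:
    """Raw capacity (TB) required to hold usable_tb of user data."""
    return usable_tb * overhead


if __name__ == "__main__":
    usable_tb = 1000.0  # hypothetical 1 PB of user data
    layouts = [
        # (label, overhead, number of lost copies/fragments tolerated)
        ("3 full copies", replication_overhead(3), 2),
        ("9+3 erasure coding", erasure_coding_overhead(9, 3), 3),
    ]
    for label, overhead, tolerated in layouts:
        raw = raw_capacity_needed(usable_tb, overhead)
        print(f"{label}: {raw:.0f} TB raw, survives {tolerated} losses")
```

Under these assumptions, three full copies need 3,000 TB of raw disk for 1,000 TB of data and survive two lost copies, while a 9+3 layout needs roughly 1,333 TB and survives three lost fragments. The trade-off is that rebuilding erasure-coded data costs more CPU and network than copying a replica.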

The third layer is focused on storage and a distributed object key/value store. This is based on Chord, a distributed peer-to-peer hash table protocol developed at MIT.
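If you have not run into Chord before, the core idea is that nodes and keys are hashed onto the same circular identifier space, and each key belongs to the first node at or after it on the circle (its "successor"). The Python sketch below shows only that placement rule. It is a simplification under my own assumptions: real Chord adds finger tables so lookups take a logarithmic number of hops, the 32-bit ring is shrunk for readability, and none of the names here come from Scality's code.

```python
import bisect
import hashlib

RING_BITS = 32  # assumption: a small identifier space to keep the numbers readable


def ring_id(name: str) -> int:
    """Hash a node name or object key onto the circular identifier space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class Ring:
    """Successor-based key placement, the rule at the heart of Chord."""

    def __init__(self, node_names):
        self._ids = []    # sorted node identifiers
        self._names = {}  # identifier -> node name
        for name in node_names:
            self.add_node(name)

    def add_node(self, name: str) -> None:
        node_id = ring_id(name)
        if node_id in self._names:
            raise ValueError(f"identifier collision for {name!r}")
        bisect.insort(self._ids, node_id)
        self._names[node_id] = name

    def successor(self, key: str) -> str:
        """Return the node responsible for key: first node id >= key id, wrapping."""
        if not self._ids:
            raise LookupError("ring has no nodes")
        index = bisect.bisect_left(self._ids, ring_id(key))
        return self._names[self._ids[index % len(self._ids)]]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c"])  # hypothetical node names
    keys = [f"object-{i}" for i in range(1000)]
    before = {key: ring.successor(key) for key in keys}
    ring.add_node("node-d")
    moved = sum(1 for key in keys if ring.successor(key) != before[key])
    print(f"{moved} of {len(keys)} keys moved after adding a node")
```

The property worth noticing is that adding a node only moves the keys that now fall to the new node. Everything else stays put, which is what lets a system like this grow without reshuffling all of its data.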

This is just a very high-level overview of the components of RING. If you would like to dig into the details of the system, how it distributes data, checks for errors, and self-heals, check out Scality’s technical white paper available here. It should answer any questions you have about the underlying architecture.

Management

RING has a good overall monitoring and dashboard system. It lets you manage the whole system, including nodes, health, stats, metrics, performance, and other resources.

Use Cases

We’re seeing more entrants into the SDS space. Some provide simple block storage or NAS, while others, like Scality, focus heavily on object-based storage. RING is targeted at the multi-PB use case. Yes, you can absolutely start small, but the system is designed and built to scale out to very large deployments. It is also tuned to work well with large numbers of small files, which has been a challenge for other vendors.

While Scality RING supports NFS, it was really built for object-based storage protocols. My thought is that NFS is there for those migrating to applications that support true object storage. I would not recommend replacing large NAS systems that see significant random access with something like a Scality RING deployment. Those are very different use cases. We have seen that most enterprises are still evaluating or testing object-based storage. Those using it directly are currently the exception. Supporting NFS allows people to migrate into RING.

The adoption of object-based storage has been slow for most of the enterprise market. But, if you have an application that needs it, you most likely really need it and it needs to be scalable. This usually comes down to specific applications and often specific industries.

Backup

Backup is a very common use case. Given that Scality provides an S3-compatible front-end interface, it will work with any backup platform that can push data to S3-compatible storage. This is a good solution for those that need to back up large amounts of data but want a simpler approach. For example, you could use a next-generation backup platform such as Rubrik or Cohesity and keep the primary data on the backup system cluster but push all archival data to a large RING deployment. That way you get simplicity and scalability.

Media and Entertainment

One thing that became very apparent during the CFD1 meeting with Scality was that its primary customer base was in media and entertainment. This isn’t surprising, since that industry often deals with large files, and building a platform that is scalable enough while still offering quick access has been a challenge.

Healthcare

Healthcare, not surprisingly, consumes a lot of storage. One big consumer is PACS (picture archiving and communication system). PACS products are made by several vendors, but their purpose is to store medical imaging, and over the years there has been a lot of innovation in this space. That innovation has given us higher detail and more capabilities, but it usually produces much larger files, which must be stored for a long time.

We are starting to see that more of these systems can utilize an S3-based storage system. RING could be an excellent target for this, assuming the healthcare organization wanted to go with a self-built SDS-type solution instead of a PACS-vendor solution. Having said that, we are seeing more innovation in healthcare and getting requests to go outside the old norms for storage.

Open Source S3 Server

Scality offers a free open-source S3 server under the Apache 2.0 license. If you want to deploy an S3 server for testing or lab use, you should check it out. It is written in Node.js and runs in a Docker container. It uses the same S3 interface code as RING, so you can use it to test compatibility with RING if you choose.

You can download the S3 server here.

My Thoughts

True SDS and object-based storage are still working to gain a foothold in most enterprise environments. They exist right now in special cases, but that is starting to change. We are regularly having discussions with our clients around how they can leverage these technologies in other environments as they mature.  

Scality isn’t a new player in this space but it faces very stiff competition from the incumbents such as Dell EMC, Hitachi, IBM, Red Hat, and others. It must also differentiate itself from other open source initiatives that organizations may look at when considering building out a storage platform such as this.  

Clients often request software-only packages but it’s been interesting to watch the adoption rate. Usually that software-only offering request turns into an appliance purchase, when available. VMware’s VSAN is a great example of this. Most of our clients deploy VxRail or ready-nodes when it’s all said and done. They do not buy licenses for VSAN and then spec and put together their own systems. That’s usually the right decision as it cuts down on risk. Given that, I’d like to see a stronger push for OEM relationships and packages sold and supported by Scality or the OEM. That will satisfy most customer demands and help alleviate the real or perceived risk of building their own.

The bottom line is that this segment is rapidly maturing and use cases are starting to appear in more than just the niche industries. But, often this technology is early enough that you need to properly evaluate it for your use case. A solid proof-of-concept is a good idea since the devil can be in the details with any emerging technology.





Author: Jason Nash
Jason Nash is Solution Principal at AHEAD where he focuses on emerging technologies in the AHEAD Lab.
