Alex Miroshnichenko, CTO

Virsto No Dupe (third in a series)

Tags: dedupe, no dupe, storage sprawl, Virsto One

Now that you've had ample time to memorize my previous post on cloning, let's wrap up this series by comparing deduplication (dedupe) with Virsto no dupe.

First, a refresher on how I defined data deduplication in the first post of this series: dedupe refers to any technology that analyzes data content to find redundancies, so that less space is needed to store the data or less bandwidth is needed to transmit it.

Current dedupe products

The marketplace has data deduplication offerings in both software and hardware form. And, as the marketing literature tells me, each of these offerings solves world peace. Uh oh, I am having déjà vu. Remember the movie WarGames? Remember WOPR and its supercomputing goal of predicting war outcomes? At the end (sorry to spoil it if you still haven't seen this movie from 1983) WOPR concludes that the only winning move is not to play. WOPR then opts for a nice game of chess. (How’s that for foreshadowing?)

Scanning and analyzing massive amounts of data is a very resource-intensive operation. A typical workflow involves scanning every data block to calculate some kind of reliable checksum (fingerprint), comparing that fingerprint with those of the data already stored, performing additional calculations and comparisons when a match is found, and so on. This is extremely hard to do with a fast data stream in real time, precisely because the execution path and processing time for a given data set depend on its content. Numerous PhD dissertations have been written on the subject (some of them even defended). And of course, vendor companies have been created, bought and sold to pursue the opportunity. Some of these companies have implemented their algorithmic magic in dedicated, custom-designed hardware. Many of these custom hardware products are fairly expensive storage devices intended for data backup and archiving, not primary data storage.
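To make the workflow above concrete, here is a minimal sketch in Python. It is a toy, not a description of any vendor's product: the fixed 4 KB block size, the choice of SHA-256 as the fingerprint, and the names DedupeStore, write, read and physical_bytes are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupeStore:
    """Toy content-addressed block store (hypothetical, not a vendor design)."""

    def __init__(self):
        self.blocks = {}        # fingerprint -> block bytes, each stored once
        self.logical_bytes = 0  # total bytes callers have asked us to write

    def write(self, data: bytes) -> list:
        """Store data block by block; return the list of fingerprints."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()  # 1. fingerprint the block
            existing = self.blocks.get(fp)          # 2. compare with known set
            if existing is None:
                self.blocks[fp] = block             # new content: store it
            elif existing != block:                 # 3. extra check on a match
                raise RuntimeError("fingerprint collision")
            recipe.append(fp)
            self.logical_bytes += len(block)
        return recipe

    def read(self, recipe) -> bytes:
        """Rebuild the original data from its fingerprint list."""
        return b"".join(self.blocks[fp] for fp in recipe)

    def physical_bytes(self) -> int:
        """Bytes actually held after deduplication."""
        return sum(len(b) for b in self.blocks.values())


store = DedupeStore()
image = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # 4 blocks, 2 unique
recipe = store.write(image)
print(store.logical_bytes, store.physical_bytes())
```

Even in this toy, every block written costs a hash computation, a lookup and, on a match, a byte-for-byte comparison. That per-block work sits directly in the write path, which is why doing it inline on a fast data stream is so demanding.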

With content-aware data dedupe, as with most resource-hungry technologies, we have the option (if not the pragmatic requirement) to delay or schedule processing and perform it in the background when extra bandwidth and CPU cycles are available. This is a valid approach used by many for backup and archiving applications.

There are no examples of – please pay attention to the qualifier – inline software dedupe technologies for online transactional data storage which have been applied on any significant scale. The required processing speeds and memory sizes are simply beyond the current state of the art.

Virsto One clones: optimal for VM storage sprawl

We made a strategic decision about how to dramatically reduce storage consumption in our initial technology deployment. In my second blog post of the series, I went through a broad list of Virsto One clone characteristics and their benefits. The bottom line is, we believe that any new functionality required by virtualized data centers should not come at the expense of existing features. In particular, the performance of the online storage subsystem should never be sacrificed in order to support data deduplication.

Scalable cloning technology purpose-built for VM deployment is ideal because it deals directly with the primary source of data duplication in virtual machines, namely the tragic waste of resources caused by the constant provisioning of new VM disk images that is fundamental to running a virtual infrastructure. And because dedupe techniques are so resource-intensive as to be prohibitive for real-time use in production VMs, it doesn't make sense to use the dedupe hammer as the primary tool to drive the nail of VM storage sprawl.
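For contrast with content scanning, here is a minimal sketch of the general copy-on-write idea behind disk image clones. It is a generic textbook illustration, not a description of how Virsto One clones are implemented; the names BaseImage and Clone and the block-index layout are assumptions made for this example.

```python
class BaseImage:
    """A golden VM disk image, treated as read-only (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class Clone:
    """A clone that shares the base image and stores only its own changes."""

    def __init__(self, base: BaseImage):
        self.base = base  # shared by reference, nothing is copied
        self.delta = {}   # block index -> block private to this clone

    def read(self, index: int) -> bytes:
        # Prefer the clone's private copy, otherwise fall back to the base.
        return self.delta.get(index, self.base.blocks[index])

    def write(self, index: int, block: bytes) -> None:
        # Writes never touch the base; they land in the clone's delta.
        self.delta[index] = block

    def private_blocks(self) -> int:
        return len(self.delta)


base = BaseImage([b"boot", b"os", b"apps", b"empty"])
vm_a = Clone(base)
vm_b = Clone(base)
vm_a.write(3, b"user data")
print(vm_a.private_blocks(), vm_b.private_blocks())
```

In this sketch, provisioning a new VM disk copies no data and never inspects block contents, so duplicates of the base image are simply never created. As the next paragraph acknowledges, it does nothing for identical data written independently to different clones after provisioning.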

It has been noted numerous times, including in comments about this blog series, that cloning technology does not solve the problem of duplicate data introduced into independent data sets post-provisioning. However, the pain of storage sprawl and excruciatingly slow VM storage provisioning in the virtualized world is acute and immediate. Virsto One clones solve these problems today in a highly economical way, and have beneficial characteristics that no other cloning technology possesses.

No single technology is perfect for all occasions, and great products ultimately combine multiple technologies to solve customer problems. The best products are designed from the start to be extensible and future-proof, and you can be assured that the Virsto One architecture has a natural and elegant way to add data deduplication features. Virsto One is a great product today with a future-proof design. I encourage you to try it now.

And WOPR, if you are still online, I am up for a nice game of chess. As an opening move, in the near future I'll write about hardware versus software approaches to storage virtualization.
