Alex Miroshnichenko, CTO

Virsto and Cloning (second in a series)

Tags: dedupe, excessive storage spending, no dupe, storage sprawl, Virsto One

In my prior post, we started to discuss clones. What is a clone, anyway?

It depends on the situation and on the nature of the objects being cloned. For example, if you could clone the material contents of a metal box, you could be considered for a Nobel Prize. (Someone call Stockholm; I think we are cloning the contents of metal boxes…)

If you clone a bank account you are likely to be considered for an extended stay at Club Fed – even if the account belonged to Bernie Madoff.  

And on the subject of world domination by evil geniuses, Dr. Evil and Mini Me are an example of imperfect clones.

But I digress. Let's return to where we left off in the previous post.

It is important to know what we are about to clone and that there are different kinds of clones.

Virtual Disk Images

The objects we are dealing with are virtual disk images for virtual machines. These VM images (in a format Microsoft calls VHD, or virtual hard disk) contain operating system executables and VM-specific configuration data.

The executables for a modern OS are quite large. A bare-bones distribution of Windows Server 2008 R2 weighs in at about 7GB of genuine Microsoft data bits. Of those, probably 6.99…9GB are binary executables, and the remaining couple of bits store default configuration information. The executables will never change. The instance-specific data will change, but it will never grow larger than 10-20% of the VM image size. Install your favorite application suites, like Office 2010 or SAP Business Suite, and the immutable part of your VHD can easily reach 20GB.

Virtual Disk Cloning: Essential to VM Provisioning

To deploy a large number of virtual servers (or desktops for that matter – that's a teaser for a future blog topic), the natural process is to create a template OS image (or golden image) which contains all operating system and application executables including patches and updates, then clone that image as many times as needed. “System provisioning” consists of server provisioning (CPU and memory resources) and storage provisioning – the process of creating cloned copies of the golden virtual disk image.

When some vendors say "we provision storage" in the sense of the paragraph above, it is a fancy way of saying "we have to copy lots of bits". In the majority of cases, that's exactly what it is: a plain, dumb copy.

Why dumb? Well, think about it. A single golden image may be used to create a huge number of derivative images: hundreds, even thousands. So almost all of the time and disk space spent copying those images is wasted on identical data that will never change.

Ever heard of "VM storage sprawl"? That is where the term comes from. Do the math on how much time and space your environment wastes copying things around.
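Here is a back-of-the-envelope version of that math in Python. It takes the 20GB image size and the 20% upper bound on instance-specific data from above. The clone count of 500 is purely hypothetical. The shared figure assumes an ideal block-sharing scheme, in which only diverged blocks consume new space.

```python
def full_copy_gb(image_gb: float, clones: int) -> float:
    """Space used when every clone is a full ("dumb") copy of the golden image."""
    return image_gb * (clones + 1)  # the golden image plus one full copy per clone


def shared_gb(image_gb: float, clones: int, unique_fraction: float) -> float:
    """Space used when clones share unchanged blocks with the golden image.

    unique_fraction is the assumed share of each clone that has diverged
    (instance-specific data); only those blocks need their own storage.
    """
    return image_gb + clones * image_gb * unique_fraction


IMAGE_GB = 20.0         # golden image with OS plus application suites (from the text)
UNIQUE_FRACTION = 0.20  # upper bound on instance-specific data (from the text)
CLONES = 500            # hypothetical number of deployed VMs

dumb = full_copy_gb(IMAGE_GB, CLONES)
shared = shared_gb(IMAGE_GB, CLONES, UNIQUE_FRACTION)
print(f"full copies:       {dumb:,.0f} GB")
print(f"shared blocks:     {shared:,.0f} GB")
print(f"wasted by copying: {dumb - shared:,.0f} GB")
```

Under these assumptions, 500 full copies occupy about 10,020 GB. The shared layout needs about 2,020 GB, even if every clone diverges by the full 20%. The gap grows linearly with the number of clones.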

A Better Way to Clone VMs

Enter Virsto One. Virsto One is designed for the ideal virtual disk provisioning workflow described above, but without the nasty side effects typical of alternatives.

A cloning operation in Virsto One never copies the golden image data. It creates a virtual disk object, an exact clone of the original. The clone looks, feels, and behaves as if it were a full fidelity copy of the golden image. As far as the hypervisor (or anything else) can tell, it looks like Virsto made one of those dumb copies.

But in fact, we didn't. We created a VHD object that is addressable and usable just like any other VHD, but we didn't copy a single bit of data from the golden image to the clone. All the blocks are shared between the two VHDs, so together they take half the disk space that two full copies would: a 50% savings. Make two more clones, so that four VHDs share the same blocks, and that's a 75% space savings. Twenty VHDs? 95% savings.
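The percentages above follow from a simple ratio. Suppose N VHDs (the golden image plus its clones) share one physical copy of the data, and nothing has been written yet. The space saved relative to N full copies is then 1 - 1/N. A quick Python check under that assumption:

```python
def savings(total_vhds: int) -> float:
    """Fraction of space saved when total_vhds images share one copy of the data.

    Assumes nothing has been written yet, so every block is still shared.
    """
    if total_vhds < 1:
        raise ValueError("need at least one VHD")
    return 1 - 1 / total_vhds


for n in (2, 4, 20):
    print(f"{n:>2} VHDs: {savings(n):.0%} saved")
```

Once clones start writing, each diverged block adds back some private storage, so these figures are the upper bound at clone creation time.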

Clones in Action

So much for the initial state of the clone. What happens when data is written to either the golden image or to one of the clones?

As soon as a block that is shared by more than one VHD is written to, that block can no longer be shared. The VHD being written to needs its own private, unshared version of that block, so Virsto One allocates private storage space for that VHD's version of the block.

Unchanged blocks remain shared among all derivatives and do not consume any extra storage. Only truly unique data blocks occupy physical disk space.
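The sharing behavior just described can be illustrated with a toy model in Python. This is not Virsto One's implementation; the post later explains that it is log-based rather than copy-on-write. The model only mimics the visible semantics: a clone copies a block map rather than data, and the first write to a shared block gives the writing VHD a private block. For simplicity, the model allocates that private block immediately at write time. `BlockPool`, `VirtualDisk`, and their methods are hypothetical names.

```python
from typing import Dict, List, Optional


class BlockPool:
    """Hypothetical physical block store with per-block reference counts."""

    def __init__(self) -> None:
        self.data: Dict[int, bytes] = {}
        self.refs: Dict[int, int] = {}
        self._next_id = 0

    def allocate(self, payload: bytes) -> int:
        block_id = self._next_id
        self._next_id += 1
        self.data[block_id] = payload
        self.refs[block_id] = 1
        return block_id

    def release(self, block_id: int) -> None:
        self.refs[block_id] -= 1
        if self.refs[block_id] == 0:  # no VHD refers to this block any more
            del self.refs[block_id]
            del self.data[block_id]


class VirtualDisk:
    """Hypothetical VHD: a map from logical block number to physical block id."""

    def __init__(self, pool: BlockPool, size_blocks: int) -> None:
        self.pool = pool
        self.block_map: List[Optional[int]] = [None] * size_blocks

    def clone(self) -> "VirtualDisk":
        # Copy only the map; no data moves, each physical block gains a reference.
        twin = VirtualDisk(self.pool, len(self.block_map))
        twin.block_map = list(self.block_map)
        for block_id in twin.block_map:
            if block_id is not None:
                self.pool.refs[block_id] += 1
        return twin

    def write(self, lbn: int, payload: bytes) -> None:
        current = self.block_map[lbn]
        if current is not None and self.pool.refs[current] == 1:
            self.pool.data[current] = payload  # already private: update in place
            return
        # Shared or never written: this VHD gets its own private block.
        self.block_map[lbn] = self.pool.allocate(payload)
        if current is not None:
            self.pool.release(current)  # other VHDs keep the old shared block

    def read(self, lbn: int) -> Optional[bytes]:
        block_id = self.block_map[lbn]
        return None if block_id is None else self.pool.data[block_id]


pool = BlockPool()
golden = VirtualDisk(pool, 3)
for lbn in range(3):
    golden.write(lbn, f"os-{lbn}".encode())
clone = golden.clone()
clone.write(1, b"config")
print(len(pool.data))  # 4: three shared blocks plus one private block
print(golden.read(1), clone.read(1))
```

After one write, the golden image and the clone still share blocks 0 and 2. Only block 1 costs extra space.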

I should note that Virsto clones, unlike other types of clones, have some pretty powerful characteristics. For example:

  • Clone creation is practically instantaneous. You ask for a Virsto clone, and you've got it. Right now.
  • Clones have the same performance signature as golden images. I/O to a Virsto clone is just as fast as I/O would have been to one of those dumb copies. This is true not only right after clone creation, but even after the clone has lived for a long time.
  • Clones are instantly available on all nodes in a cluster.  You can make a clone on server box A and instantly fire it up as a virtual machine on server B.
  • Clones (and in fact golden images) are always thin provisioned. Virsto VHDs don't occupy disk space until a block is actually written to. This is similar to Microsoft dynamic VHDs, but remember I/O performance to a Virsto VHD is full throttle, unlike other kinds of thin provisioning.
  • There is no limit to the number of clones that can be made from one VHD. Alternatives have practical limits of tens, maybe hundreds. At Virsto, we think thousands isn't that large a number.
  • Clones can be cloned. And clones of clones can be cloned. Et cetera. Arbitrarily complex trees of clones can be created, with Virsto One presenting each clone as a simple linear VHD that is easily consumed and managed like any other standard Microsoft VHD.
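Two of the properties in the list above are thin provisioning and clones of clones that still behave as simple linear VHDs. Both can be sketched in Python with a sparse, flat block map per disk. This is a hypothetical model (`SparseDisk`, `BLOCK_SIZE`, and `ZERO_BLOCK` are illustrative names), and the post does not say how Virsto One represents clone trees internally. The point is only that if each clone carries its own flat map, a read is one lookup regardless of how deep the clone sits in the tree. It is not a walk back through a chain of parents. Unwritten blocks have no map entry, so they consume no space and read as zeros.

```python
from typing import Dict

BLOCK_SIZE = 8  # toy block size; real disks use far larger blocks
ZERO_BLOCK = bytes(BLOCK_SIZE)


class SparseDisk:
    """Hypothetical thin-provisioned VHD: only written blocks appear in its map."""

    def __init__(self, store: Dict[int, bytes]) -> None:
        self.store = store                   # physical blocks shared by all disks
        self.block_map: Dict[int, int] = {}  # logical block -> physical block id

    def clone(self) -> "SparseDisk":
        twin = SparseDisk(self.store)
        twin.block_map = dict(self.block_map)  # flat copy: no chain back to the parent
        return twin

    def write(self, lbn: int, payload: bytes) -> None:
        if len(payload) > BLOCK_SIZE:
            raise ValueError("payload larger than one block")
        # Every write goes to a fresh physical block; reclaiming superseded
        # blocks is left out of this sketch.
        block_id = len(self.store)
        self.store[block_id] = payload.ljust(BLOCK_SIZE, b"\0")
        self.block_map[lbn] = block_id

    def read(self, lbn: int) -> bytes:
        # A single lookup, no matter how deep in the clone tree this disk sits.
        block_id = self.block_map.get(lbn)
        return ZERO_BLOCK if block_id is None else self.store[block_id]


store: Dict[int, bytes] = {}
golden = SparseDisk(store)
golden.write(0, b"boot")
gen1 = golden.clone()
gen2 = gen1.clone()  # a clone of a clone
gen3 = gen2.clone()  # and a clone of that
gen3.write(7, b"app")
print(len(store))  # 2 physical blocks back all four disks
```

Copying a whole map on every clone is itself a simplification. A production system would need a more compact representation to keep clone creation instantaneous for large disks.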

By the way, all this is completely transparent, invisible to guest VMs. So apps you run in these VMs are completely unaffected.

Not a Cow

When we were raising our initial venture capital for Virsto, it was not unusual, when I reached this point in a presentation, for an associate at a VC firm to proudly say something smart like "So, this is just a copy-on-write scheme, right?"

No, it is not. Copy-on-write (CoW) is just one of the implementation algorithms that may exhibit some of the desired properties. However, CoW has a lot of issues with performance, scalability, garbage collection, et cetera. These issues make CoW a poor choice for VM disk images.

If I had to express our technique in a simple, catchy phrase, I would call it "logging with allocate on log flush". OK, you might find that neither simple nor catchy, but it's my best attempt.

Many people find "flush" to have a bad connotation when applied to valuable data, so we usually use the word "destage" instead. But the essence is the same. Virsto One allocates permanent storage at the time of moving data from the log device to its permanent location. (There is a lot to our logging technique that goes beyond the issues we're discussing here, but we'll leave that to future blog posts.)
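Here is a generic Python sketch of the "allocate on destage" half of that phrase. The post deliberately does not describe Virsto One's algorithm, so this is not it. `LoggedDisk` and its methods are hypothetical, clones are left out, and the sketch shows only one idea: permanent space is chosen when logged data is destaged, not when the write arrives. One side effect worth noticing is that a block overwritten several times while still in the log is allocated only once.

```python
from typing import Dict, List, Optional, Tuple


class LoggedDisk:
    """Hypothetical sketch: writes go to a log; permanent space is allocated on destage."""

    def __init__(self) -> None:
        self.log: List[Tuple[int, bytes]] = []  # (logical block, payload), arrival order
        self.permanent: Dict[int, bytes] = {}   # physical id -> payload
        self.block_map: Dict[int, int] = {}     # logical block -> physical id
        self.allocations = 0

    def write(self, lbn: int, payload: bytes) -> None:
        self.log.append((lbn, payload))  # sequential append; nothing allocated yet

    def read(self, lbn: int) -> Optional[bytes]:
        for logged_lbn, payload in reversed(self.log):  # newest logged version wins
            if logged_lbn == lbn:
                return payload
        block_id = self.block_map.get(lbn)
        return None if block_id is None else self.permanent[block_id]

    def destage(self) -> None:
        latest: Dict[int, bytes] = {}
        for lbn, payload in self.log:  # later writes to a block replace earlier ones
            latest[lbn] = payload
        for lbn, payload in latest.items():
            physical_id = self.allocations  # the allocation decision happens here
            self.allocations += 1
            self.permanent[physical_id] = payload
            self.block_map[lbn] = physical_id  # superseded blocks are not reclaimed here
        self.log.clear()


disk = LoggedDisk()
disk.write(3, b"v1")
disk.write(3, b"v2")  # overwritten while still in the log
disk.write(9, b"x")
print(disk.allocations)  # 0: nothing allocated before destage
disk.destage()
print(disk.allocations)  # 2: block 3 is allocated once, for its final version
print(disk.read(3))      # b'v2'
```

By contrast, a classic copy-on-write scheme makes its allocation (and copy) decision at write time, on the I/O path.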

Perhaps it is best for now to avoid all the details and answer the question by saying, "No, we don't do copy-on-write. We have secret sauce algorithms derived from our combined decades of experience in the storage and virtualization industries and academia, and we've made that secret sauce specifically for the unique needs of virtual data centers."

Whew, We Covered a Lot of Ground

The good news for you is that the Virsto team has done all the heavy lifting. You don't need to know how we work our magic. All you need to know is that we do make it easy to make lots of super high performance, space saving VM clones, without buying a single piece of new storage or server hardware.

You can draw your own conclusions about how well we have met these goals. Just download a trial version of Virsto One and let us know what you think. It's only 6.8MB in size and installs in seconds.

If you've read about or perhaps tried dedupe, you may be wondering how Virsto is different, if at all. In the third and final part of this series, I will contrast Virsto One "no dupe" with "dedupe" alternatives.
