I'm a Principal Engineer at VMware, and I'm part of the team working on the ESX Server product. We're pretty excited about the recent announcement of ESX 3, which has been in development for quite a while. I actually can't take much credit, since I only came in on the tail end of the ESX 3 development - up until last May I was at Sun, working on Solaris and OpenSolaris (see my Sun blog for more details).
Recently, I've been working on Distributed Resource Scheduling (DRS), which will be available with ESX 3. The basic idea behind DRS is to provide the ability to automatically schedule VMs across a cluster of machines, in much the same way that an operating system schedules processes on different CPUs. In addition to determining where VMs should run when initially powered on, DRS uses hot migration (aka VMotion) of VMs between hosts to adapt to dynamic changes in load or available resources.
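To make the two halves of that idea concrete (initial placement at power-on, then migration as load changes), here is a toy sketch in Python. It is only an illustration of the general technique, not how DRS is actually implemented: the `Host` class, the `place` and `rebalance` functions, the single abstract "resource unit", and the 0.2 imbalance threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: float                          # abstract resource units (assumed)
    vms: dict = field(default_factory=dict)  # VM name -> current demand

    @property
    def load(self) -> float:
        """Fraction of this host's capacity demanded by its VMs."""
        return sum(self.vms.values()) / self.capacity


def place(hosts, vm, demand):
    """Initial placement: pick the host that would be least loaded after adding the VM."""
    target = min(hosts, key=lambda h: (sum(h.vms.values()) + demand) / h.capacity)
    target.vms[vm] = demand
    return target


def rebalance(hosts, threshold=0.2, max_moves=10):
    """Greedy rebalancing: while the busiest and idlest hosts differ by more
    than `threshold`, migrate the one VM that narrows that gap the most."""
    moves = []
    for _ in range(max_moves):
        busiest = max(hosts, key=lambda h: h.load)
        idlest = min(hosts, key=lambda h: h.load)
        gap = busiest.load - idlest.load
        if gap <= threshold:
            break
        best = None
        for vm, demand in busiest.vms.items():
            new_gap = abs((busiest.load - demand / busiest.capacity)
                          - (idlest.load + demand / idlest.capacity))
            if new_gap < gap and (best is None or new_gap < best[0]):
                best = (new_gap, vm, demand)
        if best is None:
            break  # no single migration improves the balance
        _, vm, demand = best
        del busiest.vms[vm]
        idlest.vms[vm] = demand  # stands in for a hot migration
        moves.append((vm, busiest.name, idlest.name))
    return moves
```

Calling `place` for each VM as it powers on spreads the VMs across the hosts; if one VM's demand later spikes, updating its entry and calling `rebalance` returns a list of `(vm, from_host, to_host)` migrations. A real scheduler would also have to weigh things this sketch ignores, such as separate CPU and memory demands, the cost of each migration, and per-VM resource settings.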
The thing I find interesting about DRS is the way it decouples the application infrastructure (including the guest operating system itself) from the physical hardware. If you want to take a machine down for maintenance, you can put it in maintenance mode and the VMs it was running will automatically migrate to other systems. Once you're done, power it back on and VMs will migrate back. If you have a spike in the load on a particular application, the scheduler can compensate by moving that VM to a host with more available resources. If you decide that a particular VM is more important than you initially thought and want to give it more resources, you can change the resource settings on the fly and the scheduler will adjust.
Of course, operating systems have been doing this sort of thing for years. When I run a multithreaded application on an SMP I don't need to worry about how the threads are scheduled onto processors - the OS takes care of that. And various kinds of batch and grid schedulers have been able to do initial placement scheduling - deciding what machine a new job should run on. But being able to adapt on the fly to changes in load and available resources - migrating workloads between independent machines - without application or OS changes - using commodity (i.e. cheap) hardware - in an enterprise-class product - that's something different.
We use ESX Server running 45 VMs on 2 FSC RX800 boxes. Recently we bought a new server, an FSC RX600 S2, but because the instruction sets of the Xeon MP 3.0 GHz processors differ between these two kinds of boxes, VMotion won't work.
This is really a big problem for us, since we want to grow by buying a new server about once a year, and each one will most likely have some processor differences.
Will you be able to solve this, in version 3 or later?
Posted by: Håkan Sandström | December 13, 2005 at 10:59 PM
Andy, if it's okay, I wanted to quote you for a University of Vermont IT newsletter article about VMware ESX Server. I specifically want to quote your explaination of DRS. If that's a problem please email me to let me know. Thanks!
Posted by: Stefanie Ploof | November 27, 2006 at 08:41 AM
(explanation, even :) )
Posted by: Stefanie Ploof | November 27, 2006 at 08:42 AM
No problem, Stefanie.
Posted by: atucker | November 27, 2006 at 09:35 AM