I'm a Principal Engineer at VMware, and am part of the team working on the ESX Server product. We're pretty excited about the recent announcement of ESX 3, which has been in development for quite a while. I actually can't take much credit, since I only came in at the tail end of ESX 3 development - up until last May I was at Sun, working on Solaris and OpenSolaris (see my Sun blog for more details).
Recently, I've been working on Distributed Resource Scheduling (DRS), which will be available with ESX 3. The basic idea behind DRS is to provide the ability to automatically schedule VMs across a cluster of machines, in much the same way that an operating system schedules processes on different CPUs. In addition to determining where VMs should run when initially powered on, DRS uses hot migration (aka VMotion) of VMs between hosts to adapt to dynamic changes in load or available resources.
The thing I find interesting about DRS is the way it decouples the application infrastructure (including the guest operating system itself) from the physical hardware. If you want to take a machine down for maintenance, you can put it in maintenance mode and the VMs it was running will automatically migrate to other systems. Once you're done, power it back on and VMs will migrate back. If you have a spike in the load on a particular application, the scheduler can compensate by moving that VM to a host with more available resources. If you decide that a particular VM is more important than you initially thought and want to give it more resources, you can change the resource settings on the fly and the scheduler will adjust.
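To make the three behaviors above concrete (initial placement, rebalancing when load shifts, and evacuating a host for maintenance), here is a toy scheduler. It is a minimal sketch under loud assumptions and is *not* how DRS actually works: each VM's demand is a single hypothetical scalar, a host's load is demand divided by capacity, placement is greedy least-loaded, and rebalancing migrates at most one VM per step from the most loaded host to the least loaded one. The names `Host`, `place`, `rebalance_step`, `evacuate`, and the `threshold` parameter are all illustrative inventions.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: float  # hypothetical unit of capacity (e.g. CPU MHz)
    vms: dict = field(default_factory=dict)  # VM name -> demand, same unit as capacity

    def used(self) -> float:
        return sum(self.vms.values())

    def load(self) -> float:
        return self.used() / self.capacity


def place(hosts, vm, demand):
    """Initial placement: pick the host whose load would be lowest after adding the VM.
    Ties go to the first host in the list. Capacity limits are NOT enforced."""
    best = min(hosts, key=lambda h: (h.used() + demand) / h.capacity)
    best.vms[vm] = demand
    return best


def rebalance_step(hosts, threshold=0.1):
    """Move one VM from the most to the least loaded host if that narrows the load gap.
    Returns (vm, source, destination), or None if the cluster is balanced enough."""
    hi = max(hosts, key=Host.load)
    lo = min(hosts, key=Host.load)
    gap = hi.load() - lo.load()
    if gap <= threshold:
        return None
    best = None
    for vm, demand in hi.vms.items():
        new_gap = abs((hi.used() - demand) / hi.capacity - (lo.used() + demand) / lo.capacity)
        if new_gap < gap and (best is None or new_gap < best[1]):
            best = (vm, new_gap)
    if best is None:
        return None  # no single migration improves the balance
    vm = best[0]
    lo.vms[vm] = hi.vms.pop(vm)  # stands in for a live migration
    return (vm, hi.name, lo.name)


def evacuate(hosts, host):
    """Maintenance mode: re-place every VM on `host` onto the other hosts."""
    others = [h for h in hosts if h is not host]
    moves = []
    for vm, demand in list(host.vms.items()):
        dest = place(others, vm, demand)
        del host.vms[vm]
        moves.append((vm, host.name, dest.name))
    return moves


if __name__ == "__main__":
    a, b = Host("a", 100.0), Host("b", 100.0)
    hosts = [a, b]
    for vm, demand in [("vm1", 30.0), ("vm2", 30.0), ("vm3", 20.0)]:
        print(vm, "->", place(hosts, vm, demand).name)
    a.vms["vm1"] = 60.0  # simulate a load spike on vm1
    print(rebalance_step(hosts))
    print(evacuate(hosts, b))
```

With two equal hosts, the three VMs land on `a`, `b`, `a`; after `vm1`'s demand spikes, one rebalancing step moves `vm3` from `a` to `b`, and evacuating `b` pushes both of its VMs back onto `a`. A real scheduler would also have to weigh migration cost, respect capacity and per-VM resource settings, and avoid oscillating VMs back and forth, none of which this sketch attempts.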
Of course, operating systems have been doing this sort of thing for years. When I run a multithreaded application on an SMP system, I don't need to worry about how the threads are scheduled onto processors - the OS takes care of that. And various kinds of batch and grid schedulers have been able to do initial placement scheduling - deciding what machine a new job should run on. But being able to adapt on the fly to changes in load and available resources - migrating workloads between independent machines - without application or OS changes - using commodity (i.e., cheap) hardware - in an enterprise-class product - that's something different.