This morning, during his keynote address at VMworld, Mendel Rosenblum talked about (and demoed) a new virtual machine monitor capability we've been playing with in VMware R&D, called record/replay. The basic idea is to record the instruction stream that a virtual machine executes and reproduce it exactly at a later time. This isn't just the instructions associated with a single application process or thread; it includes all code executed within a VM, including multiple processes, kernel code, and interrupt handlers. The replay can even run faster than the original execution (if desired), since during replay the host can skip over idle time.
What can you do with this technology? Well, one obvious use is debugging. A common problem in the development of operating system kernels and complex applications is the non-reproducible bug - a bug that happens due to a specific combination of asynchronous events and can't be readily replicated. Often these are races caused by incorrect locking or other timing-related problems. Even if the developer is lucky enough to get a core file containing a memory image of the system at the time the bug is detected (or shortly thereafter), the detection often occurs far enough after the initial problem that it's difficult to tell what happened. There's also the problem that the act of creating a core file (particularly of an OS kernel) can distort the contents. Personally, as a kernel developer, I've spent many hours staring at object code and remnants of register and kernel memory state, trying to deduce why a problem occurred, and wishing I had a time machine that would allow me to back up and see the state of a given register before it was clobbered by unrelated code, or figure out what thread scribbled garbage onto a critical data structure.
With record/replay, you can replay the execution of instructions exactly, in such a way that you can move forward and backward in time and examine memory and register state at different points. In addition to aiding manual debugging, it enables wider use of tools that automatically detect bugs based on the instruction stream and changes to memory state. Such tools are normally not feasible for use in production or even general QA, since they cause a substantial slowdown, which can change timing enough to drive the bugs away. But with the ability to replay execution, we can do heavy-duty processing and analysis after the bug has already occurred, when performance is less of a concern. The analysis can even be done on another system, perhaps the developer's machine rather than the machine dedicated to QA or production use.
You can also probably think of other uses for this technology - one that comes to mind is keeping a log of execution for analyzing security attacks.
So how do we do this? Moreover, how do we do this efficiently? Obviously, we could record the VM's instruction stream by trapping every instruction and recording the PCs - then, on replay, walking through the instruction trace and single-stepping or emulating each instruction. That would be extremely slow, though; the CPU would spend most of its time trapping into the virtual machine monitor rather than executing the applications running in the virtual machine. It's similar to what classical instruction emulators do, and those often have performance slowdowns of 100x or more. Clearly this approach wouldn't be viable for real application use.
The answer is to think about what affects the stream of instructions that are executed by an operating system (and the applications running within it). Most of the time, the CPU simply executes a deterministic series of instructions - the instruction that will be executed next is determined solely by the previous instruction executed along with (in the case of a conditional branch) the current state of processor registers and memory. If this were the only thing determining execution order, we could replay an instruction stream by simply starting with the same register and memory state (including the current PC) and letting execution proceed.
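To make the determinism argument concrete, here is a minimal sketch using a made-up three-instruction toy machine (the `addi`/`jnz`/`halt` opcodes and the `run` function are purely illustrative assumptions, nothing like a real x86 monitor). It shows that, absent external inputs, the same starting registers and PC always yield the same sequence of PCs, so nothing needs to be logged beyond the initial state.

```python
# Toy deterministic machine (hypothetical instruction set, for illustration only).
# Each instruction is a tuple: ("addi", reg, imm), ("jnz", reg, target), or ("halt",).

def run(program, regs, pc=0, max_steps=10_000):
    """Execute `program` from the given state; return (pc_trace, final_regs)."""
    regs = list(regs)  # copy so the caller's initial state is not mutated
    trace = []
    for _ in range(max_steps):
        op = program[pc]
        trace.append(pc)
        if op[0] == "halt":
            break
        if op[0] == "addi":
            _, r, imm = op
            regs[r] += imm
            pc += 1
        elif op[0] == "jnz":
            # Conditional branch: the next PC depends only on register state.
            _, r, target = op
            pc = target if regs[r] != 0 else pc + 1
        else:
            raise ValueError(f"unknown opcode: {op[0]}")
    return trace, regs


# A small countdown loop: add 2 to r1 while decrementing r0 to zero.
PROGRAM = [
    ("addi", 1, 2),   # 0: r1 += 2
    ("addi", 0, -1),  # 1: r0 -= 1
    ("jnz", 0, 0),    # 2: if r0 != 0 goto 0
    ("halt",),        # 3
]

first = run(PROGRAM, [3, 0])
second = run(PROGRAM, [3, 0])
print(first == second)  # True: same initial state, same instruction stream
print(first)            # ([0, 1, 2, 0, 1, 2, 0, 1, 2, 3], [0, 6])
```

The "recording" here is just the initial register values and PC; the trace is regenerated for free by running again.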
Execution isn't always deterministic, of course. The source of non-determinism is I/O, particularly interrupts (including timer interrupts), I/O port accesses, and data copied into memory via DMA. You can view these as external inputs that influence the execution of a virtual machine (or a physical one, for that matter). If we can keep track of these external inputs, we can record the information needed to reproduce a VM's execution without having to record a complete instruction trace.
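As a rough sketch of what "keeping track of these external inputs" might look like as a data structure, the snippet below defines one log record per input. All of the names (`InputKind`, `LoggedInput`, `icount`, `target`) are hypothetical, and using a plain guest instruction count as the position is an assumption made for illustration; a real monitor would need whatever notion of position lets it stop at exactly the same point on replay.

```python
from dataclasses import dataclass
from enum import Enum


class InputKind(Enum):
    """The three kinds of external input named above (labels are illustrative)."""
    INTERRUPT = "interrupt"
    PORT_READ = "port_read"
    DMA_WRITE = "dma_write"


@dataclass(frozen=True)
class LoggedInput:
    icount: int        # assumed position: guest instructions executed so far
    kind: InputKind
    target: int        # hypothetical: interrupt vector, port number, or guest address
    data: bytes = b""  # value read from the port, or bytes copied in by DMA


# A log is just the inputs ordered by the point in execution where they occurred.
log = sorted(
    [
        LoggedInput(5_000, InputKind.INTERRUPT, target=0x20),
        LoggedInput(1_200, InputKind.PORT_READ, target=0x60, data=b"\x1c"),
        LoggedInput(4_990, InputKind.DMA_WRITE, target=0x10_0000, data=b"hello"),
    ],
    key=lambda entry: entry.icount,
)

# Total payload bytes logged, versus one record per executed instruction
# for a full instruction trace.
payload_bytes = sum(len(entry.data) for entry in log)
```

The point of the sketch is the size argument: the log grows with the amount of I/O, not with the number of instructions executed.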
The problem of recording the execution of a VM for exact replay then becomes one of logging these external inputs and the times (relative to the execution of instructions within the VM) when they occur, and (on the replay side) synchronizing the execution with emulation of the inputs. As an example, think about the effect on a VM of receiving a network packet. There are two external inputs: the packet is copied into the VM's memory, and an interrupt is raised to notify the VM that there is new data to process. (I'm glossing over minor details like changes to ring buffer registers here.) While recording, we need to log the contents of the packet, the time the data is copied into memory, and the time the interrupt was raised (which may be the same). While replaying, we need to ensure that these inputs are made visible to the VM at the exact same point in the VM's execution as when recording. In between these synchronization points, the VM can execute normally - meaning that user level code can execute at full speed on the processor. That's the key to being able to record and replay execution efficiently.
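The record and replay halves can be sketched with a toy model. Everything here is an assumption for illustration: the "guest" is a single accumulator updated by a deterministic `step`, a "packet" is four random bytes whose arrival is folded in by `deliver` (standing in for the DMA copy plus interrupt handler), and position is a simple instruction count. It is not how any real monitor is implemented, but it shows the shape of the idea: log only the inputs and their positions, then inject them at the same positions on replay.

```python
import random

MOD = 1_000_003  # arbitrary modulus for the toy guest's state


def step(acc, icount):
    """One deterministic guest 'instruction'."""
    return (acc * 31 + icount) % MOD


def deliver(acc, payload):
    """Stand-in for a packet being copied in and its interrupt handler running."""
    return (acc + sum(payload)) % MOD


def record(n_instructions, rng):
    """Run the guest with nondeterministic packet arrivals, logging each one."""
    acc, log = 1, []
    for icount in range(n_instructions):
        if rng.random() < 0.01:  # a packet happens to arrive now
            payload = bytes(rng.randrange(256) for _ in range(4))
            log.append((icount, payload))  # log contents and position
            acc = deliver(acc, payload)
        acc = step(acc, icount)
    return acc, log


def replay(n_instructions, log):
    """Re-run the guest, injecting logged inputs at their recorded positions."""
    acc = 1
    events = iter(log)
    pending = next(events, None)
    for icount in range(n_instructions):
        if pending is not None and pending[0] == icount:
            acc = deliver(acc, pending[1])  # synchronization point
            pending = next(events, None)
        acc = step(acc, icount)  # between inputs, just execute normally
    return acc


final_state, event_log = record(5_000, random.Random())
print(replay(5_000, event_log) == final_state)
```

In this toy, delivering the same packet even one instruction late changes the final state, which is why the replay side has to hit the exact recorded position rather than an approximate time.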
All of this is obviously focused on uniprocessor VMs - record/replay for SMP VMs is a more difficult problem. And I'm glossing over a number of implementation details. But it gives an idea of what's possible by interposing at the virtual machine level.