\documentclass[twocolumn]{article}
\usepackage{url}

\newcommand\func[1]{{\path {#1}}}
\newcommand\type[1]{{\path {#1}}}
\newcommand\optn[1]{{\path {#1}}}
\newcommand\defn[1]{{\path {#1}}}
\newcommand\MSR[1]{{\path {#1}}}
\newcommand\bit[2]{{\tt {#1}.{#2}}}
\newcommand\inst[1]{{\tt {#1}}}
\newcommand\reg[1]{{\tt {\%{#1}}}}

\begin{document}

\section{Introduction}
The Quest kernel contains limited support for the primitives of
Intel's hardware virtualization technology (VT), called VMX.  The
module is implemented in the \path{vm/vmx.c} file and is accompanied
by a virtual-8086 monitor in \path{vm/vm86.c} for the emulation of
real-mode code.  VMX is specified in Intel's Software Developer's
Manual, Volume 3B (System Programming Guide).  The instructions
associated with VMX are described in Volume 2B (Instruction Set
Reference).  Many of the definitions and constants introduced in
those manuals are imported into Quest header files under the
\path{include/vm/} folder.

\subsection{Configuration}
The VMX module may be enabled via the configuration option
\optn{USE_VMX} in the \path{config.mk} file.

\subsection{Design}

\section{Initialization}
\subsection{Enabling VMX}
When enabled, the VMX module is initialized during normal module
initialization.  Application Processors also perform per-CPU
initialization of virtual machine code after the boot process has
completed but prior to scheduler invocation.  The function
\func{vmx_global_init} is called once, and \func{vmx_processor_init}
is invoked once per CPU.  Each processor must perform hardware
initialization in order to use VMX.  VMX places stringent requirements
on the state of the processor and checks them strictly.  These
requirements are obtained by reading a number of MSRs which describe
the necessary and possible settings of various control registers.
Most implementations require \bit{CR0}{NE} to be enabled, so that is
done explicitly early on.  \bit{CR4}{VMXE} is set to enable the use of
VMX instructions.  A page must be designated and reserved as the
\inst{vmxon} area.  After that basic initialization is complete, the
module goes directly into the initialization for the isolation
mechanism that is being designed for Quest.
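
The ``necessary and possible settings'' can be applied mechanically.
The following is a minimal sketch rather than the Quest code.  It
assumes the constraint pair has already been read from the MSRs that
the Intel manual names \MSR{IA32_VMX_CR0_FIXED0} and
\MSR{IA32_VMX_CR0_FIXED1} (or their \path{CR4} counterparts), and the
function names are invented for illustration.

\begin{verbatim}
```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions from the Intel manual. */
#define CR0_NE    (UINT32_C(1) << 5)   /* numeric error */
#define CR4_VMXE  (UINT32_C(1) << 13)  /* VMX enable */

/* Force a control-register value to satisfy a FIXED0/FIXED1
 * pair: bits set in fixed0 must be 1, and bits clear in
 * fixed1 must be 0. */
static uint32_t apply_fixed_bits(uint32_t value,
                                 uint32_t fixed0,
                                 uint32_t fixed1)
{
    /* No bit may be required to be both 1 and 0. */
    assert((fixed0 & ~fixed1) == 0);
    return (value | fixed0) & fixed1;
}

/* Report whether a value already meets the constraints. */
static int satisfies_fixed_bits(uint32_t value,
                                uint32_t fixed0,
                                uint32_t fixed1)
{
    return apply_fixed_bits(value, fixed0, fixed1) == value;
}
```
\end{verbatim}

With the made-up pair \path{0x80000021} and \path{0xFFFFFFFF}, a
\path{CR0} value of \path{0x11} is adjusted to \path{0x80000031},
which includes the \bit{CR0}{NE} bit.
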

\subsection{Creating a virtual machine}
\subsection{Memory}
The original VMX module created VMs that started out in real mode.
However, for the isolation mechanism, there is no need to ever enter
real mode.  In either case, a virtual machine control structure (VMCS)
must be allocated and reserved for use by the processor.  A VMCS is a
single page which is not intended to be accessed directly by software,
with one exception: the first 64 bits must contain the zero-extended
VMCS revision identifier, which is reported in the low 32 bits of the
MSR \MSR{IA32_VMX_BASIC}.  The physical address of the VMCS may be
used with the \inst{vmclear} and \inst{vmptrld} instructions to
prepare and load a VMCS into the processor.  Only a single VMCS may be
loaded on a processor at a time.  In order to move a VMCS from one
processor to another, it is necessary to unload it with \inst{vmclear}
and then load it onto the new processor.  Any time a VMCS is freshly
loaded onto a processor, the \inst{vmlaunch} instruction must be used
in order to enter the VM.  The \inst{vmresume} instruction can only be
used to re-enter an already-launched VM.
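
The one permitted direct write to a VMCS page can be sketched as
follows.  This is an illustration under stated assumptions, not the
Quest code: the helper names are hypothetical, the page is an ordinary
buffer rather than a reserved physical page, and the value of
\MSR{IA32_VMX_BASIC} is passed in instead of being read with
\inst{rdmsr}.

\begin{verbatim}
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VMCS_REGION_SIZE 4096u

/* The low 32 bits of IA32_VMX_BASIC hold the VMCS
 * revision identifier. */
static uint32_t vmcs_revision_id(uint64_t vmx_basic)
{
    return (uint32_t)(vmx_basic & 0xFFFFFFFFu);
}

/* Zero a VMCS page, then store the revision identifier,
 * zero-extended to 64 bits, at offset 0.  The byte order
 * of the copy matches x86, which is little-endian. */
static void vmcs_region_init(uint8_t *page,
                             uint64_t vmx_basic)
{
    uint64_t header = vmcs_revision_id(vmx_basic);

    assert(page != 0);
    memset(page, 0, VMCS_REGION_SIZE);
    memcpy(page, &header, sizeof header);
}
```
\end{verbatim}
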

In addition to the processor-specific information stored in the VMCS,
Quest has its own \type{virtual_machine} structure.  This stores the
physical address of the VMCS as well as some other useful state
information, including whether the VM is in real mode, whether the VM
has been ``launched,'' and the state of the general-purpose registers
in the VM.
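
As a rough sketch of how such bookkeeping might look, the structure
below records the three pieces of state just listed and uses the
``launched'' flag to choose the entry instruction.  The field and
function names are assumptions made for this example; the actual
\type{virtual_machine} definition in Quest may differ.

\begin{verbatim}
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
struct guest_regs {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp;
};

struct virtual_machine_sketch {
    uint32_t vmcs_phys;      /* physical address of VMCS */
    bool realmode;           /* guest emulates real mode */
    bool launched;           /* vmlaunch already succeeded */
    struct guest_regs regs;  /* not kept in the VMCS */
};

enum entry_insn { ENTRY_VMLAUNCH, ENTRY_VMRESUME };

/* A freshly loaded VMCS needs vmlaunch; later entries on
 * the same processor use vmresume. */
static enum entry_insn
choose_entry(const struct virtual_machine_sketch *vm)
{
    assert(vm != NULL);
    return vm->launched ? ENTRY_VMRESUME : ENTRY_VMLAUNCH;
}

/* After vmclear and a fresh vmptrld (for example when the
 * VMCS moves to another processor), vmlaunch is required
 * again, so the flag is reset. */
static void
note_vmcs_reloaded(struct virtual_machine_sketch *vm)
{
    assert(vm != NULL);
    vm->launched = false;
}
```
\end{verbatim}
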

\subsection{VMCS fields}
Once all memory has been allocated and the VMCS is loaded onto the
processor, it is possible to use the \inst{vmread} and \inst{vmwrite}
instructions to configure the various parameters of the virtual
machine.  These parameters are numerous, and a full description can be
found in the manual.  Roughly speaking, there are parameters for host
state, guest state, event injection, and VM-exit triggers.  Every
field is given an index which can be used with \inst{vmread} and
\inst{vmwrite}.  In Quest these indices are defined by constants
prefixed by \defn{VMXENC_}.  While many of the fields are 64 bits
wide, we only operate in and emulate 32-bit mode, and therefore we
only deal with the least significant 32 bits of the fields.
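
The field indices are structured values, and decoding one shows which
group and width a field belongs to.  The sketch below follows the
encoding layout given in the Intel manual.  The constant and function
names are illustrative and are not the \defn{VMXENC_} definitions from
the Quest headers.

\begin{verbatim}
```c
#include <assert.h>
#include <stdint.h>

/* Example encodings from the Intel manual. */
#define ENC_MSR_BITMAP_ADDR  0x2004u
#define ENC_EXIT_REASON      0x4402u
#define ENC_GUEST_RSP        0x681Cu
#define ENC_GUEST_RIP        0x681Eu

enum field_width { W_16BIT, W_64BIT, W_32BIT, W_NATURAL };
enum field_type  { T_CONTROL, T_EXIT_INFO, T_GUEST, T_HOST };

static enum field_width enc_width(uint32_t enc)
{
    return (enum field_width)((enc >> 13) & 0x3u);
}

static enum field_type enc_type(uint32_t enc)
{
    return (enum field_type)((enc >> 10) & 0x3u);
}

static uint32_t enc_index(uint32_t enc)
{
    return (enc >> 1) & 0x1FFu;
}

/* Bit 0 selects the upper 32 bits of a 64-bit field, so a
 * 32-bit host can reach both halves separately. */
static uint32_t enc_high_half(uint32_t enc)
{
    assert(enc_width(enc) == W_64BIT);
    return enc | 1u;
}

/* Keep only the least significant 32 bits of a value. */
static uint32_t low32(uint64_t field_value)
{
    return (uint32_t)(field_value & 0xFFFFFFFFu);
}
```
\end{verbatim}

Decoding \path{0x681E}, for instance, yields a natural-width
guest-state field with index 15.
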

\subsection{Guest state}
In Quest, we assume that the current state of the CPU is the starting
point for the virtual machine.  Therefore, the code proceeds to read
the various flag, control, and segment registers from the machine and
write those values into the guest fields.  VMX also requires that
certain ``hidden'' processor state be initialized in the VMCS guest
fields.  This includes the base, limit, and access rights for segment
selectors: things that are normally loaded from the GDT by the
processor, but must be manually loaded into fields here.

The general-purpose registers are not stored in the VMCS.  However,
the registers \reg{rip} (\reg{eip}) and \reg{rsp} (\reg{esp}) are
stored in and loaded from the VMCS.  The \inst{sysenter}-related MSRs
are treated the same way.  There is also some additional
processor-internal state which is exposed via the \defn{ACTIVITY} and
\defn{INTERRUPTIBILITY} fields.
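
The ``hidden'' segment state mentioned above can be recovered from a
GDT entry by unpacking the descriptor.  The following is a sketch
under assumptions: the names are invented, only GDT selectors are
handled, and a non-present descriptor is simply marked unusable.  It
follows the descriptor and access-rights layouts in the Intel manual
and is not taken from the Quest code.

\begin{verbatim}
```c
#include <assert.h>
#include <stdint.h>

struct segment_state {
    uint32_t base;
    uint32_t limit;          /* bytes, granularity applied */
    uint32_t access_rights;  /* VMCS access-rights layout */
};

#define AR_PRESENT   0x00080u
#define AR_UNUSABLE  0x10000u

/* Unpack one 8-byte segment descriptor. */
static struct segment_state decode_descriptor(uint64_t d)
{
    struct segment_state s;

    s.base = (uint32_t)((d >> 16) & 0x00FFFFFFu)
           | (uint32_t)((d >> 32) & 0xFF000000u);
    s.limit = (uint32_t)(d & 0xFFFFu)
            | (uint32_t)((d >> 32) & 0x000F0000u);
    if (d & (UINT64_C(1) << 55))  /* G: 4 KB units */
        s.limit = (s.limit << 12) | 0xFFFu;
    /* Type, S, DPL, P land in bits 7:0 and
     * AVL, L, D/B, G in bits 15:12. */
    s.access_rights = (uint32_t)((d >> 40) & 0xF0FFu);
    if (!(s.access_rights & AR_PRESENT))
        s.access_rights = AR_UNUSABLE;
    return s;
}

/* Look up a selector in a GDT image. */
static struct segment_state
load_from_gdt(const uint64_t *gdt, unsigned entries,
              uint16_t selector)
{
    unsigned index = selector >> 3;

    assert((selector & 0x4u) == 0);  /* no LDT support */
    assert(index < entries);
    return decode_descriptor(gdt[index]);
}
```
\end{verbatim}

A flat 4 GB code descriptor, \path{0x00CF9A000000FFFF}, decodes to
base 0, limit \path{0xFFFFFFFF}, and access rights \path{0xC09A}.
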

\subsection{Exit reasons}
The VMCS can also be configured with flags that describe the
conditions under which the virtual machine exits and returns to
``root'' operation.  For example, there is a 32-bit \defn{EXCEPTION}
bitmap, indexed by exception number, in which a set bit indicates that
the given exception should cause a VM exit.  The page fault exception
also has a few other fields for more specific behavior.
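
Manipulating the exception bitmap is plain bit arithmetic.  The
helpers below are a small sketch with invented names.  They ignore the
extra page-fault fields mentioned above, so for vector 14 they describe
only part of the decision.

\begin{verbatim}
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VEC_GENERAL_PROTECTION 13u
#define VEC_PAGE_FAULT         14u

/* Return the bitmap with exits enabled for one vector. */
static uint32_t trap_exception(uint32_t bitmap,
                               unsigned vec)
{
    assert(vec < 32);
    return bitmap | (UINT32_C(1) << vec);
}

/* Test whether a vector is marked to cause a VM exit. */
static bool exception_exits(uint32_t bitmap, unsigned vec)
{
    assert(vec < 32);
    return ((bitmap >> vec) & 1u) != 0;
}
```
\end{verbatim}
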

Similarly, there is a flag which controls whether external interrupts
cause VM exits, and one for the NMI.  These are part of the
``pin-based'' controls.  The ``processor-based'' controls are flags
for whether certain aspects of processor behavior should cause VM
exits.  For example, we disable VM exits for accesses to \reg{cr3} as
well as for the instructions \inst{rdtsc} and \inst{rdpmc}.  VM exits
for MSR access can be controlled through the MSR bitmap, which is also
enabled here.  Some of these controls are actually configured in
\func{vmx_start_VM}.
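
The MSR bitmap is a 4 KB page split into four 1 KB regions: reads and
writes, each for a low and a high range of MSR numbers.  The sketch
below locates and updates the bit for one MSR, following the layout in
the Intel manual.  The names are hypothetical and the code is not
taken from Quest.

\begin{verbatim}
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets of the four regions in the bitmap page. */
#define READ_LOW    0u     /* 0x00000000-0x00001FFF */
#define READ_HIGH   1024u  /* 0xC0000000-0xC0001FFF */
#define WRITE_LOW   2048u
#define WRITE_HIGH  3072u

/* Set or clear the exit bit for one MSR.  Returns false
 * when the MSR lies outside both ranges; such accesses
 * cause a VM exit regardless of the bitmap. */
static bool msr_set_exit(uint8_t *bitmap, uint32_t msr,
                         bool write, bool intercept)
{
    uint32_t base, index;
    uint8_t mask;

    assert(bitmap != NULL);
    if (msr <= 0x1FFFu) {
        base = write ? WRITE_LOW : READ_LOW;
        index = msr;
    } else if (msr >= 0xC0000000u && msr <= 0xC0001FFFu) {
        base = write ? WRITE_HIGH : READ_HIGH;
        index = msr - 0xC0000000u;
    } else {
        return false;
    }
    mask = (uint8_t)(1u << (index % 8));
    if (intercept)
        bitmap[base + index / 8] |= mask;
    else
        bitmap[base + index / 8] &= (uint8_t)~mask;
    return true;
}
```
\end{verbatim}
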

\section{Entering a virtual machine}

\subsection{The initial instruction pointer}
The function \func{vmx_enter_pmode_VM} saves a value of \reg{eip} with
a bit of inline assembly which \inst{call}s a label and pops the
stored value of \reg{eip} off the stack.  The program point of the
stored \reg{eip} corresponds to the point in the inline assembly after
the return of the \inst{call}.  Therefore, when the VM is launched,
its path of execution can be distinguished from the hypervisor's by
clearing the register that stores the value of \reg{eip}.  In addition
to saving \reg{eip}, it is important to fork the stack at this point.
The VM will carry on using the original stack, but the hypervisor must
copy the stack and use a separate one.  This ensures that there will
be no interference between the two.  Since the VM will begin operation
inside this function, the state of the stack up to this point is
preserved; the hypervisor's stack is put into effect immediately by
carefully assigning a new, functionally equivalent value for
\reg{esp} in the new stack.  The VM is prepared with its initial
values of \reg{eip} and \reg{esp} before making the call to
\func{vmx_start_VM}.

\subsection{Entry and preparations for eventual exit}
The function \func{vmx_start_VM} must prepare for VM entry by saving
the current state of the host.  This is a similar process to the
preparation of guest state as described previously, except that there
are fewer fields to fill.  General-purpose registers are not managed
by VMX, so it is necessary to save host registers and restore guest
registers.  The host registers are stored on the host stack using
\inst{pusha}.  Again, we need to fork paths and save a value of
\reg{eip} to store in the VMCS field \defn{HOST_RIP}.  The guest
registers are loaded from memory by copying them onto the stack in
such a way that \inst{popa} can restore them all at once, just prior
to the usage of either \inst{vmlaunch} or \inst{vmresume}.
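
For \inst{popa} to restore everything at once, the guest values must
be laid out in memory exactly as \inst{pusha} would have left them.
The sketch below spells out that layout.  The \path{guest_regs}
structure and the function name are assumptions made for the example.

\begin{verbatim}
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Memory image left by pusha, lowest address first.
 * popa reads the same image but skips the esp slot. */
struct pusha_frame {
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
};

/* Hypothetical saved-register block for the guest. */
struct guest_regs {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp;
};

/* Build the image that popa should consume. */
static struct pusha_frame
frame_for_popa(const struct guest_regs *g)
{
    struct pusha_frame f;

    assert(g != NULL);
    f.edi = g->edi;
    f.esi = g->esi;
    f.ebp = g->ebp;
    f.esp = 0;        /* ignored by popa */
    f.ebx = g->ebx;
    f.edx = g->edx;
    f.ecx = g->ecx;
    f.eax = g->eax;
    return f;
}
```
\end{verbatim}

The frame is 32 bytes long, which is the figure used when stepping
over a \inst{pusha} snapshot on the stack.
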

\subsection{Entry failure}
The specification defines a series of conditions that must be
fulfilled for successful VM entry.  There are a number of failure
modes depending on how far the process gets before aborting.  Late
failures can lead to a full VM exit.  Early failures are different:
the processor simply advances to the instruction following the
\inst{vmlaunch} or \inst{vmresume} and sets the zero or carry flag
according to various criteria.  However, the registers still contain
the values for the guest.  Therefore, it is necessary to carefully
restore the host registers, especially \reg{esp}, while also checking
the flag values.  The usual tactic of saving the flags with
\inst{pushf} does not work, because \reg{esp} still holds the guest
value.  The host value of \reg{esp} is available in the VMCS field
\defn{HOST_RSP}, but reading it with \inst{vmread} would clobber the
flags.  Therefore, the flags must first be case-analyzed via
branching, at which point the value of \reg{esp} can be restored,
followed by the general-purpose registers which have been saved on the
stack.  The exact cause of the failure can be pinpointed through
certain VMCS fields which contain error codes to be analyzed.
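
The case analysis on the flags follows the convention in the Intel
manual: the carry flag signals a failure with no usable current VMCS,
and the zero flag signals a failure whose error number can be read
from the VMCS.  The sketch below expresses that in C on a saved flags
image.  In the real entry path the branching has to happen in assembly
before anything disturbs the flags, and the names here are invented.

\begin{verbatim}
```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_CF (UINT32_C(1) << 0)
#define EFLAGS_ZF (UINT32_C(1) << 6)

enum vmx_result {
    VMX_SUCCEED,
    VMX_FAIL_INVALID,  /* CF: no current VMCS */
    VMX_FAIL_VALID     /* ZF: error number in the VMCS */
};

static enum vmx_result classify_vmx_flags(uint32_t eflags)
{
    /* Bit 1 of EFLAGS always reads as 1. */
    assert((eflags & 0x2u) != 0);
    if (eflags & EFLAGS_CF)
        return VMX_FAIL_INVALID;
    if (eflags & EFLAGS_ZF)
        return VMX_FAIL_VALID;
    return VMX_SUCCEED;
}

/* A few VM-instruction error numbers from the manual. */
static const char *vm_instruction_error_text(uint32_t n)
{
    switch (n) {
    case 4:  return "vmlaunch with non-clear VMCS";
    case 5:  return "vmresume with non-launched VMCS";
    case 7:  return "invalid control field(s)";
    case 8:  return "invalid host-state field(s)";
    default: return "other (see the manual)";
    }
}
```
\end{verbatim}
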

\section{Hypervisor}
\subsection{Handling VM exits}
In the case of partially or totally successful VM entry, the machine
enters a state that is designated as ``non-root operation.''  Upon VM
exit, the machine transfers control to the address stored in the VMCS
field \defn{HOST_RIP}, and switches to the stack pointer saved in
\defn{HOST_RSP}.  Because guest general-purpose registers are not
handled by VMX, the very first instruction we use is \inst{pusha} to
snapshot them on the host stack.  In order to continue operation and
save the guest registers into the \type{virtual_machine} structure, we
must restore the host registers.  We know where they are on the stack:
adjacent to the snapshot of the guest registers.  Therefore, a quick
trick suffices: add 32 to \reg{esp}, \inst{popa} the host registers,
then load \reg{esp-64} into \reg{esi} and copy the stack values into
the structure (the pointer has already been stored into \reg{edi}).
The stack is then restored to the original point and the host
registers are once again popped, to restore any that were clobbered by
the operation thus far.
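
The stack arithmetic in this trick can be checked with a small
simulation.  The model below is a sketch, not kernel code: the stack
is a byte array, \inst{pusha} and \inst{popa} are ordinary functions,
and all names are invented.  It assumes, as described above, that
\defn{HOST_RSP} points at the host registers saved by \inst{pusha}
before entry.

\begin{verbatim}
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register order used by pusha: eax is pushed first. */
enum { EAX, ECX, EDX, EBX, ESP_SLOT, EBP, ESI, EDI,
       NREGS };

struct cpu {
    uint32_t r[NREGS];  /* r[ESP_SLOT] is unused */
    uint32_t esp;       /* byte offset into stack[] */
    uint8_t stack[256];
};

static void pusha(struct cpu *c)
{
    uint32_t old_esp = c->esp;

    assert(c->esp >= 32 && c->esp <= sizeof c->stack);
    for (int i = 0; i < NREGS; i++) {
        uint32_t v = (i == ESP_SLOT) ? old_esp : c->r[i];
        c->esp -= 4;
        memcpy(&c->stack[c->esp], &v, 4);
    }
}

static void popa(struct cpu *c)
{
    assert(c->esp + 32 <= sizeof c->stack);
    for (int i = NREGS - 1; i >= 0; i--) {
        if (i != ESP_SLOT)  /* saved esp is discarded */
            memcpy(&c->r[i], &c->stack[c->esp], 4);
        c->esp += 4;
    }
}

/* On entry esp equals HOST_RSP.  guest_out receives the
 * snapshot in memory order, edi first and eax last. */
static void on_vm_exit(struct cpu *c,
                       uint32_t guest_out[NREGS])
{
    pusha(c);      /* guest snapshot below host frame */
    c->esp += 32;  /* step back up to the host frame */
    popa(c);       /* host registers usable again */
    memcpy(guest_out, &c->stack[c->esp - 64], 32);
    c->esp -= 32;  /* back to the host frame */
    popa(c);       /* repair any clobbered registers */
}
```
\end{verbatim}

After the first \inst{popa} the stack pointer sits 32 bytes above the
host frame, so the guest snapshot begins 64 bytes below it, which is
why \reg{esp-64} is the source address for the copy.
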

Once again, a number of VMCS fields have been prepared with codes that
explain the cause of the VM exit.  The hypervisor case-analyzes those
codes in order to determine how to proceed.  For example,
\inst{cpuid} causes an unconditional VM exit, so the hypervisor must
emulate the instruction.  It manipulates the guest registers in the
saved structure, and then re-enters the VM at the instruction
following \inst{cpuid}.  Similarly, when emulating real mode, the use
of virtual-8086 means that general-protection faults are, in fact,
requests for intervention by the monitor.  Therefore, the function
\func{vmx_vm86_handle_GPF} is invoked to analyze and emulate any
instruction that requires such assistance.
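
The case analysis can be pictured as a switch on the basic exit
reason.  The following sketch uses the reason numbers from the Intel
manual.  The structures, the names, and the placeholder \inst{cpuid}
result are assumptions made for illustration and do not reproduce the
Quest handler.

\begin{verbatim}
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exit reasons from the Intel manual. */
#define EXIT_EXCEPTION_OR_NMI  0u
#define EXIT_CPUID             10u

struct guest_regs { uint32_t eax, ebx, ecx, edx; };

/* Values that would be read from the VMCS with vmread. */
struct exit_info {
    uint32_t reason;     /* VM-exit reason */
    uint32_t insn_len;   /* VM-exit instruction length */
    uint32_t guest_eip;  /* guest instruction pointer */
};

enum action { RESUME_GUEST, HANDLE_FAULT, UNHANDLED };

static enum action handle_exit(struct exit_info *x,
                               struct guest_regs *g)
{
    assert(x != NULL && g != NULL);
    switch (x->reason & 0xFFFFu) {  /* basic reason */
    case EXIT_CPUID:
        /* Placeholder emulation: report all zeros. */
        g->eax = g->ebx = g->ecx = g->edx = 0;
        /* Re-enter at the instruction after cpuid. */
        x->guest_eip += x->insn_len;
        return RESUME_GUEST;
    case EXIT_EXCEPTION_OR_NMI:
        /* For example a vm86 general-protection fault. */
        return HANDLE_FAULT;
    default:
        return UNHANDLED;
    }
}
```
\end{verbatim}

The updated \path{guest_eip} would then be written back to the VMCS
before the VM is resumed.
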

\end{document}