Nested Virtualization with KVM Intel

Some context: In regular virtualization, your physical Linux host is the hypervisor and runs multiple guest operating systems. Nested virtualization lets you run a guest inside a regular guest (essentially turning that guest into a hypervisor itself). For AMD, nested support has been available for a while, and some people have reported success nesting KVM guests. For the Intel arch, support arrived only recently (a year or so ago), with some work still in progress, so I thought I'd give it a whirl when Adam Young started a discussion about it in the context of the OpenStack project.

Some of the common use-cases being discussed for nested virtualization:
- For instance, a cloud user gets a beefy Regular Guest (which she completely controls). This user can then turn the regular guest into a hypervisor, and can cheerfully run/manage multiple guests for developing or testing w/o the hassle and intervention of the cloud provider.
- Possibility of having many instances of a virtualization setup (hypervisor and its guests) on a single bare-metal machine.
- Ability to debug and test hypervisor software.

I have immediate access to moderately beefy Intel hardware, so the rest of this post is based on Intel's CPU virt extensions. Before proceeding, let's settle on some terminology for clarity:

  • Physical Host (Host hypervisor/Bare metal)
    • Config: Intel(R) Xeon(R) CPU (4 cores/socket); 10GB memory; CPU freq – 2GHz; running latest Fedora-16 (minimal foot-print, @core only with virt pkgs; x86_64; kernel-3.1.8-2.fc16.x86_64)
  • Regular Guest (or Guest Hypervisor)
    • Config: 4GB memory; 4 vCPUs; 20GB raw disk image with cache='none' to have decent I/O; minimal @core F16; same virt packages as the Physical Host; x86_64
  • Nested Guest (guest installed inside the Regular Guest)
    • Config: 2GB memory; 1 vCPU; minimal (@core only) F16; x86_64

Enabling Nesting on the Physical Host

Node Info of the Physical Host.

 
# virsh nodeinfo
CPU model:           x86_64
CPU(s):              4
CPU frequency:       1994 MHz
CPU socket(s):       1
Core(s) per socket:  4
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         10242864 kB

Let us first ensure the kvm_intel kernel module has nesting enabled. By default, it's disabled for the Intel arch [but enabled for AMD's SVM (Secure Virtual Machine) extensions]:

 
# modinfo kvm_intel | grep -i nested
parm:           nested:bool
# 

And we need to pass kvm-intel.nested=1 on the kernel command line and reboot the host to enable nesting for the Intel KVM kernel module. This can be verified after boot by doing:

 
# cat /sys/module/kvm_intel/parameters/nested 
Y
# systool -m kvm_intel -v   | grep -i nested
    nested              = "Y"
# 

Or alternatively, Adam Young identified that nesting can be enabled by adding the directive options kvm-intel nested=y to the end of the /etc/modprobe.d/dist.conf file and rebooting the host, so that it persists.
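If rebooting the host isn't convenient, the module can in principle be reloaded with the parameter set directly. A sketch, assuming root and that no guests are currently running on the host (modprobe -r will refuse to unload a module that's in use):

```shell
# Reload kvm_intel with nesting turned on; shut down all guests first,
# since modprobe -r fails while the module is in use.
modprobe -r kvm_intel
modprobe kvm_intel nested=1

# Persist the setting across reboots (same directive as above):
echo "options kvm-intel nested=y" >> /etc/modprobe.d/dist.conf

# Verify:
cat /sys/module/kvm_intel/parameters/nested    # should print: Y
```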

Set up the Regular Guest (or Guest Hypervisor)
Install a regular guest using virt-install, the oz tool, or any other preferred way. I made a quick script here. And ensure to have cache='none' in the disk element of the guest hypervisor's XML file. (Observation: installing via the virt-install tool didn't seem to pick this option by default.) Here is the disk element from the libvirt XML:

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/regular-guest.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
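To double-check that the setting stuck, the effective cache mode can be read back from the guest's live XML. A quick sketch ('regular-guest' is the domain name assumed here):

```shell
# Print the disk driver line of the guest definition;
# it should contain cache='none'.
virsh dumpxml regular-guest | grep "driver name='qemu'"
```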

Now, let's try to enable Intel VMX (Virtual Machine Extensions) in the regular guest's CPU. We can do that by running the below command on the physical host (aka host hypervisor), adding the resulting 'cpu' element to the regular guest's libvirt XML file, and starting the guest.

# virsh  capabilities | virsh cpu-baseline /dev/stdin 
<cpu match='exact'>
  <model>Penryn</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='dca'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='vme'/>
</cpu>

The output of the above command lists a variety of CPU features. Since we need only the vmx extension, I tried the simple way: adding the snippet below to the regular guest's libvirt XML (virsh edit ...) and starting it.

<cpu match='exact'>
  <model>core2duo</model>
  <feature policy='require' name='vmx'/>
</cpu>

Thanks to Jiri Denemark for the above hint. Also note that there is a very detailed and informative post from Dan P. Berrange on host/guest CPU models in libvirt.
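Once the regular guest boots with this CPU definition, a quick sanity check from inside it is to look for the vmx flag in /proc/cpuinfo. A small sketch (on AMD the flag to look for would be svm instead):

```shell
# Inside the regular guest: count how many vCPUs expose the vmx flag.
# A non-zero count means VMX made it into the emulated CPU; 0 means
# nesting isn't exposed ('|| true' keeps the exit status clean when
# grep finds no match).
grep -c -w vmx /proc/cpuinfo || true
```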

As we enabled vmx in the guest hypervisor, let's confirm that vmx is exposed in the emulated CPU by ensuring qemu-kvm is invoked with -cpu core2duo,+vmx:


[root@physical-host ~]# ps -ef | grep qemu-kvm
qemu     17102     1  4 22:29 ?        00:00:34 /usr/bin/qemu-kvm -S -M pc-0.14 
-cpu core2duo,+vmx -enable-kvm -m 3072
-smp 3,sockets=3,cores=1,threads=1 -name f16test1 
-uuid f6219dbd-f515-f3c8-a7e8-832b99a24b5d -nographic -nodefconfig 
-nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f16test1.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
-drive file=/export/vmimgs/f16test1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=21,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e6:cc:4e,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

Now, let’s attempt to create a nested guest

Here comes the more interesting part: the nested guest's config will be 2GB RAM, 1 vCPU, and a 6GB virtual disk (size=6 below). Let's invoke a virt-install command line with a minimal kickstart install:


[root@regular-guest ~]# virt-install --connect=qemu:///system \
    --network=bridge:virbr0 \
    --initrd-inject=/root/fed.ks \
    --extra-args="ks=file:/fed.ks console=tty0 console=ttyS0,115200 serial rd_NO_PLYMOUTH" \
    --name=nested-guest --disk path=/var/lib/libvirt/images/nested-guest.img,size=6 \
    --ram 2048 \
    --vcpus=1 \
    --check-cpu \
    --hvm \
    --location=http://download.foo.bar.com/pub/fedora/linux/releases/16/Fedora/x86_64/os/ \
    --nographics

Starting install...
Retrieving file .treeinfo...                                                                                                 | 1.7 kB     00:00 ... 
Retrieving file vmlinuz...                                                                                                   | 7.9 MB     00:08 ... 
Retrieving file initrd.img...                               28% [==============                                   ] 647 kB/s |  38 MB     02:25 ETA 

virt-install proceeds fine (to a certain extent), doing all the regular things like getting access to the network, creating devices, creating file systems, and performing dependency checks, and finally the package install proceeds:


Welcome to Fedora for x86_64



     ┌─────────────────────┤ Package Installation ├──────────────────────┐
     │                                                                   │
     │                                                                   │
     │                                 24%                               │
     │                                                                   │
     │                   Packages completed: 52 of 390                   │
     │                                                                   │
     │ Installing glibc-common-2.14.90-14.x86_64 (112 MB)                │
     │ Common binaries and locale data for glibc                         │
     │                                                                   │
     │                                                                   │
     │                                                                   │
     └───────────────────────────────────────────────────────────────────┘

And now, it's stuck like that forever. It doesn't budge, trying to install packages for eternity. Let's check the state of the guest from a separate terminal:


[root@regular-guest ~]# virsh list
 Id Name                 State
----------------------------------
  1 nested-guest         paused

[root@regular-guest ~]# 
[root@regular-guest ~]#  virsh domstate nested-guest --reason
paused (unknown)

[root@regular-guest ~]# 

So our nested guest seems to be paused, and the package install on the nested guest's serial console is still hung. I gave up at this point. I need to try whether I can get any helpful info with the virt-dmesg tool or any other way to debug this further.

Just to note: there is enough disk space and memory on the regular guest, so that cause is ruled out here. And I tried destroying the broken nested guest and creating a fresh one (repeated twice). Still no dice.
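For the record, a few avenues I can think of for poking at a nested guest stuck in 'paused' (a sketch; command availability varies by libvirt/qemu version):

```shell
# On the regular guest (the guest hypervisor):
virsh domstate nested-guest --reason              # why is it paused?
virsh qemu-monitor-command nested-guest --hmp 'info status'

# Any KVM complaints in the guest hypervisor's kernel log?
dmesg | grep -i kvm | tail

# The same check is worth running on the physical host, where
# kvm_intel logs nested-VMX errors for the outer guest.
```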

So, not much luck yet with the Intel arch; I'd have to try on an AMD machine.

UPDATE (on Intel arch): After trying a couple of times, I was finally able to ssh to the nested guest, but after a reboot the nested guest loses its IP, rendering it inaccessible. (Info: the regular guest has a bridged IP, and the nested guest has a NATed IP.) And I couldn't log in via the serial console, as it's broken due to a regression (which has a workaround). Also, refer to the comments below for further discussion of NATed networking caveats.
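One way to recover a NATed nested guest's IP without console access is to read the dnsmasq lease file that libvirt keeps for the default network. A sketch (the lease-file path is where libvirt of this vintage stores leases; 'nested-guest' is the domain name assumed here):

```shell
# On the regular guest: pull the nested guest's MAC from its XML,
# then look up the matching DHCP lease handed out by dnsmasq.
MAC=$(virsh dumpxml nested-guest | awk -F"'" '/mac address/ {print $2}')
grep -i "$MAC" /var/lib/libvirt/dnsmasq/default.leases
```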
UPDATE2: The correct syntax to be added to /etc/modprobe.d/dist.conf is options kvm-intel nested=y



Filed under Uncategorized

14 responses to “Nested Virtualization with KVM Intel”

  1. Jason

    Try updating your system firmware and disabling VT for Direct I/O Access if you are able in the firmware. One or both of the above made it possible for me to go from kernel panics on the bare metal host to getting RHEL 6.2 on RHEL 6.2 on Fedora 16 installed.

    Used a similar CPU in a system with only 2GB RAM:
    model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz

    • Interesting. As I recall, ‘VT for Direct I/O Access’ is enabled. I'm wondering how turning it off can alleviate the problem, but I'll try that. The server is remote, and I can't modify the BIOS settings remotely as of now; I need to arrange that.

      Also, an update which I have yet to post: after a couple of tries the other day, I re-created a fresh nested guest and let it run. After a day or so, I checked on the regular guest to see the status of the nested guest. Interestingly, the status was ‘running’. I used ‘virt-cat’ to get the (NATed) IP of the nested guest, ssh'ed in (from the regular guest), installed the ‘virt-what’ package and ran a few commands. Things were fine. Then I tried to see if the nested guest still comes up after a reboot. Turns out: nope. Now I just can't ssh nor ping it. I tried reloading/restarting libvirt and restarting/stopping iptables on the regular guest, to no avail. I also ensured the below (on the regular guest):

      
      # cat /proc/sys/net/ipv4/ip_forward
      1
      # cat /proc/sys/net/bridge/bridge-nf-call-arptables
      1
      # cat /proc/sys/net/bridge/bridge-nf-call-ip6tables
      1
      # cat /proc/sys/net/bridge/bridge-nf-call-iptables
      1
      
      • Jason

        Disabling Direct I/O may or may not have done it; I updated the system firmware and disabled that in one step, so it is possible either fixed my problems and the other made no difference; I was sort of short on time so I was willing to try multiple things in one pass.

        One thing to be careful of with NAT'd guests is that the bare metal host by default will have a 192.168.122.0/24 network on the virbr0 interface. When you set up the 1st-level KVM host and set up libvirt, you'll end up with the same thing all over again unless you change it manually; you'll have two different 192.168.122.0/24 networks on that host, which will not work too well.

        I created a bridge on the bare metal host and the first level vm so that the nested vm would still have access to the network without resorting to (multiple levels of) NAT.

        I ended up just creating a br0 bridge so that I did not have to worry about that.

      • Jason, thanks for the reply. Got your point w/ Directed I/O and system firmware.

        Your comment is spot on w/ the problem of multiple levels of NATed guests. I hit the exact same problem while testing w/ AMD (and later fixed it with the bridge). I was trying to fix this problem by creating a new ‘persistent’ libvirt network (and enabling autostart) w/ a different NAT IP space than the default, and then rebooting the regular guest. It wasn't quite elegant and looked a little messy. So I destroyed this new persistent libvirt network and went the bridge route on the physical host.
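        For reference, the ‘different NAT IP space’ approach I tried looked roughly like this (a sketch; the network name, bridge name, and 192.168.133.0/24 subnet are arbitrary examples):

```shell
# On the regular guest: define a NAT network that doesn't collide
# with the physical host's default 192.168.122.0/24.
cat > /tmp/nested-nat.xml <<'EOF'
<network>
  <name>nested-nat</name>
  <forward mode='nat'/>
  <bridge name='virbr1' stp='on' delay='0'/>
  <ip address='192.168.133.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.133.2' end='192.168.133.254'/>
    </dhcp>
  </ip>
</network>
EOF
virsh net-define /tmp/nested-nat.xml   # make it persistent
virsh net-autostart nested-nat         # start it on boot
virsh net-start nested-nat
```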

        But in the previous comment, my Intel setup was exactly like yours (a bridge on the bare-metal host), and the regular guest (‘1st level of KVM host’) *did* have a bridged IP, and the nested guest has access to NAT (w/o having to resort to multiple levels of it).

        PS: Also, yesterday I tested nesting w/ AMD successfully (w/ a bridge on the physical-host). I’ll post my notes soon.

  2. Pingback: Nested Virtualization with KVM and AMD | Kashyap Chamarthy

  3. Terry Yip

    Hi, I am a newbie with KVM. Does CentOS 6.2 support nested virtualization? When I load the kvm_intel module with the nested=1 parameter, it fails to load. If I take out the nested parameter, the module loads successfully.
    # modprobe kvm_intel nested=1
    FATAL: Error inserting kvm_intel (/lib/modules/2.6.32-220.23.1.el6.x86_64/kernel/arch/x86/kvm/kvm-intel.ko): Unknown symbol in module, or unknown parameter (see dmesg)

  4. IT WORKED !!! THX MAN, SUPER AWESOME. (fc17 and i7-3930K Hexa-Core)

  5. Hi, any progress here? Did you manage to power up the virtual virtual machine without hanging?

  6. Siddhesh

    Hi, thanks for this blog. It is really helpful.
    I have a few questions regarding paging in nested virtualization.

    I was reading the Turtles project paper, and they introduced multi-dimensional paging, in which L0 exposes EPT capabilities to L1. Does this mean that if I do cat /sys/module/kvm_intel/parameters/ept in the L1 guest, it should give Y? Because I tried to do so and the result of cat /sys/module/kvm_intel/parameters/nested is N.

    Am I missing something here?

  7. Sunil

    Kashyap, thanks for sharing this!
    Correction:
    “kvm_intel.nested=1 to the end of /etc/modprobe.d/dist.conf file and reboot the host so it persists.”
    You need to add “options kvm-intel nested=y” instead
