Nested Virtualization with KVM and AMD

After my previous attempt the other day to create a nested-guest (KVM on KVM) on Intel hardware, I got hold of an AMD server machine with virt extensions enabled and gave it a whirl. This went slightly smoother than the Intel attempt.

Some config info about the physical host, regular-guest and nested-guest. (All of them are Fedora-16; x86_64)

  • Physical Host (Host hypervisor/Bare metal)
    • 
      [root@phy-host-amd]# virsh nodeinfo
      CPU model:           x86_64
      CPU(s):              16
      CPU frequency:       2000 MHz
      CPU socket(s):       2
      Core(s) per socket:  8
      Thread(s) per core:  1
      NUMA cell(s):        1
      Memory size:         8173352 kB
      
  • Regular Guest (Or Guest Hypervisor)
    • Config: 4GB Memory; 6 vcpus; 22GB Raw disk image w/ cache=’none’ enabled in the libvirt xml
  • Nested Guest
    • Config: 2GB Memory; 3 vcpus; 10G Raw disk image

Ensure nesting is enabled on the physical host

Let’s ensure the kvm_amd kernel module is loaded with ‘nested’ virt enabled.


[root@phy-host-amd ~]# modinfo kvm_amd | grep -i nested
parm:           nested:int
[root@phy-host-amd ~]# 

[root@phy-host-amd ~]# cat /sys/module/kvm_amd/parameters/nested
1
[root@phy-host-amd ~]# 

[root@phy-host-amd ~]# systool -m kvm_amd -v   | grep -i nested
    nested              = "1"
[root@phy-host-amd ~]# 
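
If the parameter reads 0, nesting is off. Here is a minimal sketch of a scripted version of the check above, assuming the same sysfs path. The helper name `nested_enabled` and the modprobe.d file name in the comments are arbitrary choices, not part of the original setup.

```shell
#!/bin/sh
# Return 0 if the given sysfs parameter file reports nesting as enabled.
# The default path is the one shown above; the argument exists so the
# function can be pointed at another file (e.g. a copy, for testing).
nested_enabled() {
    param_file="${1:-/sys/module/kvm_amd/parameters/nested}"
    [ -r "$param_file" ] || return 2      # module not loaded or path missing
    value=$(cat "$param_file")
    case "$value" in
        1|Y|y) return 0 ;;                # accept Y too; some KVM modules report booleans that way
        *)     return 1 ;;
    esac
}

# Example use on the physical host:
#   nested_enabled && echo "nested virt is on"
#
# To turn it on persistently (assumption: a modprobe.d drop-in; the file
# name is arbitrary), then reload the module while no guests are running:
#   echo "options kvm_amd nested=1" > /etc/modprobe.d/kvm_amd.conf
#   modprobe -r kvm_amd && modprobe kvm_amd
```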

CAVEAT: To make life a little easier, I configured bridged networking on the physical host so that the regular-guest gets a bridged IP and, later, the nested-guest gets a NATed IP. I’m noting it here because the physical host initially had no bridging. The default libvirt bridge virbr0 uses the 192.168.122.0/24 IP space, so once libvirt is set up inside the regular-guest (or guest-hypervisor), its default network ends up in the same IP space as the physical host’s. I tried to fix this problem by creating another ‘persistent’ libvirt network and enabling autostart for it [virsh net-add; virsh net-define; virsh net-autostart]. But it wasn’t elegant and messed up networks on reboot.
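
One way to avoid an overlap like this is to give libvirt inside the guest hypervisor a NAT network on a different subnet. The sketch below only prints such a network definition. It is an illustration, not what was used for this setup, and the network name `nested-net`, the bridge `virbr1`, and the 192.168.123.0/24 range are arbitrary placeholders.

```shell
#!/bin/sh
# Print a libvirt NAT network definition for a /24 subnet.
# $1 = network name, $2 = bridge name, $3 = first three octets (e.g. 192.168.123)
make_nat_network_xml() {
    name="$1"; bridge="$2"; prefix="$3"
    cat <<EOF
<network>
  <name>${name}</name>
  <forward mode='nat'/>
  <bridge name='${bridge}' stp='on' delay='0'/>
  <ip address='${prefix}.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='${prefix}.2' end='${prefix}.254'/>
    </dhcp>
  </ip>
</network>
EOF
}

# Example use (placeholder names throughout):
#   make_nat_network_xml nested-net virbr1 192.168.123 > /tmp/nested-net.xml
#   virsh net-define /tmp/nested-net.xml
#   virsh net-start nested-net
#   virsh net-autostart nested-net
```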

Set up the guest hypervisor
Create a minimal regular-guest using virt-install. The one I used is posted here

Now, add the cpu element to the regular-guest’s libvirt xml to expose AMD’s svm instructions, which come with the Opteron_G3 model.

Edit the xml using virsh:

# virsh edit regular-guest

(which will also define the xml)

Here is the element to be added to the guest hypervisor’s libvirt xml:

   <cpu>
      <arch>x86_64</arch>
      <model>Opteron_G3</model>
      <vendor>AMD</vendor>
      <topology sockets='2' cores='8' threads='1'/>
      <feature name='wdt'/>
      <feature name='skinit'/>
      <feature name='osvw'/>
      <feature name='3dnowprefetch'/>
      <feature name='cr8legacy'/>
      <feature name='extapic'/>
      <feature name='cmp_legacy'/>
      <feature name='3dnow'/>
      <feature name='3dnowext'/>
      <feature name='pdpe1gb'/>
      <feature name='fxsr_opt'/>
      <feature name='mmxext'/>
      <feature name='ht'/>
      <feature name='vme'/>
    </cpu>
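
For comparison, libvirt's domain XML can also describe the guest CPU with a `match` attribute on the element and a `policy` on each feature. The sketch below prints such a block. It is an illustration rather than the XML used here, and the helper name `make_cpu_xml` is made up. Listing `svm` explicitly in the example is an assumption made to show the intent, since the Opteron_G3 model is said above to include it already.

```shell
#!/bin/sh
# Print a domain-XML <cpu> element that requires the given features.
# $1 = CPU model; remaining arguments = feature names to require.
make_cpu_xml() {
    model="$1"; shift
    printf "<cpu match='exact'>\n"
    printf "  <model>%s</model>\n" "$model"
    for feature in "$@"; do
        printf "  <feature policy='require' name='%s'/>\n" "$feature"
    done
    printf "</cpu>\n"
}

# Example: print a block that asks for Opteron_G3 with svm required.
#   make_cpu_xml Opteron_G3 svm
```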

Then I restarted the regular-guest so that it boots with -cpu flags that expose the AMD virt extensions:


[root@phy-host-amd ~]# ps -ef | grep -i qemu-kvm
qemu     26677     1 14 10:39 ?        00:00:30 /usr/bin/qemu-kvm -S -M pc-0.14 -cpu phenom,+wdt,+skinit,+osvw,+3dnowprefetch,+misalignsse,+sse4a,+abm,+cr8legacy,+extapic,+cmp_legacy,+lahf_lm,+rdtscp,+pdpe1gb,+popcnt,+cx16,+ht,+vme -enable-kvm -m 4096 -smp 6,sockets=2,cores=8,threads=1 -name regular-guest -uuid 8f6a4478-496b-51d8-2de2-ff7fdb964af3 -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/regular-guest.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -drive file=/var/lib/libvirt/images/regular-guest.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:5f:c6:5f,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

Now, let’s fetch the IP of the regular-guest using virt-cat


[root@phy-host-amd ~]# virsh list
 Id Name                 State
----------------------------------
  5 regular-guest        running
[root@phy-host-amd ~]# 
[root@phy-host-amd ~]# virt-cat regular-guest /var/log/messages | grep 'dhclient.*bound to'
Jan 17 10:13:06 dhcpyy-zz dhclient[732]: bound to ww.xx.yy.zz -- renewal in 32578 seconds.

(Note: ‘ww.xx.yy.zz’ above will be a bridged IP address)
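
When this lookup needs to be scripted, the address can be pulled out of the dhclient line with sed. This is a small sketch; the helper name `last_bound_ip` is made up, and it assumes the log format shown in the transcript above.

```shell
#!/bin/sh
# Read syslog lines on stdin and print the address from the last
# "dhclient ... bound to <ip>" entry, i.e. the most recent lease.
last_bound_ip() {
    sed -n 's/.*dhclient.*bound to \([0-9.]*\) .*/\1/p' | tail -n 1
}

# Example use on the physical host:
#   virt-cat regular-guest /var/log/messages | last_bound_ip
```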

Create the nested guest
Now, install the virt packages in the regular-guest. Also, let’s check that the /dev/kvm char device is exposed in the regular-guest, and start the libvirtd service.


[root@regular-guest ~]# file /dev/kvm 
/dev/kvm: character special
[root@regular-guest ~]# systemctl status libvirtd.service 
libvirtd.service - LSB: daemon for libvirt virtualization API
          Loaded: loaded (/etc/rc.d/init.d/libvirtd)
          Active: active (running) since Tue, 17 Jan 2012 10:49:25 -0500; 5s ago
         Process: 1440 ExecStart=/etc/rc.d/init.d/libvirtd start (code=exited, status=0/SUCCESS)
        Main PID: 1448 (libvirtd)
          CGroup: name=systemd:/system/libvirtd.service
                  ├ 1448 libvirtd --daemon
                  └ 1501 /usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --exce...
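
Besides /dev/kvm, the svm CPU flag should be visible inside the regular-guest, since that is the name under which /proc/cpuinfo lists AMD's virt extension. A small sketch of that check follows; the helper name `has_svm_flag` is made up.

```shell
#!/bin/sh
# Return 0 if the cpuinfo file lists the "svm" flag on at least one CPU.
# Defaults to /proc/cpuinfo; the argument allows checking a saved copy.
has_svm_flag() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -qw svm
}

# Example use inside the regular-guest:
#   has_svm_flag && echo "svm is exposed to this guest"
```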

Proceed with installing a minimal F16 nested-guest w/ virt-install. The script I used is here

Debugging note: Once the guest install is finished, fix serial console access by disabling the plymouth service using this workaround. This lets us log in via the virsh serial console (to get kernel and boot messages) w/o any line breaks while entering credentials:

 # ln -s /dev/null /etc/systemd/system/plymouth-start.service

Get the (NATed) IP of the nested-guest. (Also, grepped for the qemu-kvm command-line of the nested-guest.)


[root@regular-guest ~]# virsh list
 Id Name                 State
----------------------------------
  2 nested-guest         running
[root@regular-guest ~]# ps -ef | grep qemu-kvm
qemu      2245     1  2 Jan17 ?        00:20:11 /usr/bin/qemu-kvm -S -M pc-0.14 -enable-kvm -m 2048 -smp 3,sockets=3,cores=1,threads=1 -name nested-guest -uuid 2aae2ab5-ddb6-2585-aa16-7fe97296f34b -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/nested-guest.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -drive file=/var/lib/libvirt/images/nested-guest.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:0e:4e:53,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

[root@regular-guest ~]# virt-cat nested-guest /var/log/messages | grep 'dhclient.*bound to'                                                            
Jan 17 11:08:30 localhost dhclient[721]: bound to 192.168.122.220 -- renewal in 1393 seconds.
[root@regular-guest ~]# 

SSH into the nested-guest, install the virt-what package, and run it to see whether we’re on a hypervisor:


[root@localhost ~]# cat /etc/fedora-release 
Fedora release 16 (Verne)
[root@localhost ~]# ifconfig eth0 | grep inet
          inet addr:192.168.122.220  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe0e:4e53/64 Scope:Link
[root@localhost ~]# 
[root@localhost ~]# virt-what 
kvm

Wooo!! so we’re on an OS which is inside an OS which is inside an OS.

12 responses to “Nested Virtualization with KVM and AMD”

  1. rich

    I’m interested how you found the experience. When I tried it, the guest (qemu-kvm) just crashed all the time. Was it stable? Was it fast? Did you try running a libguestfs program inside, and was that faster than normal?

    • Rich,

      So far they (regular-guest & nested-guest) are stable and moderately fast. I have 6 vcpu/4GB RAM for the regular-guest and 3 vcpu/3GB RAM for the nested-guest.

      I invoked guestfish in read-only mode on the running nested-guest domain. (libguestfs-1.14.8-1.fc16.x86_64)
      - The below took 1 minute

       # guestfish --ro -i nested-guest 

      - I exited from the above fs shell, and then ran the below(from regular-guest).

      
      # time guestfish --ro -i nested-guest << EOF
      run
      cat /etc/redhat-release 
      EOF                     
      
      Fedora release 16 (Verne)
      
      
      real    0m30.372s
      user    0m24.137s
      sys     0m10.046s
      

      Is this what you’re looking for? Let me know if you want me to try any other tests(I/O types?) w/ libguestfs. I’d be happy to try.

      • rich

        A good test would be to compare the command below, run on the host, and inside the first level guest. You have to run the command several times and discard the first few results, to get a hot cache.

        $ time guestfish -a /dev/null run
        

        With good hardware acceleration on baremetal, you should be seeing times like 3-8 seconds.

        30 seconds sounds like it’s not using hardware acceleration, or the (nested) acceleration if there is any is not that effective.

      • Rich,
        The result I previously posted was for the first run of guestfish (and that was on nested-acceleration). So that result is invalid.

        Here is the new result. I ran this 10 times each on bare-metal and on the regular-guest (nested-acceleration) to get a hot cache.

        
        [root@phy-host-amd ~]# time guestfish -a /dev/null run
        
        real    0m4.838s
        user    0m1.873s
        sys     0m2.665s
        

        And on the first-level guest:

        
        [root@regular-guest ~]# time guestfish -a /dev/null run
            
        real    0m11.533s
        user    0m7.234s
        sys     0m4.384s
        

        So, ~4.8 seconds on bare-metal, and ~11.5 seconds on the first-level guest. Does this look more reasonable to you?
        PS: I can safely confirm, hardware acceleration is enabled and being used.

      • rich

        Yes, it’s not too bad. You can see there’s some overhead from the nested emulation of hardware virt.

        Another test would be to disable nested virt and measure guestfish running in the guest. That would tell you how much of a speed-up you are getting with nested virt compared to pure software emulation in the guest.

      • Sure, I’ll try that. I guess that requires rebooting the physical host to take effect (which is not in my control at the moment).

  2. Mike

    Hi, can you please post the XML file of your regular guest with the manually added CPU attributes? I have some problems adding them to my XML, and I’m not quite sure where the problem is!

    Thanks,
    Mike

  3. Raymond Jennings

    Is there a limit (besides memory constraints or CPU power) to how deeply you can nest virtual machines?

    • Raymond, I doubt if anyone has tested nesting beyond one level. So, I cannot answer your question with evidence :) . But, my guess is 2nd level of nesting would not be as smooth as the first one.

  4. Bob Pelerson

    This is pretty cool but seems very Linux specific. Is there a way to run FreeBSD as the guest OS (and ideally, as the host OS too)?

    • Note, currently Linux on Linux itself is very fragile, let alone other distros. And I have no idea what’s the status of KVM port to FreeBSD, sorry.

      And, indeed it’s linux specific, KVM is a linux /kernel/ module :)
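
The hot-cache comparison discussed in the first comment thread can be sketched as two small helpers: one runs a command a few throwaway times to warm the cache, the other computes how much slower one timing is than another. The helper names `warm_up` and `slowdown` are made up for this illustration; the figures in the example comment are the ones reported in that thread.

```shell
#!/bin/sh
# Run a command a number of times and throw the output away, so that a
# following timed run sees a hot cache. $1 = number of runs, rest = command.
warm_up() {
    runs="$1"; shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || return 1
        i=$((i + 1))
    done
}

# Print how many times slower the second timing is than the first,
# rounded to two decimals. Both arguments are seconds (e.g. 4.838).
slowdown() {
    awk -v base="$1" -v other="$2" 'BEGIN { printf "%.2f\n", other / base }'
}

# Example, run once on bare metal and once in the first-level guest:
#   warm_up 5 guestfish -a /dev/null run
#   time guestfish -a /dev/null run
# Then compare the two "real" values from the thread:
#   slowdown 4.838 11.533
```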
