Tag Archives: Virtualization

Resize a fedora 19 guest with libguestfs tools

The inimitable Rich Jones writes some incredibly useful software. I can’t count how many times he has helped me debug disk images, or given great insights into different aspects of Linux virtualization. This was one of those instances again — I had to resize my OpenStack Fedora guest, as its root file system had merely 5.4 GB to start with. So, I wanted to add at least 15 GB more. After a bit of trial & error, here’s how I got it working. I’m using Fedora 19 in this case, but any other distro which supports libguestfs should be just fine.

Firstly, let’s check disk space inside the guest:

$ df -hT
    Filesystem              Type      Size  Used Avail Use% Mounted on
    /dev/mapper/fedora-root ext4      5.4G  5.4G     0 100% /
    devtmpfs                devtmpfs  4.7G     0  4.7G   0% /dev
    tmpfs                   tmpfs     4.7G     0  4.7G   0% /dev/shm
    tmpfs                   tmpfs     4.7G  392K  4.7G   1% /run
    tmpfs                   tmpfs     4.7G     0  4.7G   0% /sys/fs/cgroup
    tmpfs                   tmpfs     4.7G  472K  4.7G   1% /tmp
    /dev/vda1               ext4      477M   87M  365M  20% /boot
    /dev/loop0              ext4      4.6G   10M  4.4G   1% /srv/node/device1
    /dev/loop1              ext4      4.6G   10M  4.4G   1% /srv/node/device2
    /dev/loop2              ext4      4.6G   10M  4.4G   1% /srv/node/device3
    /dev/loop3              ext4      4.6G   10M  4.4G   1% /srv/node/device4
    /dev/vdb                ext4       17G   44M   16G   1% /mnt/newdisk

Print the libvirt XML to get the source of the disk

$ virsh dumpxml f19-test | grep -i source
      <source file='/var/lib/libvirt/images/f19-test.qcow2'/>

Above, I'm using a qcow2 disk image; I converted it to raw first:

$ qemu-img convert -f qcow2 -O raw \
  /var/lib/libvirt/images/f19-test.qcow2 \
  /var/lib/libvirt/images/f19-test.raw

List the filesystems, partitions, block devices inside the raw disk image:

$ virt-filesystems --long --all -h -a \
  /var/lib/libvirt/images/f19-test.raw
Name              Type        VFS   Label  MBR  Size  Parent
/dev/sda1         filesystem  ext4  -      -    500M  -
/dev/fedora/root  filesystem  ext4  -      -    5.6G  -
/dev/fedora/swap  filesystem  swap  -      -    3.9G  -
/dev/fedora/root  lv          -     -      -    5.6G  /dev/fedora
/dev/fedora/swap  lv          -     -      -    3.9G  /dev/fedora
/dev/fedora       vg          -     -      -    9.5G  /dev/sda2
/dev/sda2         pv          -     -      -    9.5G  -
/dev/sda1         partition   -     -      83   500M  /dev/sda
/dev/sda2         partition   -     -      8e   9.5G  /dev/sda
/dev/sda          device      -     -      -    10G   -

Now, extend the file size of the raw disk image, using truncate:

# Create a new file based on original
$ truncate -r f19-test.raw f19-test.raw.new
# Adjust the new file size to 15G
$ truncate -s +15G f19-test.raw.new

List the file system partition info to find out the block device name:

$ virt-filesystems --partitions \
  --long -h -a f19-test.raw
Name       Type       MBR  Size  Parent
/dev/sda1  partition  83   500M  /dev/sda
/dev/sda2  partition  8e   9.5G  /dev/sda

Now, resize the new disk image using virt-resize. Note that the --lv-expand option expands the root file system (thanks, Rich!):

$ virt-resize --expand /dev/sda2 --lv-expand \
  /dev/fedora/root f19-test.raw f19-test.raw.new
    Examining f19-test.raw ...
    **********
    
    Summary of changes:
    
    /dev/sda1: This partition will be left alone.
    
    /dev/sda2: This partition will be resized from 9.5G to 24.5G.  The LVM 
        PV on /dev/sda2 will be expanded using the 'pvresize' method.
    
    /dev/fedora/root: This logical volume will be expanded to maximum size. 
         The filesystem ext4 on /dev/fedora/root will be expanded using the 
        'resize2fs' method.
    
    **********
    Setting up initial partition table on f19-test.raw.new ...
    Copying /dev/sda1 ...
     100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
    Copying /dev/sda2 ...
     100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
    Expanding /dev/sda2 using the 'pvresize' method ...
    Expanding /dev/fedora/root using the 'resize2fs' method ...
    
    Resize operation completed with no errors.  Before deleting the old 
    disk, carefully check that the resized disk boots and works correctly.

Now, compare the sizes of the old and new disk images:

$ ls -lash f19-test.raw f19-test.raw.new
2.7G -rw-r--r--. 1 qemu qemu 10G Apr 10 11:12 f19-test.raw
11G -rw-r--r--. 1 root root 25G Apr 10 12:13 f19-test.raw.new

Replace the old disk image with the new one (you might want to take a backup of the old one here, just in case):

$ mv f19-test.raw.new f19-test.raw

Also, update the guest's libvirt XML so the disk points to the raw image — both the driver type (qcow2 to raw) and the source file path need updating:

# Update the driver type and the source file path
$ virsh edit f19-test 
# grep the XML file to confirm the change
$ grep source /etc/libvirt/qemu/f19-test.xml 
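
For reference, after the edit the disk section of the XML should look roughly like this (a sketch only; the driver type becomes raw and the source points to the new image):

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/f19-test.raw'/>
      <target dev='vda' bus='virtio'/>
    </disk>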

List file systems inside the newly created guest:

$ virt-filesystems --partitions --long -h -a f19-test.raw
Name       Type       MBR  Size  Parent
/dev/sda1  partition  83   500M  /dev/sda
/dev/sda2  partition  8e   25G   /dev/sda

Start the guest & ensure everything looks sane:

$ virsh start f19-test --console
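
Once the guest is up, a quick sanity check from inside it (a sketch — the exact sizes will differ) is to re-run df on the root file system and, as root, list the logical volumes of the 'fedora' volume group seen earlier:

    $ df -hT /
    # lvs fedora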


Multiple ways to access QEMU Monitor Protocol (QMP)

Once QEMU is built, to get a finer understanding of it, or even for plain old debugging, having familiarity with QMP (QEMU Monitor Protocol) is quite useful. QMP allows applications — like libvirt — to communicate with a running QEMU’s instance. There are a few different ways to access the QEMU monitor to query the guest, get device (eg: PCI, block, etc) information, modify the guest state (useful to understand the block layer operations) using QMP commands. This post discusses a few aspects of it.

Access QMP via libvirt’s qemu-monitor-command
Access QMP via libvirt’s qemu-monitor-command
Libvirt has had this capability for a long time, and it is the simplest method. It can be invoked via virsh on a running guest — in this case, one called 'devstack':

$ virsh qemu-monitor-command devstack \
--pretty '{"execute":"query-kvm"}'
{
    "return": {
        "enabled": true,
        "present": true
    },
    "id": "libvirt-8"
}

In the above example, I ran the simple command query-kvm, which checks whether (1) the host is capable of running KVM and (2) KVM is enabled. Refer below for a list of possible 'query' commands.

QMP via telnet
To access the monitor any other way, we need to have a QEMU instance running in control mode. First, via telnet:

$ ./x86_64-softmmu/qemu-system-x86_64 \
  --enable-kvm -smp 2 -m 1024 \
  /export/images/el6box1.qcow2 \
  -qmp tcp:localhost:4444,server --monitor stdio
QEMU waiting for connection on: tcp::127.0.0.1:4444,server
VNC server running on `127.0.0.1:5900'
QEMU 1.4.50 monitor - type 'help' for more information
(qemu)

And, from a different shell, connect to that listening port 4444 via telnet:

$ telnet localhost 4444

Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}

We have to first enable QMP capabilities. Before invoking any other commands, do:

{ "execute": "qmp_capabilities" }

QMP via unix socket
First, invoke the qemu binary in control mode using qmp, and create a unix socket as below:

$ ./x86_64-softmmu/qemu-system-x86_64 \
  --enable-kvm -smp 2 -m 1024 \
  /export/images/el6box1.qcow2 \
  -qmp unix:./qmp-sock,server --monitor stdio
QEMU waiting for connection on: unix:./qmp-sock,server

A few different ways to connect to the above QEMU instance running in control mode, via QMP:

  1. Firstly, via nc :

    $ nc -U ./qmp-sock
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}
    
  2. But, with the above, you have to manually enable the QMP capabilities, and type each command in JSON syntax. It's a bit cumbersome, & no history of commands typed is saved.

  3. Next, a simpler way -- a python script called qmp-shell is located in the QEMU source tree, under qemu/scripts/qmp/qmp-shell, which hides some of the details -- like manually running the qmp_capabilities command.

    Connect to the unix socket using the qmp-shell script:

    $ ./qmp-shell ../qmp-sock 
    Welcome to the QMP low-level shell!
    Connected to QEMU 1.4.50
    
    (QEMU) 
    

    Then, just hit the TAB key, and all the possible commands will be listed. To see a list of query commands:

    (QEMU) query-<TAB>
    query-balloon               query-commands              query-kvm                   query-migrate-capabilities  query-uuid
    query-block                 query-cpu-definitions       query-machines              query-name                  query-version
    query-block-jobs            query-cpus                  query-mice                  query-pci                   query-vnc
    query-blockstats            query-events                query-migrate               query-status                
    query-chardev               query-fdsets                query-migrate-cache-size    query-target                
    (QEMU) 
    
  4. Finally, we can also access the unix socket using socat and rlwrap. Thanks to upstream QEMU developer Markus Armbruster for this hint.

    Invoke it this way, and also execute a couple of commands -- qmp_capabilities and query-kvm -- to view the responses from the server:

    $ rlwrap -H ~/.qmp_history \
      socat UNIX-CONNECT:./qmp-sock STDIO
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}
    {"execute":"qmp_capabilities"}
    {"return": {}}
    { "execute": "query-kvm" }
    {"return": {"enabled": true, "present": true}}
    

    Where ~/.qmp_history contains recently run QMP commands in JSON syntax. And rlwrap adds decent editing capabilities, history search & persistence. So, once you have run all your commands, ~/.qmp_history holds a neat stack of all the QMP commands in JSON syntax.

    For instance, this is what my ~/.qmp_history file contains as I write this:

    $ cat ~/.qmp_history
    { "execute": "qmp_capabilities" }
    { "execute": "query-version" }
    { "execute": "query-events" }
    { "execute": "query-chardev" }
    { "execute": "query-block" }
    { "execute": "query-blockstats" }
    { "execute": "query-cpus" }
    { "execute": "query-pci" }
    { "execute": "query-kvm" }
    { "execute": "query-mice" }
    { "execute": "query-vnc" }
    { "execute": "query-spice " }
    { "execute": "query-uuid" }
    { "execute": "query-migrate" }
    { "execute": "query-migrate-capabilities" }
    { "execute": "query-balloon" }
    

To illustrate, I ran a few query commands (noted above) which provide an informative response from the server -- no change is made to the state of the guest -- so these can be executed safely.
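
If you only need a quick, non-interactive check, the same commands can also be piped into socat in one shot -- a small sketch, reusing the ./qmp-sock socket from above (the sleep just keeps the connection open long enough to read the replies):

    $ (echo '{ "execute": "qmp_capabilities" }'; \
       echo '{ "execute": "query-status" }'; sleep 1) | \
      socat STDIO UNIX-CONNECT:./qmp-sock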

I personally prefer the libvirt way, & accessing via unix socket with socat & rlwrap.

NOTE: To try each of the above variants, first quit -- type quit at the (qemu) prompt -- the QEMU instance running in control mode, reinvoke it, and then access it via one of the 3 different ways.


Nested virtualization with KVM and Intel on Fedora-18

KVM nested virtualization with Intel finally works for me on Fedora-18. All three layers -- L0 (physical host) -> L1 (regular guest/guest hypervisor) -> L2 (nested guest) -- are running successfully as of this writing.

Previously, nested KVM virtualization on Intel was discussed here and here. This time, on Fedora-18, I was able to successfully boot and use a nested guest with reasonable performance. (Although I still have to do more formal tests to show some meaningful performance results.)

Test setup information

Config info about the physical host, regular-guest/guest hypervisor and nested-guest. (All of them are Fedora-18; x86_64)

  • Physical Host (Host hypervisor/Bare metal)
    • Node info and some version info
      
      #--------------------#
      # virsh nodeinfo
      CPU model:           x86_64
      CPU(s):              4
      CPU frequency:       1995 MHz
      CPU socket(s):       1
      Core(s) per socket:  4
      Thread(s) per core:  1
      NUMA cell(s):        1
      Memory size:         10242692 KiB
      
      #--------------------#
      # cat /etc/redhat-release ; uname -r ; arch ; rpm -q qemu-kvm libvirt-daemon-kvm
      Fedora release 18 (Spherical Cow)
      3.6.7-5.fc18.x86_64
      x86_64
      qemu-kvm-1.3.0-9.fc18.x86_64
      libvirt-daemon-kvm-1.0.2-1.fc18.x86_64
      #
      #--------------------# 
      
  • Regular Guest (Guest Hypervisor)
    • A 20GB qcow2 disk image w/ cache='none' enabled in the libvirt xml
    • 
      #--------------------# 
      # virsh nodeinfo
      CPU model:           x86_64
      CPU(s):              4
      CPU frequency:       1994 MHz
      CPU socket(s):       4
      Core(s) per socket:  1
      Thread(s) per core:  1
      NUMA cell(s):        1
      Memory size:         4049888 KiB
      #--------------------# 
      # cat /etc/redhat-release ; uname -r ; arch ; rpm -q qemu-kvm libvirt-daemon-kvm
      Fedora release 18 (Spherical Cow)
      3.6.10-4.fc18.x86_64
      x86_64
      qemu-kvm-1.2.2-6.fc18.x86_64
      libvirt-daemon-kvm-0.10.2.3-1.fc18.x86_64
      #--------------------# 
      
  • Nested Guest
    • Config: 2GB Memory; 2 vcpus; 6GB sparse qcow2 disk image

Setting up guest hypervisor and nested guest

Refer to the notes linked above to get the nested guest up and running:

  • Create a regular guest/guest-hypervisor --
     # ./create-regular-f18-guest.bash 
  • Expose Intel VMX extensions inside the guest-hypervisor by adding the 'cpu' attribute to the regular-guest's libvirt XML file (a sketch of that snippet is shown after this list)
  • Shut down the regular guest; redefine it ( virsh define /etc/libvirt/qemu/regular-guest-f18.xml ); start the guest ( virsh start regular-guest-f18 )
  • Now, install virtualization packages inside the guest-hypervisor --
     # yum install libvirt-daemon-kvm libvirt-daemon-config-network libvirt-daemon-config-nwfilter python-virtinst -y 
  • Start the libvirtd service --
     # systemctl start libvirtd.service && systemctl status libvirtd.service  
  • Create a nested guest --
     # ./create-nested-f18-guest.bash 
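
For reference, the 'cpu' snippet I mean is along the lines of what the Intel nested-virt posts further down in this archive use -- a minimal sketch that exposes vmx to the guest:

    <cpu match='exact'>
      <model>core2duo</model>
      <feature policy='require' name='vmx'/>
    </cpu>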

The scripts and reference libvirt XMLs I used for this demonstration are posted on github.

qemu-kvm invocation of bare-metal and guest hypervisors

qemu-kvm invocation of the regular guest (guest hypervisor), indicating vmx extensions:


# ps -ef | grep -i qemu-kvm | egrep -i 'regular-guest-f18|vmx'
qemu     15768     1 19 13:33 ?        01:01:52 /usr/bin/qemu-kvm -name regular-guest-f18 -S -M pc-1.3 -cpu core2duo,+vmx -enable-kvm -m 4096 -smp 4,sockets=4,cores=1,threads=1 -uuid 9a7fd95b-7b4c-743b-90de-fa186bb5c85f -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/regular-guest-f18.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/export/vmimgs/regular-guest-f18.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a6:ff:96,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

Running virt-host-validate (it's part of the libvirt-client package) on the bare-metal host, indicating the host is configured to run KVM:


# virt-host-validate 
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking for device /dev/kvm                                         : PASS
  QEMU: Checking for device /dev/vhost-net                                   : PASS
  QEMU: Checking for device /dev/net/tun                                     : PASS
   LXC: Checking for Linux >= 2.6.26                                         : PASS
# 

Networking Info
- The regular guest is using the bare metal host's bridge device 'br0'
- The nested guest is using libvirt's default bridge 'virbr0'

Caveat: If NAT'd networking is used on both the bare metal host & the guest hypervisor, both will by default have the 192.168.122.0/24 subnet (unless explicitly changed), which will mangle the networking setup. Bridging on L0 (bare metal host) and NAT on L1 (guest hypervisor) avoids this.

Notes

  • Ensure a serial console is enabled in both the L1 and L2 guests; it is very handy for debugging. If you use the kickstart file mentioned here, it's taken care of. The magic lines to be added to the kernel cmd line are console=tty0 console=ttyS0,115200
  • Once the nested guest was created, I tried to set the hostname, and it turns out that, for some reason, ext4 had made the file system read-only:
    
    	#  hostnamectl set-hostname nested-guest-f18.foo.bar.com
    Failed to issue method call: Read-only file system
    

    Then I see these I/O errors in /var/log/messages:

    
    .
    .
    .
    Feb 12 04:22:31 localhost kernel: [  724.080207] end_request: I/O error, dev vda, sector 9553368
    Feb 12 04:22:31 localhost kernel: [  724.080922] Buffer I/O error on device dm-1, logical block 33467
    Feb 12 04:22:31 localhost kernel: [  724.080922] Buffer I/O error on device dm-1, logical block 33468
    

    At this point, I tried to reboot the guest, only to be thrown into a dracut repair shell. I tried fsck a couple of times, & then tried to reboot the nested guest, to no avail. Then I force powered-off the nested-guest:

    # virsh destroy nested-guest-f18

    Now, it boots just fine. While trying to get to the bottom of the I/O errors, I was discussing this behaviour with Rich Jones, and he suggested trying some more I/O activity inside the nested guest to see if I can trigger those errors again:

    
    # find / -exec md5sum {} \; > /dev/null
    # find / -xdev -exec md5sum {} \; > /dev/null
    

    After the above commands ran for more than 15 minutes, the I/O errors couldn't be triggered any more.

  • A test for the libguestfs program (from rwmj) would be to run it on both the host & the first-level guest and compare. The command needs to be run several times, discarding the first few results, to get a hot cache.
    
    # time guestfish -a /dev/null run
    
  • Another libguestfs test Rich suggested is to disable nested virt and measure guestfish running in the guest, to find the speed-up from nested virtualization in contrast to pure software emulation.

Next up: running more useful workloads in these nested vmx guests.


Creating rapid thin-provisioned guests using QEMU backing files

Provisioning virtual machines very rapidly is highly desirable, especially when deploying a large number of virtual machines. With QEMU’s backing files concept, we can instantiate several clones by creating a single base image and then sharing it (read-only) across multiple guests, so that these guests, when modified, write all their changes to their own disk images.

To exemplify:

Initially, let’s create a minimal Fedora 17 virtual guest (I used this script), and copy the resulting qcow2 disk image as base-f17.qcow2. So, base-f17.qcow2 has Fedora 17 on it, and is established as our base image. Let’s see its info:

$ qemu-img info base-f17.qcow2
image: base-f17.qcow2
file format: qcow2
virtual size: 5.0G (5368709120 bytes)
disk size: 5.0G
cluster_size: 65536
[root@localhost vmimages]# 

Now, let's make use of the above F17 base image and try to instantiate 2 more Fedora 17 virtual machines, quickly. First, create a new qcow2 file (f17vm2-with-b.qcow2) using base-f17.qcow2 as its backing file:

$ qemu-img create -b /home/kashyap/vmimages/base-f17.qcow2 \
  -f qcow2 /home/kashyap/vmimages/f17vm2-with-b.qcow2
Formatting '/home/kashyap/vmimages/f17vm2-with-b.qcow2', fmt=qcow2 size=5368709120 backing_file='/home/kashyap/vmimages/base-f17.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 

And now, let's see some information about the just-created disk image. (Notice the 'backing file' attribute below pointing to our base image, base-f17.qcow2.)

$ qemu-img info /home/kashyap/vmimages/f17vm2-with-b.qcow2
image: /home/kashyap/vmimages/f17vm2-with-b.qcow2
file format: qcow2
virtual size: 5.0G (5368709120 bytes)
disk size: 196K
cluster_size: 65536
backing file: /home/kashyap/vmimages/base-f17.qcow2
[root@localhost vmimages]# 

Now, we're set -- our 'f17vm2-with-b.qcow2' is ready to use. We can verify it in two ways:

  1. To quickly verify, we can invoke qemu-kvm directly (not recommended in production) -- this will boot our new guest on stdio and throw a serial console (NOTE: base-f17.qcow2 has 'console=tty0 console=ttyS0,115200' on its kernel command line, so that it can provide a serial console) --
    $ qemu-kvm -enable-kvm -m 1024 f17vm2-with-b.qcow2 -nographic
    
                              GNU GRUB  version 2.00~beta4
    
     +--------------------------------------------------------------------------+
     |Fedora Linux                                                              | 
     |Advanced options for Fedora Linux                                         |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          | 
     +--------------------------------------------------------------------------+
    
          Use the ^ and v keys to select which entry is highlighted.      
          Press enter to boot the selected OS, `e' to edit the commands      
          before booting or `c' for a command-line.      
                                                                                   
                                                                                   
    Loading Linux 3.3.4-5.fc17.x86_64 ...
    Loading initial ramdisk ...
    [    0.000000] Initializing cgroup subsys cpuset
    .
    .
    .
    (none) login: root
    Password: 
    Last login: Thu Oct  4 07:07:54 on ttyS0
    $ 
    
  2. The other, more traditional way (so that libvirt can track the guest & be used to manage it) is to copy a similar (F17) libvirt XML file, edit and update the name, uuid, disk path, and mac-address, then define it, and start it via 'virsh':
    $ virsh define f17vm2-with-b.xml
    $ virsh start f17vm2-with-b --console
    $  virsh list
     Id    Name                           State
    ----------------------------------------------------
 9     f17vm2-with-b                  running
    

Now, let's quickly check the disk image size of our new thin-provisioned guest. Notice that the size is quite thin (14 MB) -- meaning, only the delta from the original backing file will be written to this image.

$ ls -lash f17vm2-with-b.qcow2
14M -rw-r--r--. 1 root root 14M Oct  4 06:30 f17vm2-with-b.qcow2
$

To instantiate our 2nd F17 guest (say, f17vm3-with-b) -- again, create a new qcow2 file (f17vm3-with-b.qcow2) with its backing file as our base image, base-f17.qcow2. And then check the info of the disk image using the 'qemu-img' tool:

#----------------------------------------------------------#
$ qemu-img create -b /home/kashyap/vmimages/base-f17.qcow2 \
  -f qcow2 /home/kashyap/vmimages/f17vm3-with-b.qcow2
Formatting '/home/kashyap/vmimages/f17vm3-with-b.qcow2', fmt=qcow2 size=5368709120 backing_file='/home/kashyap/vmimages/base-f17.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 
#----------------------------------------------------------#
$ qemu-img info /home/kashyap/vmimages/f17vm3-with-b.qcow2
image: /home/kashyap/vmimages/f17vm3-with-b.qcow2
file format: qcow2
virtual size: 5.0G (5368709120 bytes)
disk size: 196K
cluster_size: 65536
backing file: /home/kashyap/vmimages/base-f17.qcow2
$
#----------------------------------------------------------#

[it's worth noting here that we're pointing to the same base image, and multiple guests are using it as a backing file.]

Again check the disk image size of the thin-provisioned guest:

$ ls -lash f17vm3-with-b.qcow2
14M -rw-r--r--. 1 qemu qemu 14M Oct  4 07:18 f17vm3-with-b.qcow2

It goes without saying that the 2nd F17 guest also has a new XML file, defined w/ its unique attributes, just like the 1st F17 guest.

$ virsh list
 Id    Name                           State
----------------------------------------------------
 9     f17vm2-with-b                  running
 10    f17vm3-with-b                  running

For reference's sake, I've posted the XML file I used for the 'f17vm3-with-b' guest here.

To summarize, by sharing a single, common base-image, we can quickly deploy multiple thin-provisioned virtual machines.


                      .----------------------.
                      | base-image-f17.qcow2 |
                      |                      |
                      '----------------------'
                         /       |         \
                        /        |          \
                       /         |           \
                      /          |            \
         .-----------v--.  .-----v--------.  .-v------------.
         | f17vm2.qcow2 |  | f17vm3.qcow2 |  | f17vmN.qcow2 |
         |              |  |              |  |              |
         '--------------'  '--------------'  '--------------'
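
Since each clone is just a one-line qemu-img call, creating a handful of overlays can also be scripted; a small sketch (the loop range and names are illustrative, reusing the base image from the examples above):

    $ for i in 2 3 4; do
        qemu-img create -b /home/kashyap/vmimages/base-f17.qcow2 \
          -f qcow2 /home/kashyap/vmimages/f17vm${i}-with-b.qcow2
      done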
            


External(and Live) snapshots with libvirt

Previously, I posted about snapshots here, which briefly discussed different types of snapshots. In this post, let’s explore how external snapshots work. Just to quickly rehash: external snapshots are a type of snapshot where there’s a base image (the original disk image), and its difference/delta (aka the snapshot image) is stored in a new QCOW2 file. Once the snapshot is taken, the original disk image will be in a ‘read-only’ state, and can be used as a backing file for other guests.

It’s worth mentioning here that:

  • The original disk image can be either in RAW format or QCOW2 format. When a snapshot is taken, ‘the difference’ will be stored in a separate QCOW2 file
  • The virtual machine has to be running, live. With live snapshots, no guest downtime is experienced when a snapshot is taken.
  • At this moment, external (live) snapshots work for ‘disk-only’ snapshots (and not VM state). Work for both disk and VM state (and also reverting to external disk snapshot state) is in progress upstream (slated for libvirt-0.10.2).

Before we go ahead, here’s some version info. I’m testing on a Fedora-17 host, and the guest (named ‘testvm’) is running Fedora-18 (Test Compose):

$ rpm -q libvirt qemu-kvm ; uname -r
libvirt-0.10.1-3.fc17.x86_64
qemu-kvm-1.2-0.2.20120806git3e430569.fc17.x86_64
3.5.2-3.fc17.x86_64
$ 

External disk-snapshots(live) using QCOW2 as original image:
Let's see an illustration of external(live) disk-only snapshots. First, let's ensure the guest is running:

$ virsh list
 Id    Name                           State
----------------------------------------------------
 3     testvm                          running


$ 

Then, list all the block devices associated with the guest:

$ virsh domblklist testvm --details
Type       Device     Target     Source
------------------------------------------------
file       disk       vda        /export/vmimgs/testvm.qcow2

$ 

Next, let's create a snapshot(disk-only) of the guest this way, while the guest is running:

$ virsh snapshot-create-as testvm snap1-testvm "snap1 description" \
  --diskspec vda,file=/export/vmimgs/snap1-testvm.qcow2 \
  --disk-only --atomic

Some details of the flags used:
- Passing a '--diskspec' parameter adds the 'disk' elements to the snapshot XML file (a sketch of that XML is shown after this list)
- The '--disk-only' parameter takes a snapshot of only the disk
- '--atomic' ensures the snapshot either completes fully or fails w/o making any changes
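
For reference, the snapshot-create-as invocation above is roughly equivalent to feeding libvirt a domainsnapshot XML like the following -- a sketch of the format only, not the exact file libvirt generates:

    <domainsnapshot>
      <name>snap1-testvm</name>
      <description>snap1 description</description>
      <disks>
        <disk name='vda' snapshot='external'>
          <source file='/export/vmimgs/snap1-testvm.qcow2'/>
        </disk>
      </disks>
    </domainsnapshot>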

Let's check the information about the just taken snapshot by running qemu-img:

$ qemu-img info /export/vmimgs/snap1-testvm.qcow2 
image: /export/vmimgs/snap1-testvm.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 2.5M
cluster_size: 65536
backing file: /export/vmimgs/testvm.qcow2
$ 

Apart from the above, I created 2 more snapshots (with the same syntax as above) for illustration purposes. Now, the snapshot tree looks like this:

$ virsh snapshot-list testvm --tree

snap1-testvm
  |
  +- snap2-testvm
      |
      +- snap3-testvm
        

$ 

For the above example image file chain [ base <- snap1 <- snap2 <- snap3 ], it has to be read as: snap3 has snap2 as its backing file, snap2 has snap1 as its backing file, and snap1 has the base image as its backing file. We can see the backing file info with qemu-img:

#--------------------------------------------#
$ qemu-img info /export/vmimgs/snap3-testvm.qcow2
image: /export/vmimgs/snap3-testvm.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 129M
cluster_size: 65536
backing file: /export/vmimgs/snap2-testvm.qcow2
#--------------------------------------------#
$ qemu-img info /export/vmimgs/snap2-testvm.qcow2
image: /export/vmimgs/snap2-testvm.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 3.6M
cluster_size: 65536
backing file: /export/vmimgs/snap1-testvm.qcow2
#--------------------------------------------#
$ qemu-img info /export/vmimgs/snap1-testvm.qcow2
image: /export/vmimgs/snap1-testvm.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 2.5M
cluster_size: 65536
backing file: /export/vmimgs/testvm.qcow2
$
#--------------------------------------------#
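
Recent qemu-img versions can also print the entire chain in one invocation with the --backing-chain option (a convenience, assuming your qemu-img build supports it):

    $ qemu-img info --backing-chain /export/vmimgs/snap3-testvm.qcow2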

Now, if we do not need snap2 any more, and want to pull its data into snap3 -- making snap1 snap3's backing file -- we can do a virsh blockpull operation as below:

#--------------------------------------------#
$ virsh blockpull --domain testvm \
  --path /export/vmimgs/snap3-testvm.qcow2 \
  --base /export/vmimgs/snap1-testvm.qcow2 \
  --wait --verbose
Block Pull: [100 %]
Pull complete
#--------------------------------------------#

Where --path is the path to the active snapshot file, and --base is the path to the backing file that should remain in the chain; everything above it gets pulled into the active image. So from the above example, the data in snap2 is pulled into snap3, flattening the backing file chain so that snap1 becomes snap3's backing file, which can be confirmed by running qemu-img again:

$ qemu-img info /export/vmimgs/snap3-testvm.qcow2
image: /export/vmimgs/snap3-testvm.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 145M
cluster_size: 65536
backing file: /export/vmimgs/snap1-testvm.qcow2
$ 

A couple of things to note here, after discussion with Eric Blake(thank you):

  • If we do a listing of the snapshot tree again (now that the 'snap2-testvm.qcow2' backing file is no longer in use),
$ virsh snapshot-list testvm --tree
snap1-testvm
  |
  +- snap2-testvm
      |
      +- snap3-testvm
$

one might wonder: why is snap3 still pointing to snap2? The thing to note here is that the above is the snapshot chain, which is independent of each virtual disk's backing file chain. So 'virsh snapshot-list' is still listing the information accurately as of snapshot-creation time (and not what we've done after creating the snapshots). So, from the above snapshot tree, if we were to revert to snap1 or snap2 (when revert-to-disk-snapshots is available), it would still be possible to do that, meaning:

It's possible to go from this state:
base <- snap123 (data from snap1, snap2 pulled into snap3)

we can still revert to:

base<-snap1 (thus undoing the changes in snap2 & snap3)

External disk-snapshots(live) using RAW as original image:
With external disk snapshots, the backing file can be RAW as well (unlike 'internal snapshots', which only work with QCOW2 files, where the snapshots and deltas are all stored in a single QCOW2 file).

A quick illustration below. The commands are self-explanatory. Note the change (from RAW to QCOW2) in the block disk associated with the guest, before & after taking the disk snapshot (when the virsh domblklist command was executed).

#-------------------------------------------------#
$ virsh list | grep f17btrfs2
 7     f17btrfs2                      running
$
#-------------------------------------------------#
$ qemu-img info /export/vmimgs/f17btrfs2.img
image: /export/vmimgs/f17btrfs2.img
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 1.5G
$ 
#-------------------------------------------------#
$ virsh domblklist f17btrfs2 --details
Type       Device     Target     Source
------------------------------------------------
file       disk       hda        /export/vmimgs/f17btrfs2.img

$ 
#-------------------------------------------------#
$ virsh snapshot-create-as f17btrfs2 snap1-f17btrfs2 \
  "snap1-f17btrfs2-description" \
  --diskspec hda,file=/export/vmimgs/snap1-f17btrfs2.qcow2 \
  --disk-only --atomic
Domain snapshot snap1-f17btrfs2 created
$ 
#-------------------------------------------------#
$ qemu-img info /export/vmimgs/snap1-f17btrfs2.qcow2
image: /export/vmimgs/snap1-f17btrfs2.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 196K
cluster_size: 65536
backing file: /export/vmimgs/f17btrfs2.img
$ 
#-------------------------------------------------#
$ virsh domblklist f17btrfs2 --details
Type       Device     Target     Source
------------------------------------------------
file       disk       hda        /export/vmimgs/snap1-f17btrfs2.qcow2
$ 
#-------------------------------------------------#

Also note: all snapshot XML files, where libvirt tracks the metadata of snapshots, are located under /var/lib/libvirt/qemu/snapshots/$guestname (and the original libvirt XML file is located under /etc/libvirt/qemu/$guestname.xml)
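
If you'd rather not poke at those files directly, the same metadata can be viewed through virsh (using the guest and snapshot names from the example above):

    $ virsh snapshot-list testvm
    $ virsh snapshot-dumpxml testvm snap1-testvm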


Nested Virtualization with Intel — take-2 with Fedora-17

My previous attempt with Fedora 16 to create a nested virtual guest on an Intel CPU was only about 90% successful. I gave it a retry with Fedora 17, and the newest available virt packages from the virt-preview repository.

I posted some notes, configurations of the physical host, regular guest and nested guest, and the scripts I used, on my fedora people page.

A few observations:

  • The regular guest (L1) was created just fine on the physical host (L0). No news here, this is expected.
  • Shutting down the regular guest causes ‘virsh’ to hang with a segfault. To avoid this, I have to restart the libvirtd daemon, and then start the guest. I posted some more details to the fedora virt list here and here
  • Now, when I try to create the ‘nested guest’ (L2), I don’t see any progress on the serial console once it attempts to retrieve initrd and vmlinuz:

# ./create-nested-guest.bash 
Creating qcow2 disk image..
Formatting '/export/vmimgs/nested-guest-f17.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 preallocation='metadata' 
11G -rw-r--r--. 1 root root 11G Jul 28 10:47 /export/vmimgs/nested-guest-f17.qcow2

Starting install...
Retrieving file .treeinfo...                                                                                                 | 1.8 kB     00:00 !!! 
Retrieving file vmlinuz...                                                                                                   | 8.9 MB     00:00 !!! 
Retrieving file initrd.img...                                                                                                |  47 MB     00:00 !!! 
Creating domain...                                                                                                           |    0 B     00:00     

I tried to view, using less or tail, the system/libvirt logs, check the status of the libvirtd daemon, or run virsh list in the 'regular guest', to no avail. Those commands just hang on stdout.

A little bit more detail in a text file here.

Meanwhile, here are the version details. I used the same kernel, qemu-kvm, and libvirt on both the physical host and the regular guest:


[root@moon ~]# uname -r ; rpm -q qemu-kvm libvirt 
3.4.6-2.fc17.x86_64
qemu-kvm-1.1.0-9.fc17.x86_64
libvirt-0.9.13-3.fc17.x86_64
[root@moon ~]# 

I'm still investigating, will update here, once I have more information.


Nested Virtualization with KVM and AMD

After my previous attempt the other day to create a nested guest (KVM on KVM) with Intel arch, I got hold of an AMD server machine with virt extensions enabled and gave it a whirl. This went slightly smoother than the Intel attempt.

Some config info about the physical host, regular-guest and nested-guest. (All of them are Fedora-16; x86_64)

  • Physical Host (Host hypervisor/Bare metal)
    • 
      [root@phy-host-amd]# virsh nodeinfo
      CPU model:           x86_64
      CPU(s):              16
      CPU frequency:       2000 MHz
      CPU socket(s):       2
      Core(s) per socket:  8
      Thread(s) per core:  1
      NUMA cell(s):        1
      Memory size:         8173352 kB
      
  • Regular Guest (Or Guest Hypervisor)
    • Config: 4GB Memory; 6 vcpus; 22GB Raw disk image w/ cache='none' enabled in the libvirt xml
  • Nested Guest
    • Config: 2GB Memory; 3 vcpus; 10G Raw disk image

Ensure nesting is enabled on the physical host

Let's ensure kvm_amd kernel module is enabled with 'nested' virt.


[root@phy-host-amd ~]# modinfo kvm_amd | grep -i nested
parm:           nested:int
[root@phy-host-amd ~]# 

[root@phy-host-amd ~]# cat /sys/module/kvm_amd/parameters/nested
1
[root@phy-host-amd ~]# 

[root@phy-host-amd ~]# systool -m kvm_amd -v   | grep -i nested
    nested              = "1"
[root@phy-host-amd ~]# 

CAVEAT: To make life a little easier, I configured bridged networking on the physical host to ensure our regular guest gets a bridged IP and, later, the nested guest gets a NATed IP. I'm noting it here because the physical host initially had no bridging. The default libvirt bridge virbr0 has the 192.168.122.0/24 IP space, so once we set up the regular guest (or guest hypervisor), we'd end up with the same IP space on both levels. I tried to fix this problem by creating another 'persistent' libvirt network interface and enabling autostart of it [virsh net-add; virsh net-define; virsh net-autostart], but it wasn't elegant and messed up networking on reboot.
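
For completeness, the kind of network definition I mean would look roughly like this -- a sketch only, assuming the 192.168.133.0/24 subnet is free and using a made-up network name 'nestednat':

    <network>
      <name>nestednat</name>
      <forward mode='nat'/>
      <bridge name='virbr1' stp='on' delay='0'/>
      <ip address='192.168.133.1' netmask='255.255.255.0'>
        <dhcp>
          <range start='192.168.133.2' end='192.168.133.254'/>
        </dhcp>
      </ip>
    </network>

It can be loaded with virsh net-define nestednat.xml, then virsh net-autostart nestednat and virsh net-start nestednat.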

Set up the guest hypervisor
Create a minimal regular-guest using virt-install . The one I used is posted here

Now, add the cpu attribute to the regular-guest's libvirt XML to expose AMD's svm instructions, which come with the Opteron_G3 model.

Edit the xml using virsh:

# virsh edit regular-guest 

(which will also define the xml)

Here is the attribute to be added to the guest hypervisor's libvirt xml:

   <cpu>
      <arch>x86_64</arch>
      <model>Opteron_G3</model>
      <vendor>AMD</vendor>
      <topology sockets='2' cores='8' threads='1'/>
      <feature name='wdt'/>
      <feature name='skinit'/>
      <feature name='osvw'/>
      <feature name='3dnowprefetch'/>
      <feature name='cr8legacy'/>
      <feature name='extapic'/>
      <feature name='cmp_legacy'/>
      <feature name='3dnow'/>
      <feature name='3dnowext'/>
      <feature name='pdpe1gb'/>
      <feature name='fxsr_opt'/>
      <feature name='mmxext'/>
      <feature name='ht'/>
      <feature name='vme'/>
    </cpu>

And restart the regular-guest, so that it boots with the -cpu flag carrying the AMD virt extensions:


[root@phy-host-amd ~]# ps -ef | grep -i qemu-kvm
qemu     26677     1 14 10:39 ?        00:00:30 /usr/bin/qemu-kvm -S -M pc-0.14 -cpu phenom,+wdt,+skinit,+osvw,+3dnowprefetch,+misalignsse,+sse4a,+abm,+cr8legacy,+extapic,+cmp_legacy,+lahf_lm,+rdtscp,+pdpe1gb,+popcnt,+cx16,+ht,+vme -enable-kvm -m 4096 -smp 6,sockets=2,cores=8,threads=1 -name regular-guest -uuid 8f6a4478-496b-51d8-2de2-ff7fdb964af3 -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/regular-guest.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -drive file=/var/lib/libvirt/images/regular-guest.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:5f:c6:5f,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

Now, let's fetch the IP of the regular-guest using virt-cat


[root@phy-host-amd ~]# virsh list
 Id Name                 State
----------------------------------
  5 regular-guest        running
[root@phy-host-amd ~]# 
[root@phy-host-amd ~]# virt-cat regular-guest /var/log/messages | grep 'dhclient.*bound to'
Jan 17 10:13:06 dhcpyy-zz dhclient[732]: bound to ww.xx.yy.zz -- renewal in 32578 seconds.

(Note: 'ww.xx.yy.zz' above will be a bridged IP address)

Create the nested guest
Now, install virt packages in the regular-guest. Also, let's check if the /dev/kvm char device is exposed in the regular-guest, and start the libvirtd service.


[root@regular-guest ~]# file /dev/kvm 
/dev/kvm: character special
[root@regular-guest ~]# systemctl status libvirtd.service 
libvirtd.service - LSB: daemon for libvirt virtualization API
          Loaded: loaded (/etc/rc.d/init.d/libvirtd)
          Active: active (running) since Tue, 17 Jan 2012 10:49:25 -0500; 5s ago
         Process: 1440 ExecStart=/etc/rc.d/init.d/libvirtd start (code=exited, status=0/SUCCESS)
        Main PID: 1448 (libvirtd)
          CGroup: name=systemd:/system/libvirtd.service
                  ├ 1448 libvirtd --daemon
                  └ 1501 /usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --exce...

Proceed with installing a minimal F16 nested-guest w/ virt-install. The script I used is here

Debugging note: Once the guest install is finished, fix the serial console access by disabling the plymouth service using this workaround. This will let us log in via the virsh serial console (to get kernel and boot messages) w/o any line breaks while entering credentials:

 # ln -s /dev/null /etc/systemd/system/plymouth-start.service

Get the (NATed) IP of the nested-guest. (Also, grepped for the qemu-kvm command-line of the nested-guest.)


[root@regular-guest ~]# virsh list
 Id Name                 State
----------------------------------
  2 nested-guest         running
[root@regular-guest ~]# ps -ef | grep qemu-kvm
qemu      2245     1  2 Jan17 ?        00:20:11 /usr/bin/qemu-kvm -S -M pc-0.14 -enable-kvm -m 2048 -smp 3,sockets=3,cores=1,threads=1 -name nested-guest -uuid 2aae2ab5-ddb6-2585-aa16-7fe97296f34b -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/nested-guest.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -drive file=/var/lib/libvirt/images/nested-guest.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:0e:4e:53,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

[root@regular-guest ~]# virt-cat nested-guest /var/log/messages | grep 'dhclient.*bound to'                                                            
Jan 17 11:08:30 localhost dhclient[721]: bound to 192.168.122.220 -- renewal in 1393 seconds.
[root@regular-guest ~]# 

SSH into the nested guest, install the virt-what package, and run it to see if we're running on a hypervisor:


[root@localhost ~]# cat /etc/fedora-release 
Fedora release 16 (Verne)
[root@localhost ~]# ifconfig eth0 | grep inet
          inet addr:192.168.122.220  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe0e:4e53/64 Scope:Link
[root@localhost ~]# 
[root@localhost ~]# virt-what 
kvm

Wooo!! so we're on an OS which is inside an OS which is inside an OS.


Nested Virtualization with KVM Intel

Some context: In regular virtualization, your physical Linux host is the hypervisor, and runs multiple operating systems. Nested virtualization lets you run a guest inside a regular guest (essentially a guest hypervisor). For AMD, nested support has been available for a while, and some people reported success w/ nesting KVM guests. For Intel arch, support arrived more recently -- about a year ago -- and some work is still in progress, so I thought I’d give it a whirl when Adam Young started a discussion about it in the context of the OpenStack project.

Some of the common use-cases being discussed for nested virtualization:
- For instance, a cloud user gets a beefy regular guest (which she completely controls). Now, this user can turn the regular guest into a hypervisor, and can cheerfully run/manage multiple guests for developing or testing w/o the hassle and intervention of the cloud provider.
- Possibility of having many instances of a virtualization setup (hypervisor and its guests) on one single bare metal machine.
- Ability to debug and test hypervisor software

I have immediate access to moderately beefy Intel hardware, and the rest of the post is based on Intel’s CPU virt extensions. Before proceeding, let’s settle on some terminology for clarity:

  • Physical Host (Host hypervisor/Bare metal)
    • Config: Intel(R) Xeon(R) CPU (4 cores/socket); 10GB memory; CPU freq 2GHz; running latest Fedora-16 (minimal foot-print, @core only with virt pkgs; x86_64; kernel-3.1.8-2.fc16.x86_64)
  • Regular Guest (Or Guest Hypervisor)
    • Config: 4GB Memory; 4vCPU; 20GB raw disk image with cache='none' to have decent I/O; minimal @core F16; and the same virt packages as the physical host; x86_64
  • Nested Guest (Guest installed inside the Regular Guest)
    • Config: 2GB Memory; 1vCPU; Minimal(@core only) F16; x86_64

Enabling Nesting on the Physical Host

Node Info of the Physical Host.

 
# virsh nodeinfo
CPU model:           x86_64
CPU(s):              4
CPU frequency:       1994 MHz
CPU socket(s):       1
Core(s) per socket:  4
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         10242864 kB

Let us first ensure the kvm_intel kernel module has nesting enabled. By default, it's disabled for Intel arch [but enabled for AMD -- SVM (secure virtual machine) extensions].

 
# modinfo kvm_intel | grep -i nested
parm:           nested:bool
# 

And, we need to pass kvm-intel.nested=1 on the kernel command line while rebooting the host, to enable nesting for the Intel KVM kernel module. This can be verified after boot by doing:

 
# cat /sys/module/kvm_intel/parameters/nested 
Y
# systool -m kvm_intel -v   | grep -i nested
    nested              = "Y"
# 

Or, alternatively, Adam Young identified that nesting can be enabled by adding the directive options kvm-intel nested=y to the end of the /etc/modprobe.d/dist.conf file and rebooting the host, so it persists.
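
In other words (a small sketch; the module reload variant only works if no guests are currently using kvm_intel, otherwise a reboot is needed):

    # echo "options kvm-intel nested=y" >> /etc/modprobe.d/dist.conf
    # modprobe -r kvm_intel && modprobe kvm_intel nested=1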

Set up the Regular Guest(or Guest hypervisor)
Install a regular guest using virt-install, the oz tool, or any other preferred way. I made a quick script here. And ensure cache='none' is set in the disk element of the guest hypervisor's XML file. (Observation: installing via the virt-install tool didn't seem to have this option picked by default.) Here is the relevant 'disk' libvirt XML snippet:

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/regular-guest.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

Now, let's try to enable Intel VMX (Virtual Machine Extensions) in the regular guest's CPU. We can do that by running the command below on the physical host (aka host hypervisor), adding the 'cpu' attribute to the regular-guest's libvirt XML file, and starting the guest.

# virsh  capabilities | virsh cpu-baseline /dev/stdin 
<cpu match='exact'>
  <model>Penryn</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='dca'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='vme'/>
</cpu>

The output of the above command has a variety of options. Since we need only the vmx extension, I took the simple route of adding the following to the regular-guest's libvirt XML (virsh edit ..) and starting it:

<cpu match='exact'>
  <model>core2duo</model>
 <feature policy='require' name='vmx'/>
</cpu>

Thanks to Jiri Denemark for the above hint. Also note that, there is a very detailed and informative post from Dan P Berrange on host/guest CPU models in libvirt.

As we enabled vmx in the guest hypervisor, let's confirm that vmx is exposed in the emulated CPU by ensuring qemu-kvm is invoked with -cpu core2duo,+vmx:


[root@physical-host ~]# ps -ef | grep qemu-kvm
qemu     17102     1  4 22:29 ?        00:00:34 /usr/bin/qemu-kvm -S -M pc-0.14 
-cpu core2duo,+vmx -enable-kvm -m 3072
-smp 3,sockets=3,cores=1,threads=1 -name f16test1 
-uuid f6219dbd-f515-f3c8-a7e8-832b99a24b5d -nographic -nodefconfig 
-nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f16test1.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
-drive file=/export/vmimgs/f16test1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=21,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e6:cc:4e,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

Now, let's attempt to create a nested guest

Here comes the more interesting part: the nested-guest config will be 2G RAM; 1 vcpu; 8GB virtual disk. Let's invoke a virt-install cmdline with a minimal kickstart install:


[root@regular-guest ~]# virt-install --connect=qemu:///system \
    --network=bridge:virbr0 \
    --initrd-inject=/root/fed.ks \
    --extra-args="ks=file:/fed.ks console=tty0 console=ttyS0,115200 serial rd_NO_PLYMOUTH" \
    --name=nested-guest --disk path=/var/lib/libvirt/images/nested-guest.img,size=6 \
    --ram 2048 \
    --vcpus=1 \
    --check-cpu \
    --hvm \
    --location=http://download.foo.bar.com/pub/fedora/linux/releases/16/Fedora/x86_64/os/ \
    --nographics

Starting install...
Retrieving file .treeinfo...                                                                                                 | 1.7 kB     00:00 ... 
Retrieving file vmlinuz...                                                                                                   | 7.9 MB     00:08 ... 
Retrieving file initrd.img...                               28% [==============                                   ] 647 kB/s |  38 MB     02:25 ETA 

virt-install proceeds fine (to a certain extent), doing all the regular things: getting access to the network, creating devices, creating file systems, performing dep checks, and finally the package install proceeds:


Welcome to Fedora for x86_64



     ┌─────────────────────┤ Package Installation ├──────────────────────┐
     │                                                                   │
     │                                                                   │
     │                                 24%                               │
     │                                                                   │
     │                   Packages completed: 52 of 390                   │
     │                                                                   │
     │ Installing glibc-common-2.14.90-14.x86_64 (112 MB)                │
     │ Common binaries and locale data for glibc                         │
     │                                                                   │
     │                                                                   │
     │                                                                   │
     └───────────────────────────────────────────────────────────────────┘

And now, it's stuck like that forever. It doesn't budge, trying to install packages for eternity. Let's try to see what the state of the guest is, from a separate terminal:


[root@regular-guest ~]# virsh list
 Id Name                 State
----------------------------------
  1 nested-guest         paused

[root@regular-guest ~]# 
[root@regular-guest ~]#  virsh domstate nested-guest --reason
paused (unknown)

[root@regular-guest ~]# 

So our nested guest seems to be paused, and the package install on the nested guest's serial console is still hung. I gave up at this point. I need to see if I can get any helpful info w/ the virt-dmesg tool or any other way to debug this further.

Just to note, there is enough disk space and memory on the 'regular-guest', so that case is ruled out here. Also, I tried to destroy the broken nested guest and attempted to create a fresh one (repeated twice). Still no dice.

So not much luck yet with Intel arch, I'd have to try on an AMD machine.

UPDATE (on Intel arch): After trying a couple of times, I was finally able to ssh to the nested guest, but after a reboot, the nested guest loses its IP, rendering it inaccessible. (Info: the regular guest has a bridged IP, and the nested guest has a NATed IP.) And I couldn't log in via serial console, as it's broken due to a regression (which has a workaround). Also, refer to the comments below for further discussion on NATed networking caveats.
UPDATE2: The correct syntax to be added to /etc/modprobe.d/dist.conf is options kvm-intel nested=y


Little more disk I/O perf. improvement with ‘fallocate’ing a qcow2 disk

Recently I’ve started using the ‘preallocation=metadata’ flag while creating qcow2 disk images, to extract some decent I/O performance. Today, while discussing qcow2 disk image performance with Stefan Hajnoczi (thank you!) on IRC, I found that using fallocate — which preallocates all the blocks to a file — on a qcow2 disk image improves disk I/O performance a little more, as all the blocks are allocated to the file ahead of time. (Just to note: fallocate comes w/ the standard Linux package ‘util-linux-ng’.)

Let’s run a quick test to see the disk I/O performance improvement by preallocating all the space in a qcow2 disk.

Create the disk image with ‘preallocation=metadata’

 
$ qemu-img create -f qcow2 -o preallocation=metadata /export/vmimgs/f16-test1.qcow2 8G
Formatting '/export/vmimgs/f16-test1.qcow2', fmt=qcow2 size=8589934592 encryption=off cluster_size=65536 preallocation='metadata' 
 

Let's check the size of the image in bytes


$ ls -l /export/vmimgs/f16-test1.qcow2
-rw-r--r--. 1 root root 8591507456 Dec  2 16:55 /export/vmimgs/f16-test1.qcow2

# Also, print the allocated file size in blocks
$ ls -lash /export/vmimgs/f16-test1.qcow2
1.4M -rw-r--r--. 1 root root 8.1G Dec  2 16:55 /export/vmimgs/f16-test1.qcow2
 

Run fallocate to preallocate space to the disk image:


$ fallocate -l 8591507456 /export/vmimgs/f16-test1.qcow2 
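
Rather than copying the byte count by hand, the same step can be scripted; a small sketch using stat to pick up the current file length (same image path as above):

    $ fallocate -l $(stat -c %s /export/vmimgs/f16-test1.qcow2) \
      /export/vmimgs/f16-test1.qcow2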
 

Now, re-run 'ls' to print the allocated file size in blocks. (Notice that the entire disk size, 8G, is now allocated.)


$ ls -lash /export/vmimgs/f16-test1.qcow2
8.1G -rw-r--r--. 1 root root 8.1G Dec  2 16:55 /export/vmimgs/f16-test1.qcow2
$ 
 

Also, let's run 'qemu-img info' to get the disk size and virtual size.


$ qemu-img info f16-test1.qcow2 
image: f16-test1.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 8.0G
cluster_size: 65536
$ 
 

As a simple test, I used the above disk image to create an @core-only Fedora-16 guest (on a Fedora-16 host) and clocked the timing -- it took roughly 5 min 32 sec to finish. Previously, w/o fallocate-ing the disk image, the same F16 install took nearly 8 minutes. So, there is a decent improvement here.

With this, Stefan noted, disk write speed inside the guest machine should also improve when blocks are written for the first time. And, due to less disk fragmentation -- as all the space was preallocated in one operation -- there would be fewer disk seeks during large read operations.


Creating a Qcow2 virtual machine

The qcow2 disk image is an interesting format which supports features like internal and external snapshots, backing files, image compression, and encryption. But its I/O performance is very slow compared to the RAW format. Here are a couple of settings which can extract reasonable performance out of qcow2 disk images.

Create a qcow2 disk image
First, let’s create a qcow2 disk image using the ‘qemu-img’ tool:

$ /usr/bin/qemu-img create -f qcow2 -o preallocation=metadata /export/vmimgs/glacier.qcow2 8G

NOTE: At this point in time, the preallocation=metadata option is the best we can do to extract the maximum possible (near-RAW) I/O performance out of the qcow2 format. (Hint from Kevin Wolf, QEMU/qcow2 developer.)

From the listing below, 970M is the allocated (used) size of the image, while 8.1G is the maximum size the image can grow to.


[root@moon tmp]# ls -lash /export/vmimgs/glacier.qcow2 
970M -rw-r--r--. 1 qemu qemu 8.1G Sep 24 23:45 /export/vmimgs/glacier.qcow2
[root@moon tmp]# 

Create the guest

# Create an unattended minimal guest install using a qcow2 disk image
virt-install --connect=qemu:///system \
    --network=bridge:br0 \
    --initrd-inject=/var/tmp/fed-minimal.ks \
    --extra-args="ks=file:/fed-minimal.ks console=tty0 console=ttyS0,115200" \
    --name=glacier \
    --disk path=/export/vmimgs/glacier.qcow2,format=qcow2 \
    --ram 2048 \
    --vcpus=2 \
    --check-cpu \
    --hvm \
    --location=http://download.fedora.redhat.com/pub/fedora/linux/releases/15/Fedora/x86_64/os/ \
    --nographics 

The above will create a minimal guest w/ a qcow2 disk image. The content of the fed-minimal kickstart is here.

Once the guest is created, ensure the cache='none' parameter is present in the 'disk' element of the guest's XML file (if not present, add it and redefine the XML; it looks like below). This is another aspect which can improve disk I/O performance.


[root@moon ~]# grep cache /etc/libvirt/qemu/glacier.xml -A 4 -B 1
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/export/vmimgs/glacier.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
[root@moon ~]# virsh define /etc/libvirt/qemu/glacier.xml
Domain glacier defined from /etc/libvirt/qemu/glacier.xml
[root@moon ~]# virsh start glacier
Domain glacier started

[root@moon ~]#

I'm still trying to wrap my head around the caching and preallocation mechanisms of the qcow2 format. Meanwhile, work on Qcow2 version-3 is in progress in upstream qemu.
