Live disk migration with libvirt blockcopy

QEMU and libvirt projects has had a lot of block layer improvements in its last few releases (libvirt 1.2.6 & QEMU 2.1). This post discusses a method to do live disk storage migration with libvirt’s blockcopy.

Context on libvirt blockcopy
Simply put, blockcopy facilitates virtual machine live disk image copying (or mirroring) — primarily useful for different use cases of storage migration:

  • Live disk storage migration
  • Live backup of a disk image and its associated backing chain
  • Efficient non-shared storage migration (with a combination of virsh operations snapshort-create-as+blockcopy+blockcommit)
  • As of IceHouse release, OpenStack Nova project also uses a variation of libvirt blockcopy, through its Python API virDomainBlockRebase, to create live snapshots, nova image-create. (More details on this in an upcoming blog post).

A blockcopy operation has two phases: (a) All of source disk content is copied (or mirrored) to the destination, this operation can be canceled to revert to the source disk (b) Once libvirt gets a signal indicating source and destination content are equal, the mirroring job remains awake until an explicit call to virsh blockjob [. . .] --abort is issued to end the mirroring operation gracefully . If desired, this explicit call to abort can be avoided by supplying --finish option. virsh manual page for verbose details.

Scenario: Live disk storage migration

To illustrate a simple case of live disk storage migration, we’ll use a disk image chain of depth 2:

base <-- snap1 <-- snap2 (Live QEMU) 

Once live blockcopy is complete, the resulting status of disk image chain ends up as below:

base <-- snap1 <-- snap2
          ^
          |
          '------- copy (Live QEMU, pivoted)

I.e. once the operation finishes, ‘copy’ will share the backing file chain of ‘snap1′ and ‘base’. And, live QEMU is now pivoted to use the ‘copy’.

Prepare disk images, backing chain & define the libvirt guest

[For simplicity, all virtual machine disks are QCOW2 images.]

Create the base image:

 $ qemu-img create -f qcow2 base 1G

Edit the base disk image using guestfish, create a partition, make a file-system, add a file to the base image so that we distinguish its contents from its qcow2 overlay disk images:

$ guestfish -a base.qcow2 
[. . .]
><fs> run 
><fs> part-disk /dev/sda mbr
><fs> mkfs ext4 /dev/sda1
><fs> mount /dev/sda1 /
><fs> touch /foo
><fs> ls /
foo
><fs> exit

Create another QCOW2 overlay snapshot ‘snap1′, with backing file as ‘base’:

$ qemu-img create -f qcow2 -b base.qcow2 \
  -o backing_fmt=qcow2 snap1.qcow2

Add a file to snap1.qcow2:

$ guestfish -a snap1.qcow2 
[. . .]
><fs> run
><fs> part-disk /dev/sda mbr
><fs> mkfs ext4 /dev/sda1
><fs> mount /dev/sda1 /
><fs> touch /bar
><fs> ls /
bar
baz
foo
lost+found
><fs> exit

Create another QCOW2 overlay snapshot ‘snap2′, with backing file as ‘snap1′:

$ qemu-img create -f qcow2 -b snap1.qcow2 \
  -o backing_fmt=qcow2 snap2.qcow2

Add another test file ‘baz’ into snap2.qcow2 using guestfish (refer to previous examples above) to distinguish contents of base, snap1 and snap2.

Create a simple libvirt XML file as below, with source file pointing to snap2.qcow2 — which will be the active block device (i.e. it tracks all new guest writes):

$ cat <<EOF > /etc/libvirt/qemu/testvm.xml
<domain type='kvm'>
  <name>testvm</name>
  <memory unit='MiB'>512</memory>   
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/export/vmimages/snap2.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>   
  </devices>
</domain>
EOF

Define the guest and start it:

$ virsh define etc/libvirt/qemu/testvm.xml
  Domain testvm defined from /etc/libvirt/qemu/testvm.xml
$ virsh start testvm
Domain testvm started

Perform live disk migration
Undefine the running libvirt guest to make it transient[*]:

$ virsh dumpxml --inactive testvm > /var/tmp/testvm.xml

Check what is the current block device before performing live disk migration:

$ virsh domblklist testvm
Target     Source
------------------------------------------------
vda        /export/vmimages/snap2.qcow2

Optionally, display the backing chain of snap2.qcow2:

$ qemu-img info --backing-chain /export/vmimages/snap2.qcow2
[. . .] # Output removed for brevity

Initiate blockcopy (live disk mirroring):

$ virsh blockcopy --domain testvm vda \
  /export/blockcopy-test/backups/copy.qcow2 \
  --wait --verbose --shallow \
  --pivot --finish

Details of the above command: It creates copy.qcow2 file in the specified path; performs a --shallow blockcopy (i.e. the ‘copy’ shares the backing chain) of the current block device (vda); pivot the live QEMU to the ‘copy’ and end the operation gracefully by issuing a --finish.

Confirm that QEMU has pivoted to the ‘copy’ by enumerating the current block device in use:

$ virsh domblklist testvm
Target     Source
------------------------------------------------
vda        /export/vmimages/copy.qcow2

Again, display the backing chain of ‘copy’, it should be the resultant chain as noted in the Scenario section above).

$ qemu-img info --backing-chain /export/vmimages/copy.qcow2

Enumerate the contents of copy.qcow2:

$ guestfish -a copy.qcow2 
[. . .]
><fs> run
><fs> mount /dev/sda1 /
><fs> ls /
bar
foo
baz
lost+found
><fs> quit

(You can notice above: all the content from base.qcow2, snap1.qcow2, and snap2.qcow2 mirrored into copy.qcow2.)

Define the libvirt guest again to restore the persistence[*]:

$ virsh define /var/tmp/testvm.xml

[*] Reason for the undefining and defining the guest again: As of writing this, QEMU has to support persistent dirty bitmap — this enables us to restart a QEMU process with disk mirroring intact. There are some in-progress patches upstream for a while. Until they are in main line QEMU, the current approach (as illustrated above) is: make a running libvirt guest transient temporarily, perform live blockcopy, and make the guest persistent again. (Thanks to Eric Blake, one of libvirt project’s principal developers, for this detail.)

Leave a comment

Filed under Uncategorized

On bug reporting. . .

[This has been sitting my drafts for a while, time to hit send. Long time ago I wrote something like this to a Red Hat internal mailing list, thought I'll put it out here too.]

A week ago, I was participating in one of the regular upstream OpenStack bug triage sessions (for Nova project) to keep up with the insane flow of bugs. All open source projects are grateful for reported bugs. But more often than not, a good chunk of bugs in such large projects end up being bereft of details.

Writing a good bug report is hard. Writing a coherent, reproducible bug report is much harder (and takes more time). Especially when involved in complex cloud projects like OpentStack where you may have to care about everything from Kernel to user space.

A few things to consider including in a bug report. The following are all dead obvious points (and are usually part of a high-level template in issue trackers like Bugzilla). That said, however many times you repeat, it’s never enough:

  • Summary. A concise summary — is it a bug? an RFE? a tracker-bug?
  • Description. A clear description of the issue and its symptoms in chronological order.
  • Version details. e.g. Havana? IceHouse?, hypervisor details; API versions, anything else that’s relevant.
  • Crystal clear details to reproduce the bug. Bonus points for a reproducer script.
  • Test environment details. (This is crucial.)
    • Most of “cloud” software testing is dearly dependent on test environment. Clearer the details, fewer the round-trips between developers, test engineers, packagers, triagers, etc.
    • Note down if any special hardware is needed for testing, e.g. an exotic NAS, etc.
    • If you altered anything — config files, especially code located in /usr/lib/pythonx.x/site-packages/, or any other dependent lower-layer project configurations (e.g. libvirt’s config file), please say so.
  • Actual results. Post the exact, unedited details of what happens when the bug in question is triggered.
  • Expected results. Describe what is the desired behavior.
  • Additional investigative details (where appropriate).
    • If you’ve done a lot of digging into an issue, writing a detailed summary (even better: a blog post) while its fresh in memory is very useful. Along with addition info like – configuration settings, caveats, relevant log fragment, _stderr_ of a script, or a command being executed, adding trace details — all of which would be useful for archival purposes (and years later, the context would come in very handy)

Why?

  • Useful for new test engineers who do not have all the context of a bug.
  • Useful for documentation writers to help them write correct errata text/release notes.
  • Useful for non-technical folks reading the bugs/RFEs. Clear information saves a heck of a lot of time.
  • Useful for folks like product and program managers who’re always not in the trenches.
  • Useful for downstream support organizations.
  • Should there be a regression years later, having all the info to test/reproduce in the bug, right there makes your day!
  • Reduces needless round-trips of NEEDINFO.
  • Useful for new users referring to these bugs in a different context.

Overall, a very fine bug report reading experience.

You get the drift!

3 Comments

Filed under Uncategorized

Upcoming: OpenStack workshop at Fedora Flock conference, Prague (Aug 6-9, 2014)

UPDATE: I’ll not be able to make it to Prague due to plenty of other things going on. However, Jakub Ružička, an OpenStack developer and RDO project packager kindly agreed (thank you!) to step in to do the talk/workshop.

Fedora Project’s yearly flagship conference Flock is in Prauge (August 6-9, 2014) this time. A proposal I submitted for an OpenStack workshop is accepted. The aim workshop is to do a live setup of a minimal, 2-node OpenStack environment using virtual machines (also involves KVM-based nested virtualization). This workshop also aims to help understand some of the OpenStack Neutron networking components and related debugging techniques.

Slightly more verbose details are in the abstract. An etherpad for capturing notes/planning and remote participation.

Leave a comment

Filed under Uncategorized

Create a minimal Fedora Rawhide guest

Create a Fedora 20 guest from scratch using virt-builder , fully update it, and install a couple of packages:

$ virt-builder fedora-20 -o rawhide.qcow2 --format qcow2 \
  --update --selinux-relabel --size 40G\
  --install "fedora-release-rawhide yum-utils"

Import the disk image into libvirt:

$ virt-install --name rawhide --ram 4096 --disk \
  path=/home/kashyapc/rawhide.qcow2,format=qcow2,cache=none \
  --import

Login via serial console into the guest, upgrade to Rawhide:

$ yum-config-manager --disable fedora updates updates-testing
$ yum-config-manager --enable rawhide
$ yum --releasever=rawhide distro-sync --nogpgcheck
$ reboot

Optionally, create a QCOW2 internal snapshot (live or offline) of the guest:

$ virsh snapshot-create-as rawhide snap1 \
  "Clean Rawhide 19MAR2014"

Here are a couple of methods on how to upgrade to Rawhide

Leave a comment

Filed under Uncategorized

Notes for building KVM-based virtualization components from upstream git

I frequently need to have latest KVM, QEMU, libvirt and libguestfs while testing with OpenStack RDO. I either build from upstream git master branch or from Fedora Rawhide (mostly this suffices). Below I describe the exact sequence I try to build from git. These instructions are available in some form in the README files of the said packages, just noting them here explicitly for convenience. My primary development/test environment is Fedora, but it should be similar on other distributions. (Maybe I should just script it all.)

Build KVM from git

I think it’s worth noting the distinction (from traditional master branch) of these KVM git branches: remotes/origin/queue and remotes/origin/next. queue and next branches are same most of the time with the distinction that KVM queue is the branch where patches are usually tested before moving them to the KVM next branch. And, commits from next branch are submitted (as a PULL request) to Linus during the next Kernel merge window. (I recall this from an old conversation with Gleb Natapov (thank you), one of the previous KVM maintainers on IRC).

# Clone the repo
$ git clone \
  git://git.kernel.org/pub/scm/virt/kvm/kvm.git

# To test out of tree patches,
# it's cleaner to do in a new branch
$ git checkout -b test_branch

# Make a config file
$ make defconfig

# Compile
$ make -j4 && make bzImage && make modules

# Install
$ sudo -i
$ make modules_install && make install

Build QEMU from git

To build QEMU (only x86_64 target) from its git:

# Install buid dependencies of QEMU
$ yum-builddep qemu

# Clone the repo
$ git clone git://git.qemu.org/qemu.git

# Create a build directory to isolate source directory 
# from build directory
$ mkdir -p ~/build/qemu && cd ~/build/qemu

# Run the configure script
$ ~/src/qemu/./configure --target-list=x86_64-softmmu \
  --disable-werror --enable-debug 

# Compile
$ make -j4

I previously discussed about QEMU building here.

Build libvirt from git

To build libvirt from its upstream git:

# Install build dependencies of libvirt
$ yum-builddep libvirt

# Clone the libvirt repo
$ git clone git://libvirt.org/libvirt.git && cd libvirt

# Create a build directory to isolate source directory
# from build directory
$ mkdir -p ~/build/libvirt && cd ~/build/libvirt

# Run the autogen script
$ ../src/libvirt/autogen.sh

# Compile
$ make -j4

# Run tests
$ make check

# Invoke libvirt programs without having to install them
$ ./run virsh [. . .]

[Or, prepare RPMs and install them]

# Make RPMs (assumes Fedora `rpmbuild` setup
# is properly configured)
$ make rpm

# Install/update
$ yum update *.rpm

Build libguestfs from git
To build libguestfs from its upstream git:

# Install build dependencies of libvirt
$ yum-builddep libguestfs

# Clone the libguestfs repo
$ git clone git://github.com/libguestfs/libguestfs.git \
   && cd libguestfs

# Run the autogen script
$ ./autogen.sh

# Compile
$ make -j4

# Run tests
$ make check

# Invoke libguestfs programs without having to install them
$ ./run guestfish [. . .]

If you’d rather prefer libguestfs to use the custom QEMU built from git (as noted above), QEMU wrappers are useful in this case.

Alternate to building from upstream git, if you’d prefer to build the above components locally from Fedora master here are some instructions .

1 Comment

Filed under Uncategorized

Create a minimal, automated Btrfs Fedora guest template with Oz

I needed to create an automated Btrfs guest for a test. A trivial virt-install based automated script couldn’t complete the guest install (on a Fedora Rawhide host) — it hung perpetually once it tries to retrieve .treeinfo, vmlinuz, initrd.img. On my cursory investigation with guestfish, I couldn’t pin-point the root cause. Filed a bug here

[Edit, 31JAN2014: The above is fixed by adding --console=pty to virt-install command-line; refer the above bug for discussion.]

In case someone wants to reproduce with versions I noted in the above bug:

  $ wget http://kashyapc.fedorapeople.org/virt/create-guest-qcow2-btrfs.bash
  $ chmod +x create-guest-qcow2-btrfs.bash
  $ ./create-guest-qcow2-btrfs.bash fed-btrfs2 f20

Oz
So, I resorted to Oz – a utility to create automated installs of various Linux distributions. I previously wrote about it here

Below I describe a way to use it to create a minimal, automated Btrfs Fedora 20 guest template.

Install it:

 $ yum install oz -y 

A minimal Kickstart file with Btrfs partitioning:

$ cat Fedora20-btrfs.auto
install
text
keyboard us
lang en_US.UTF-8
network --device eth0 --bootproto dhcp
rootpw fedora
firewall --enabled ssh
selinux --enforcing
timezone --utc America/New_York
bootloader --location=mbr --append="console=tty0 console=ttyS0,115200"
zerombr
clearpart --all --drives=vda
autopart --type=btrfs
reboot

%packages
@core
%end

A TDL (Template Description Language) file that Oz needs as input:

$ cat f20.tdl
<template>
  <name>f20btrfs</name>
  <os>
    <name>Fedora</name>
    <version>20</version>
    <arch>x86_64</arch>
    <install type='url'>
      <url>http://dl.fedoraproject.org/pub/fedora/linux/releases/20/Fedora/x86_64/os/</url>
    </install>
    <rootpw>fedora</rootpw>
  </os>
  <description>Fedora 20</description>
</template>

Invoke oz-install (supply the Kickstart & TDL files, turn on debugging to level 4 — all details):

$ oz-install -a Fedora20-btrfs.auto -d 4 f20.tdl 
[. . .]
INFO:oz.Guest.FedoraGuest:Cleaning up after install
Libvirt XML was written to f20btrfsJan_29_2014-11:38:44

Define the above Libvirt guest XML, start it over a serial console:

$  virsh define f20btrfsJan_29_2014-11:38:44
Domain f20btrfs defined from f20btrfsJan_29_2014-11:38:44

$ virsh start f20btrfs --console
[. . .]
Connected to domain f20btrfs
Escape character is ^]

Fedora release 20 (Heisenbug)
Kernel 3.11.10-301.fc20.x86_64 on an x86_64 (ttyS0)

localhost login: 

It’s automated, fine, but still slightly more tedious (TDL file looks redundant at this point) to create custom guest image templates.

Leave a comment

Filed under Uncategorized

Capturing x86 CPU diagnostics

Sometime ago I learnt, from Paolo Bonzini (upstream KVM maintainer), about this little debugging utility – x86info (written by Dave Jones) which captures detailed information about CPU diagnostics — TLB, cache sizes, CPU feature flags, model-specific registers, etc. Take a look at its man page for specifics.

Install:

 $  yum install x86info -y 

Run it and capture the output in a file:

 $  x86info -a 2>&1 | tee stdout-x86info.txt  

As part of debugging KVM-based nested virtualization issues, here I captured x86info of L0 (bare metal, Intel Haswell), L1 (guest hypervisor), L2 (nested guest, running on L1).

Leave a comment

Filed under Uncategorized