• 4 min read
  • I have had a 1U server co-located for some time now at iWeb Technologies' datacenter in Montreal. So far I've had no issues and it did a wonderful job hosting websites & a few other VMs, but because of my concern for its aging hardware I wanted to migrate away before disaster struck.

    Modern VPS offerings are a steal in terms of the performance they offer for the price, and Linode's 4096 plan caught my eye as a nice sweet spot. Backed by powerful CPUs and SSD storage, their VPS is blazingly fast; the only downside is that I would lose some RAM and HDD-backed storage compared to my 1U server. The bandwidth provided with the Linode was also a nice bump up from my previous 10Mbps, 500GB/mo traffic limit.

    When CentOS 7 was released I took the opportunity to immediately start modernizing my CentOS 5 configuration and test its configuration. I wanted to ensure full continuity for client-facing services - other than a nice speed boost, I wanted clients to take no manual action on their end to reconfigure their devices or domains.

    I also wanted to ensure zero downtime. As the DNS A records are being migrated, I didn't want emails coming in to the wrong server (or clients checking a stale inbox until they started seeing the new mailserver IP). I can easily configure Postfix to relay all incoming mail on the CentOS 5 server to the IP of the CentOS 7 one to avoid any loss of emails, but there's still the issue that some end users might connect to the old server and get served their old IMAP inbox for some time.

    So first things first: after developing a prototype VM that offered the same service set, I went about buying a small Linode for a month to test the configuration with some of my existing user data from my CentOS 5 server. MySQL was sufficiently easy to migrate over and Dovecot was able to preserve all UUIDs, so my inbox continued to sync seamlessly. Apache complained a bit when importing my virtual host configurations due to the new 2.4 syntax, but nothing a few sed commands couldn't fix. So with full continuity out of the way, I had to develop a strategy to handle zero downtime.

    With some foresight and DNS TTL adjustments, we can get near zero downtime assuming all resolvers comply with your TTL. Simply set your TTL to 300 (5 minutes) a day or so before the migration occurs and as your old TTL expires, resolvers will see the new TTL and will not cache the IP for as long. Even with a short TTL, that's still up to 5 minutes of downtime and clients often do bad things... The IP might still be cached (e.g. at the ISP, router, OS, or browser) for longer. Ultimately, I'm the one that ends up looking bad in that scenario even though I have done what I can on the server side and have no ability to fix the broken clients.
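    Before switching the A records, it can be worth confirming that the lowered TTL is actually what resolvers are handing out. The following is a minimal sketch, assuming dig (from bind-utils) is installed; answer_ttl is a hypothetical helper name and example.com is a placeholder domain:

```shell
# answer_ttl: print the TTL (second column) of each answer line on stdin.
# Expects lines in the format produced by `dig +noall +answer`.
answer_ttl() {
  awk '$3 == "IN" { print $2 }'
}

# Usage against a live resolver (placeholder domain):
#   dig +noall +answer example.com A | answer_ttl
```

    A cached answer counts down towards zero, so a value at or below 300 suggests that resolver has already picked up the shorter TTL.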

    To work around this, I discovered an incredibly handy tool socat that can make magic happen. socat routes data between sockets, network connections, files, pipes, you name it. Installing it is as easy as: yum install socat

    A quick script later and we can forward all connections from the old host to the new host:


    # Stop services on this host
    # (NEWIP must already be set to the new server's IP address)
    for SERVICE in dovecot postfix httpd mysqld;do
      /sbin/service $SERVICE stop
    done

    # Some cleanup
    rm /var/lib/mysql/mysql.sock

    # Map the new server's MySQL to localhost:3307
    # Assumes capability for password-less (e.g. pubkey) login
    ssh $NEWIP -L 3307:localhost:3306 &
    socat unix-listen:/var/lib/mysql/mysql.sock,fork,reuseaddr,unlink-early,unlink-close,user=mysql,group=mysql,mode=777 TCP:localhost:3307 &

    # Map ports from each service to the new host
    for PORT in 110 995 143 993 25 465 587 80 3306;do
      echo "Starting socat on port $PORT..."
      socat TCP-LISTEN:$PORT,fork TCP:${NEWIP}:${PORT} &
      sleep 1
    done

    And just like that, every connection made to the old server is immediately forwarded to the new one. This includes the MySQL socket (which is automatically used instead of a TCP connection when a host of 'localhost' is passed to MySQL).

    Note how we establish an SSH tunnel mapping a connection to localhost:3306 on the new server to port 3307 on the old one instead of simply forwarding the connection and socket to the new server - this is done so that if you have users who are permitted on 'localhost' only, they can still connect (forwarding the connection will deny access due to a connection from an unauthorized remote host).
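    Since the script stops production services before forwarding anything, it may help to preview the exact socat commands first. This is only a dry-run sketch: forward_cmd is a hypothetical helper, and 192.0.2.10 is a placeholder standing in for the new server's address.

```shell
# forward_cmd: print (do not run) the socat command for one port.
# $1 = port, $2 = address of the new server
forward_cmd() {
  printf 'socat TCP-LISTEN:%s,fork TCP:%s:%s\n' "$1" "$2" "$1"
}

# Preview every forwarder using a placeholder address
for PORT in 110 995 143 993 25 465 587 80 3306; do
  forward_cmd "$PORT" 192.0.2.10
done
```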

    Update: a friend has pointed out this video to me, if you thought 0 downtime was bad enough... These guys move a live server 7km through public transport without losing power or network!

  • 1 min read
  • Alex gave a very interesting talk at KVM Forum 2014 about the current state of VGA passthrough using KVM & VFIO:

    Also, I think nVidia is making an incredibly silly choice by (apparently accidentally) causing Code 43 in their drivers when virtualization is detected and then refusing to fix the bugs. Virtualization is becoming ever more powerful and this is just going to push potential customers away to AMD. Once they establish a reputation for their cards not working well with virtualization, they're going to have trouble regaining customer confidence even if they reverse their stance on not fixing the Code 43 bugs.
  • 14 min read
  • This how-to will show you how to:

    • Install and configure the KVM hypervisor
    • Patch the kernel and QEMU for better compatibility with graphics card / VGA VFIO passthrough
    • Create and configure a new virtual machine (VM) with real hardware attached to it
    • Configure CPU pinning on the VM for better gaming performance

    Build considerations & preparation

    QEMU has several PCI passthrough techniques, the newest of which is VFIO. QEMU's normal PCI passthrough leaves much to be desired whereas VFIO takes full advantage of IOMMU, has better device support and prevents multiple access to the same device (you can read more about it in Alex Williamson's presentation here).

    That said, VFIO is a relatively new and experimental technology for the purposes of passing through entire VGA cards to virtual machines. While many others and I have had tremendous success, different hardware can produce different results and getting there may not always be straightforward.

    Likely, until Fedora 21 is released you will need to patch and rebuild both the Linux kernel and QEMU; instructions for doing so will be provided in this tutorial. If you are purchasing hardware, it is also strongly recommended that you read over the KVM VGA-passthrough thread on Arch Forums to confirm that your intended hardware configuration has been reported to work by another user.

    Personally, the following hardware has worked wonderfully for me:

    • Motherboard: Supermicro X10SAE
    • CPU: Xeon E3-1225v3
    • Audio: Onboard (Intel C220 HD audio) and AMD R9 270X HDMI
    • Video: AMD Radeon R9 270X
    • Network: Intel I210 Gigabit

    Anecdotal evidence suggests that for graphics passthrough, nVidia cards seem to fare better than AMD ones. However, success has been had on both sides on a variety of device models dating back several years. I have found that generally, problems are not inherent to the hardware but more a matter of adjusting your software stack (i.e. applying certain patches) to get a compatible passthrough. From my reading, nVidia's GeForce 6xx/7xx and AMD's Radeon R9 series seem to work fairly painlessly.

    For network cards, always pass through an Intel ethernet controller over a Realtek one if possible.

    Common problems with VGA VFIO passthrough

    Before getting to the fun part, there are several key pieces to getting a functioning VGA passthrough that need further description. Understanding these issues will be key in creating a functioning host environment for VFIO VGA passthrough.

    PCIe device reset

    In order to re-initialize a PCIe device, it needs to be reset. Normally the host controls this, however now that we are passing through the device to a VM, some additional work is required to get reset functioning correctly. Without this extra PCIe reset support, the machine typically freezes when starting your VM for a second time.

    Fortunately for us, kernel >= 3.12 has this support and simply upgrading the kernel fixes the issue.
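    A quick way to check whether the running kernel already meets that minimum is sketched below. It assumes GNU sort (for the -V version sort); kernel_at_least is a hypothetical helper name.

```shell
# kernel_at_least: succeed if a kernel version is >= the required version.
# $1 = required version (e.g. 3.12)
# $2 = version to test (defaults to the running kernel)
kernel_at_least() {
  required=$1
  running=${2:-$(uname -r)}
  running=${running%%-*}   # strip the release suffix, e.g. "-200.fc20.x86_64"
  # The required version must sort first (or be equal) in version order
  [ "$(printf '%s\n%s\n' "$required" "$running" | sort -V | head -n 1)" = "$required" ]
}

if kernel_at_least 3.12; then
  echo "Kernel is 3.12 or newer"
else
  echo "Kernel is older than 3.12"
fi
```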

    VGA arbiter and multiple GPUs

    Passing through generic PCI devices with VFIO works pretty well. Graphics card passthrough gets put into its own category called "VGA passthrough" because of the technical challenges involved in presenting a functioning GPU for the virtual machine to initialize without things going awry.

    Most computers today come with a GPU built into the CPU. This will be a major headache when trying to set up VGA passthrough, as VGA is a really old standard. Back when it was created, having multiple graphics cards in a single system was not a configuration its designers had foreseen or designed for. VGA calls can only be directed to a single device at a time, so the kernel has to use a VGA arbiter that switches the active device and directs VGA accesses to the appropriate card. I am oversimplifying this a bit, but Alex Williamson has a detailed post explaining the technical issues. In short, the x-vga=on flag passed to VFIO indicates to the VGA arbiter that the VFIO driver will need to participate in VGA arbitration, so everyone stays happy.

    The problem is that the Xorg i915 driver for Intel's integrated GPUs does not participate in VGA arbitration, even though the devices claim the VGA address space. This means VGA calls get directed to the wrong card, (a) messing up your display on the host and (b) preventing the graphics card on the VM from functioning correctly. Ugh.

    Fortunately, Alex has also written kernel patches to fix this, however be warned that they cripple 3D performance on the Intel GPU. Since we're building a high-performance VM for gaming, I am assuming that will not be an issue for you.

    Kernel patches:


    NoSnoop is a feature flag on a PCIe device that allows it to issue transactions that bypass the cache. This can cause consistency problems when passing through the card to a virtual machine.

    You can check if your card has NoSnoop enabled by running lspci -vvvv as root and seeing if your graphics card lists NoSnoop+ (enabled) or NoSnoop- (disabled) under the Capabilities > DevCtl section.
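    The verbose lspci output is long, so a small filter can pull out just that flag. This is a sketch only: nosnoop_state is a hypothetical helper name and 01:00.0 is a placeholder bus address for the graphics card.

```shell
# nosnoop_state: print the first NoSnoop flag ("NoSnoop+" or "NoSnoop-")
# found in lspci verbose output read from stdin.
nosnoop_state() {
  grep -o 'NoSnoop[+-]' | head -n 1
}

# Usage as root (placeholder bus address):
#   lspci -vvv -s 01:00.0 | nosnoop_state
```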

    Previously this required patches to the kernel, but with kernel 3.15.5 in Fedora, these patches are no longer required.

    Rebuilding packages

    The first step is to set up a packaging environment and download the upstream source RPMs:

    yum install fedora-packager yum-utils
    rpmdev-setuptree    # creates the ~/rpmbuild directory tree if it does not exist yet
    cd ~/rpmbuild/SRPMS
    yumdownloader --source kernel
    rpm -i kernel*.src.rpm

    It should have downloaded kernel-[version].fc20.src.rpm in your directory.

    Download any of the patches listed above that may be required for your hardware configuration (as plaintext patch/diff files) and save them to ~/rpmbuild/SOURCES.

    Rebuilding QEMU

    Update 2014-06-09: the newest versions of QEMU in virt-preview (>= 2.0.0) have the NoSnoop patches included! No rebuilding necessary. If you previously followed this guide, remove any QEMU exclusions from yum.conf and update to the latest available version.

    Rebuilding Kernel

    Download any required patches and save them as plaintext files in your ~/rpmbuild/SOURCES folder. Next, edit ~/rpmbuild/SPECS/kernel.spec and find the lines where existing patches are listed; this is where you will add your own PatchXYZ: filename.patch lines. You should see something like this:

    # patches headed upstream
    Patch12016: disable-i8042-check-on-apple-mac.patch

    Patch14000: hibernate-freeze-filesystems.patch

    Patch14010: lis3-improve-handling-of-null-rate.patch

    Patch15000: nowatchdog-on-virt.patch

    Add additional lines corresponding to your patchsets. The result could be something like:

    # patches headed upstream
    Patch12016: disable-i8042-check-on-apple-mac.patch

    Patch14000: hibernate-freeze-filesystems.patch

    Patch14010: lis3-improve-handling-of-null-rate.patch

    Patch15000: nowatchdog-on-virt.patch

    Patch16000: aw-i915-v3-add-vga-arbiter-module-option.patch
    Patch16001: aw-vgaarb-non-decoded-resources.patch

    I added in the Patch16000 and Patch16001 lines, specifying the filenames of the patch files I saved into my SOURCES folder.

    Now, scroll down to the %prep section where you see other patches being applied. That would look something like:

    #rhbz 1051668
    ApplyPatch Input-elantech-add-support-for-newer-elantech-touchpads.patch

    # CVE-2014-3917 rhbz 1102571 1102715
    ApplyPatch auditsc-audit_krule-mask-accesses-need-bounds-checking.patch


    Above the line # END OF PATCH APPLICATIONS, add lines for your patch sets, for example:

    # CVE-2014-3917 rhbz 1102571 1102715
    ApplyPatch auditsc-audit_krule-mask-accesses-need-bounds-checking.patch

    ApplyPatch aw-i915-v3-add-vga-arbiter-module-option.patch
    ApplyPatch aw-vgaarb-non-decoded-resources.patch


    Now rebuild the kernel with our patches and install it:

    rpmbuild -ba ~/rpmbuild/SPECS/kernel.spec --without=perf --without=tools --without=debug --without=debuginfo

    It may list some build dependencies. If so, install them with yum install foo and then run the rpmbuild command again. When the build is complete, it outputs a list of filenames that you can install. Here's a quick command to do so:

    yum reinstall ~/rpmbuild/RPMS/$(uname -m)/kernel-{,headers,devel}*.rpm

    Reboot and your system will be fully patched! I suggest you add a line exclude=kernel* to /etc/yum.conf to prevent the patched packages from being upgraded.
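    The exclusion can also be added from a script without duplicating it on repeated runs. The sketch below assumes the stock /etc/yum.conf layout where [main] is the only section, so appending to the end of the file lands in [main]; add_yum_exclude is a hypothetical helper name.

```shell
# add_yum_exclude: append "exclude=PATTERN" to a yum config file,
# but only if that exact line is not already present.
# $1 = config file, $2 = package pattern
add_yum_exclude() {
  conf=$1
  pattern=$2
  if ! grep -qxF "exclude=${pattern}" "$conf"; then
    echo "exclude=${pattern}" >> "$conf"
  fi
}

# Usage as root:
#   add_yum_exclude /etc/yum.conf 'kernel*'
```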

    Installing KVM

    Because this is all experimental stuff, install fedora-virt-preview to get the latest and greatest software set:

    wget http://fedorapeople.org/groups/virt/virt-preview/fedora-virt-preview.repo -O /etc/yum.repos.d/fedora-virt-preview.repo
    yum install @virtualization

    Next, let's be nice to the VMs and give them some time to perform a graceful shutdown before the host powers off:

    sed -i 's/#ON_SHUTDOWN=.*/ON_SHUTDOWN=shutdown/' /etc/sysconfig/libvirt-guests
    systemctl enable libvirt-guests
    systemctl enable libvirtd

    Edit the default kernel boot arguments (specified in GRUB_CMDLINE_LINUX of /etc/default/grub) and add intel_iommu=on for Intel CPUs or iommu=pt iommu=1 for AMD CPUs to turn on IOMMU functionality.

    As well, the host initializes certain devices on init (e.g. graphic cards, USB controller, audio chipsets) so these devices need to be manually assigned to the PCI stub driver to prevent the host from using the device during host boot. It allows the VFIO driver to later bind to the devices and pass them to a VM. Add pci-stub.ids=PCI_IDs where PCI_IDs is a comma-separated list of PCI IDs as given by lspci -nn. For example, my GRUB_CMDLINE_LINUX looks like:

    GRUB_CMDLINE_LINUX="vconsole.font=latarcyrheb-sun16 rd.lvm.lv=fedora/root $([ -x /usr/sbin/rhcrashkernel-param ] && /usr/sbin/rhcrashkernel-param || :) rhgb quiet pci-stub.ids=1002:6810,1002:aab0,8086:8c20,8086:153a,8086:8c31 intel_iommu=on systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M enforcing=0"
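    Typing out the ID list by hand is error-prone, so the comma-separated value can be derived from lspci -nn output instead. This is a sketch under the assumption that the input has already been filtered down to the devices you want stubbed; pci_stub_ids is a hypothetical helper name and 01:00 is a placeholder bus address.

```shell
# pci_stub_ids: read `lspci -nn` lines on stdin and print the
# [vendor:device] IDs as one comma-separated list.
pci_stub_ids() {
  grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' | sed 's/[][]//g' | paste -sd, -
}

# Usage (placeholder bus address; both functions of the card at 01:00):
#   lspci -nn | grep '^01:00\.' | pci_stub_ids
```

    Class codes such as [0300] and vendor tags such as [AMD/ATI] are skipped because only bracketed pairs of four hex digits separated by a colon are matched.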

    With the configuration changes in place, regenerate the GRUB configuration:

    grub2-mkconfig -o /boot/grub2/grub.cfg
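    After rebooting with the new arguments, one way to confirm that the IOMMU is active is to look for populated groups under /sys/kernel/iommu_groups. A minimal sketch follows; list_iommu_groups is a hypothetical helper name, and an empty result would suggest the IOMMU is not enabled.

```shell
# list_iommu_groups: print one "group N: device" line per device found
# under the IOMMU groups directory.
# $1 = groups directory (defaults to /sys/kernel/iommu_groups)
list_iommu_groups() {
  root=${1:-/sys/kernel/iommu_groups}
  for dev in "$root"/*/devices/*; do
    [ -e "$dev" ] || continue          # glob matched nothing
    group=${dev#"$root"/}              # e.g. 1/devices/0000:01:00.0
    group=${group%%/*}                 # e.g. 1
    echo "group $group: ${dev##*/}"
  done
}

list_iommu_groups
```

    Devices that share a group have to be handed to VFIO together, which is also useful to know when choosing what to pass through.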

    Lastly, we will now need to have VFIO bind to any device you want passed to a VM. This will include devices stubbed on boot and possibly others:

    cat << EOF > /etc/sysconfig/vfio-bind
    DEVICES="FULL_PCI_IDs"
    EOF

    cat << EOF > /usr/local/bin/vfio-bind
    #!/bin/sh
    modprobe vfio-pci
    for dev in "\$@"; do
            vendor=\$(cat /sys/bus/pci/devices/\$dev/vendor)
            device=\$(cat /sys/bus/pci/devices/\$dev/device)
            if [ -e /sys/bus/pci/devices/\$dev/driver ]; then
                    echo \$dev > /sys/bus/pci/devices/\$dev/driver/unbind
            fi
            echo \$vendor \$device > /sys/bus/pci/drivers/vfio-pci/new_id
    done
    EOF
    chmod +x /usr/local/bin/vfio-bind

    cat << EOF > /etc/systemd/system/vfio-bind.service
    [Unit]
    Description=Binds devices to vfio-pci

    [Service]
    EnvironmentFile=-/etc/sysconfig/vfio-bind
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=-/usr/local/bin/vfio-bind \$DEVICES

    [Install]
    WantedBy=multi-user.target
    EOF

    systemctl enable vfio-bind.service
    systemctl start vfio-bind.service

    The system will now automatically attempt to bind the devices indicated in /etc/sysconfig/vfio-bind to VFIO at bootup. The format of FULL_PCI_IDs is a little different than earlier, as it is space separated and requires a full bus address prefix as per ls /sys/bus/pci/devices. You can use lspci -nn to identify a device, and then the output from the file listing to identify its full prefix. Here's an example of my configuration:

    DEVICES="0000:01:00.0 0000:01:00.1 0000:00:1b.0 0000:00:19.0 0000:06:00.0 0000:00:14.0"
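    Once the service has run, each listed device should report vfio-pci as its driver. The check below is a sketch that reads the driver symlink in sysfs; bound_driver is a hypothetical helper name and 0000:01:00.0 is a placeholder address.

```shell
# bound_driver: print the name of the driver a PCI device is bound to,
# or "none" if it has no driver.
# $1 = full PCI address, $2 = sysfs devices directory (optional)
bound_driver() {
  root=${2:-/sys/bus/pci/devices}
  link="$root/$1/driver"
  if [ -L "$link" ]; then
    basename "$(readlink "$link")"
  else
    echo "none"
  fi
}

# Usage (placeholder address):
#   bound_driver 0000:01:00.0
```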

    Because QEMU normally runs sandboxed, we need to 'unhinge' it and give it root privileges so it can control hardware. In /etc/libvirt/qemu.conf, add:

    user = "root"
    group = "root"
    clear_emulator_capabilities = 0

    As well, we need to provide QEMU with access to the VFIO device files. List all available VFIO groups like this:
    ls -1 /dev/vfio
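    Each numbered entry in that listing is a group QEMU needs access to through the cgroup_device_acl parameter in /etc/libvirt/qemu.conf. As a convenience, the numbers can be printed as ready-to-paste entries; this is a sketch only, and vfio_acl_entries is a hypothetical helper name.

```shell
# vfio_acl_entries: print a quoted, comma-terminated /dev/vfio/N path for
# every numbered group node in the given directory.
# $1 = directory to scan (defaults to /dev/vfio)
vfio_acl_entries() {
  dir=${1:-/dev/vfio}
  for node in "$dir"/[0-9]*; do
    [ -e "$node" ] || continue         # glob matched nothing
    printf '    "/dev/vfio/%s",\n' "${node##*/}"
  done
}

vfio_acl_entries
```

    The /dev/vfio/vfio control node is deliberately skipped by the numeric glob, so it still has to be listed by hand.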

    For every number that appears, ensure its full path appears in the cgroup_device_acl configuration parameter. For example, mine looks like so:

    cgroup_device_acl = [
        "/dev/null", "/dev/full", "/dev/zero",
        "/dev/random", "/dev/urandom",
        "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
        "/dev/rtc","/dev/hpet", "/dev/vfio/vfio",
        "/dev/vfio/1", "/dev/vfio/4", "/dev/vfio/5",
        "/dev/vfio/6", "/dev/vfio/7", "/dev/vfio/8",
    ]

    That's it! The last step is to actually create your virtual machine. The following snippet creates a sample reference file in your home directory:

    cat << EOF > ~/gaming-vm-sample.xml






















    Thanks to kaeptnb on Arch Linux forums for providing the libvirt-xml file structure the above is based off of.

    Edit the variables YOUR_VM_NAME, MEMORY_KB, NUM_CPUS (physical CPUs), NUM_CORES (cores per CPU), NUM_THEADS (threads per core; 1=normal and 2=each virtual core gets a corresponding HyperThreading CPU) to your liking.

    On my host, I have a 4 core CPU with hyperthreading (8 logical cores) so assigning the virtual machine 1 CPU, 3 cores and 2 threads (resulting in 6 logical virtual CPUs visible, and reserving 1 physical core + 1 HT core for the host) has worked very well.
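    The number of logical CPUs the guest sees is simply the product of the three topology values, which makes it easy to sanity-check a layout against what the host can spare. A tiny sketch, with vcpus as a hypothetical helper name:

```shell
# vcpus: logical CPUs visible to the guest = sockets * cores * threads
# $1 = NUM_CPUS, $2 = NUM_CORES, $3 = threads per core
vcpus() {
  echo $(( $1 * $2 * $3 ))
}

# The layout described above: 1 CPU, 3 cores, 2 threads per core
vcpus 1 3 2
```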

    The section where the variables DEV_PARTITION_PATH and PATH_TO_LOCAL_FILE appear can be used in conjunction or one at a time, depending on your configuration. A plain image file on your disk can be created with qemu-img or just point it to an unformatted partition (preferred for better performance - it can be a physical partition or a logical RAID/LVM one).

    As well, be sure to specify the correct GPU_PCI_ID and other device IDs for your setup. If you only want to pass through a GPU, then remove the -device and vfio-pci parameters for the other devices.

    When you have fully customized the file, import it:

    virsh define ~/gaming-vm-sample.xml

    You may wish to open virt-manager and copy the host CPU features if you're not running a Haswell CPU. Another tip: for a Windows guest, you will need to download the latest VirtIO driver image and attach it to the machine in order for the Windows installer to detect a disk. Use the Have Disk... option and Browse to the WIN7/AMD64 folder during installation and install the device drivers listed there.

    Now set the VM to auto-start on boot and get gaming:

    virsh autostart YOUR_VM_NAME


    If your VM isn't booting, try editing the VM configuration (virsh edit YOUR_VM_NAME) and on the line where you input GPU_PCI_ID, change bus=pcie.0 to bus=root.1,addr=00.1. This exposes the graphics card on a different PCIe port in the VM which may sometimes help.

    For nVidia users, recent driver packages are apparently broken unless the kvm=off flag is passed to QEMU's -cpu parameter. It seems nVidia checks for the KVM hypervisor's signature and disables their driver if it detects it. It is not clear if this was an intentional change or not, but this is the reality of it.

    Be sure to read through the KVM thread on Arch Linux forums that's been linked throughout this howto, as it contains tons of valuable (albeit sparse) information. Another tip would be to get in touch with the fedora-virt mailing list and describe the issue.


  • 2 min read
  • Cyberduck recently removed a particularly useful piece of information from their wiki regarding the sharing of bookmarks because it is no longer compatible with the sandboxed variant of Cyberduck available from the App Store. It is, however, still compatible with the Windows and OS X download available directly from its website.

    To set up bookmark sharing between Cyberduck clients (works with both OS X and Windows), simply create a folder in your cloud sync folder and then point Cyberduck to it.

    On OS X, open a Terminal and execute:

    defaults write ch.sudo.cyberduck application.support.path ~/Dropbox/Cyberduck

    On Windows, press Super+R (Super is the key with the Windows logo on it) to open the "Run" dialog, and enter %APPDATA%. Next, open the Cyberduck.exe_Url_[some_garble]\[Version]\user.config file and modify the config file to add the new parameter:



    edit 2021-04-30: Cyberduck has since upgraded to use MacOS's app containerization feature, so this hidden preference no longer exists. However, you can still sync bookmarks - you just need to symlink the "Bookmarks" folder that now lives under the app container path (see the Cyberduck FAQ) to an external directory, e.g. one in your Dropbox or Google Drive:

    mv ~/Library/Group\ Containers/G69SCX94XU.duck/Library/Application\ Support/duck/Bookmarks ~/Dropbox/Cyberduck-Bookmarks
    ln -s ~/Dropbox/Cyberduck-Bookmarks ~/Library/Group\ Containers/G69SCX94XU.duck/Library/Application\ Support/duck/Bookmarks
  • 3 min read
  • This how-to will show you how to configure:

    • A MyDNS name server
    • A database to hold the DNS record information

    Before starting

    Please ensure that you have followed the instructions in the getting started guide here.

    If you have not setup the database server yet, please follow the database how-to first.

    Installing MyDNS

    yum install mydns mydns-mysql
    chkconfig mydns on
    iptables -I RH-Firewall-1-INPUT 4 -p udp -m udp --sport 53 --dport 1024:65535 -j ACCEPT
    iptables -I RH-Firewall-1-INPUT 4 -p udp -m udp --dport 53 -j ACCEPT
    service iptables save

    Setting up the database

    MyDNS uses MySQL as its backend to store record information, so it needs a database setup before it can be configured. Start by opening a root MySQL session:

    mysql -u root -p

    Enter your MySQL root user's password and type at the mysql> prompt:

    GRANT SELECT ON mydns.* TO 'mydns'@'localhost' IDENTIFIED BY 'mydns_password';

    Replace mydns_password with a secure password. It will be used to grant MyDNS read-only access to the record database; this ensures that no exploits can result in write access to the record store (it is recommended that you set up another MySQL user for scripted write access to the database).

    Next, import the default database:

    mydns --create-tables | mysql -u root -p mydns

    The last step is to adjust the MyDNS configuration file to use the newly created database user credentials:

    sed -i.bak -e 's/db-user = username/db-user = mydns/' /etc/mydns.conf
    sed -i.bak -e 's/db-password = password/db-password = mydns_password/' /etc/mydns.conf

    As before, replace mydns_password with your selected MySQL user password for MyDNS.

    Start the service

    MyDNS is now fully configured and ready to run. The service can be started:

    service mydns start

    Administering the server

    MyDNS will now serve zones from the soa table with records from the rr table. The daemon does not have to be restarted for changes to be recognized, so you can take advantage of this by using scripts to update your MyDNS database on-the-fly. Zone replication via SQL backups is another particularly handy side-effect of this feature.

    As an example, included below is a small script I use to add new domains to my servers:

    # Usage: add_dns_domain mysite.tld [mysite2.tld ...]
    TIME="$(date +'%s')"
    TMPFILE="$(mktemp)" || exit 1

    # Set this to your primary and secondary nameservers
    # (placeholder values; substitute your own)
    NS1="ns1.example.com"
    NS2="ns2.example.com"

    # Set this to your primary email, with the @ replaced by a single dot.
    # (placeholder value)
    EMAIL="hostmaster.example.com"

    # Default shared IP to point domains to (placeholder value)
    SHAREDIP="192.0.2.1"

    for domain in "$@";do
      cat << EOF >> $TMPFILE
    INSERT INTO mydns.soa (origin,ns,mbox,serial,refresh,retry,expire,minimum,ttl) VALUES('${domain}.', '${NS1}.', '${EMAIL}.', $TIME, 10800, 3600, 604800, 14400, 14400);
    INSERT INTO mydns.rr (zone,name,data,aux,ttl,type) VALUES(LAST_INSERT_ID(), '${domain}.', '${NS1}.', 0, 14400, 'NS'),
                                                           (LAST_INSERT_ID(), '${domain}.', '$NS2.', 0, 14400, 'NS'),
                                                           (LAST_INSERT_ID(), '${domain}.', '${domain}.', 0, 14400, 'MX'),
                                                           (LAST_INSERT_ID(), '${domain}.', '${SHAREDIP}', 0, 14400, 'A'),
                                                           (LAST_INSERT_ID(), 'mail', '${domain}.', 0, 14400, 'CNAME'),
                                                           (LAST_INSERT_ID(), 'www', '${domain}.', 0, 14400, 'CNAME');
    EOF
    done

    mysql -u root -p < $TMPFILE

    # you can do some other stuff here with TMPFILE if you want

    # cleanup
    rm $TMPFILE

    As you can see above, it adds a zone for each domain and then sets up default CNAME aliases for www and mail to point to the main domain. The main domain gets pointed at the default shared IP using an A record.
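    The mailbox field in the soa table expects the contact address with the @ replaced by a single dot, as noted in the script's comments. If you would rather keep the address in its usual form, the conversion can be done on the fly; this is a small sketch, with mbox_from_email as a hypothetical helper name and a placeholder address.

```shell
# mbox_from_email: convert an email address to SOA mailbox form by
# replacing the first @ with a dot.
mbox_from_email() {
  printf '%s\n' "$1" | sed 's/@/./'
}

# Placeholder address
mbox_from_email "hostmaster@example.com"
```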