Date: 2014-05-24

This how-to will show you how to:

Install and configure the KVM hypervisor

Patch the kernel and QEMU for better compatibility with graphics card / VGA VFIO passthrough

Create and configure a new virtual machine (VM) with real hardware attached to it

Configure CPU pinning on the VM for better gaming performance

Build considerations & preparation

QEMU has several PCI passthrough techniques, the newest of which is VFIO. QEMU's normal PCI passthrough leaves much to be desired whereas VFIO takes full advantage of IOMMU, has better device support and prevents multiple access to the same device (you can read more about it in Alex Williamson's presentation here).

That said, VFIO is still a relatively new and experimental technology when it comes to passing through entire VGA cards to virtual machines. While many others and I have had tremendous success, different hardware can produce different results, and getting there may not always be straightforward.

Likely, until Fedora 21 is released you will need to patch and rebuild both the Linux kernel and QEMU; instructions for doing so will be provided in this tutorial. If you are purchasing hardware, it is also strongly recommended that you read over the KVM VGA-passthrough thread on Arch Forums to confirm that your intended hardware configuration has been reported to work by another user.

Personally, the following hardware has worked wonderfully for me:

Motherboard: Supermicro X10SAE

CPU: Xeon E3-1225v3

Audio: Onboard (Intel C220 HD audio) and AMD R9 270X HDMI

Video: AMD Radeon R9 270X

Network: Intel I210 Gigabit

Anecdotal evidence suggests that for graphics passthrough, nVidia cards seem to fare better than AMD ones. However, success has been had on both sides with a variety of device models dating back several years. I have found that, generally, problems are not inherent to the hardware but are more a matter of adjusting your software stack (i.e. applying certain patches) to get a compatible passthrough. From my reading, nVidia's GeForce 6xx/7xx and AMD's Radeon R9 series seem to work fairly painlessly.

For network cards, always pass through an Intel ethernet controller over a Realtek one if possible.

Common problems with VGA VFIO passthrough

Before getting to the fun part, there are several key pieces to getting a functioning VGA passthrough that need further description. Understanding these issues will be key in creating a functioning host environment for VFIO VGA passthrough.

PCIe device reset

In order to re-initialize a PCIe device, it needs to be reset. Normally the host controls this. Now that we are passing the device through to a VM, however, some additional work is required to get reset functioning correctly. Without this extra PCIe reset support, the machine typically freezes when you start your VM for a second time.

Fortunately for us, kernel >= 3.12 has this support and simply upgrading the kernel fixes the issue.
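To see quickly whether a host already meets that requirement, a small version comparison can help. The following is only a sketch: the helper names version_at_least and kernel_base_version are my own, and it assumes GNU sort with the -V (version sort) option is available.

```shell
# Succeed if version $1 is at least version $2.
# Relies on GNU sort -V to order version strings.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Hypothetical helper: strip the release suffix from a `uname -r`
# style string, e.g. "3.14.4-200.fc20.x86_64" becomes "3.14.4".
kernel_base_version() {
    printf '%s\n' "${1%%-*}"
}
```

On the host, something like version_at_least "$(kernel_base_version "$(uname -r)")" 3.12 && echo "PCIe reset support should be present" would report the result.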

VGA arbiter and multiple GPUs

Passing through generic PCI devices with VFIO works pretty well. Graphics card passthrough gets put into its own category called "VGA passthrough" because of the technical challenges involved in presenting a functioning GPU for the virtual machine to initialize without things going awry.

Most computers today come with a GPU built into the CPU. This will be a major headache when trying to set up VGA passthrough, as VGA is a really old standard. Back when it was created, having multiple graphics cards in a single system was not a configuration its designers had foreseen. VGA accesses can only be directed to a single device at a time, so the kernel has to use a VGA arbiter that switches the active device and directs VGA accesses to the appropriate card. I am oversimplifying this a bit, but Alex Williamson has a detailed post explaining the technical issues. In short, the x-vga=on flag passed to VFIO indicates to the VGA arbiter that the VFIO driver will need to participate in VGA arbitration, so everyone stays happy.

The problem is that the Xorg i915 driver for Intel's integrated GPUs does not participate in VGA arbitration, even though the devices claim the VGA address space. This means VGA calls get directed to the wrong card, (a) messing up your display on the host and (b) preventing the graphics card on the VM from functioning correctly. Ugh.

Fortunately, Alex has also written kernel patches to fix this. Be warned, however, that they cripple 3D performance on the Intel GPU. Since we're building a high-performance VM for gaming, I am assuming that will not be an issue for you.

Kernel patches:

NoSnoop

NoSnoop is a feature flag on a PCIe device that allows it to issue transactions that bypass the cache. This can cause consistency problems when passing through the card to a virtual machine.

You can check whether your card has NoSnoop enabled by running lspci -vvvv as root and seeing if your graphics card lists NoSnoop+ (enabled) or NoSnoop- (disabled) under the Capabilities > DevCtl section.
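Since the lspci output is long, a small filter can pull out just the flag. This is a rough sketch: nosnoop_state is a name I made up, and it simply reports the first NoSnoop flag it finds on standard input, so it is best fed the output for a single device.

```shell
# Read `lspci -vvv` output for one device on stdin and print
# "enabled", "disabled" or "unknown" based on the NoSnoop flag.
nosnoop_state() {
    flag=$(grep -o 'NoSnoop[+-]' | head -n 1)
    case "$flag" in
        NoSnoop+) echo enabled ;;
        NoSnoop-) echo disabled ;;
        *)        echo unknown ;;
    esac
}
```

For example, as root: lspci -vvv -s 01:00.0 | nosnoop_state (01:00.0 being the address of my graphics card; yours will likely differ).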

Previously this required patches to the kernel, but with kernel 3.15.5 in Fedora, these patches are no longer required.

Rebuilding packages

The first step is to set up a packaging environment and download the upstream source RPMs:

yum install fedora-packager yum-utils

rpmdev-setuptree

cd ~/rpmbuild/SRPMS

yumdownloader --source kernel

rpm -i kernel*.src.rpm

It should have downloaded kernel-[version].fc20.src.rpm in your directory.

Download any of the patches listed above that may be required for your hardware configuration (as plaintext patch/diff files) and save them to ~/rpmbuild/SOURCES .

Rebuilding QEMU

Update 2014-06-09: the newest versions of QEMU in virt-preview (>= 2.0.0) have the NoSnoop patches included! No rebuilding necessary. If you previously followed this guide, remove any QEMU exclusions from yum.conf and update to the latest available version.

Rebuilding Kernel

Download any required patches and save them as plaintext files in your ~/rpmbuild/SOURCES folder. Next, edit ~/rpmbuild/SPECS/kernel.spec and find the lines where existing patches are declared; this is where your own PatchXYZ: filename.patch lines will go. You should see something like this:

[...]

# patches headed upstream

Patch12016: disable-i8042-check-on-apple-mac.patch

Patch14000: hibernate-freeze-filesystems.patch

Patch14010: lis3-improve-handling-of-null-rate.patch

Patch15000: nowatchdog-on-virt.patch

Add additional lines corresponding to your patchsets. The result could be something like:

[...]

# patches headed upstream

Patch12016: disable-i8042-check-on-apple-mac.patch

Patch14000: hibernate-freeze-filesystems.patch

Patch14010: lis3-improve-handling-of-null-rate.patch

Patch15000: nowatchdog-on-virt.patch

Patch16000: aw-i915-v3-add-vga-arbiter-module-option.patch

Patch16001: aw-vgaarb-non-decoded-resources.patch

I added the Patch16000 and Patch16001 lines, specifying the filenames of the patch files I saved into my SOURCES folder.

Now, scroll down to the %prep section where you see other patches being applied. That would look something like:

[...]

#rhbz 1051668

ApplyPatch Input-elantech-add-support-for-newer-elantech-touchpads.patch

# CVE-2014-3917 rhbz 1102571 1102715

ApplyPatch auditsc-audit_krule-mask-accesses-need-bounds-checking.patch

# END OF PATCH APPLICATIONS

Above the line # END OF PATCH APPLICATIONS , add lines for your patch sets, for example:



[...]

# CVE-2014-3917 rhbz 1102571 1102715

ApplyPatch auditsc-audit_krule-mask-accesses-need-bounds-checking.patch

ApplyPatch aw-i915-v3-add-vga-arbiter-module-option.patch

ApplyPatch aw-vgaarb-non-decoded-resources.patch

# END OF PATCH APPLICATIONS
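If you have more than a couple of patches, typing both sets of lines by hand gets error-prone. Here is a minimal sketch that prints them for you to paste into kernel.spec. The function names spec_patch_lines and spec_apply_lines are hypothetical, and the starting number (16000 in my case) is just whichever unused Patch number you pick.

```shell
# Print "PatchNNNNN: file" declarations for the given patch file
# names, numbering upward from the starting number in $1.
spec_patch_lines() {
    n=$1
    shift
    for f in "$@"; do
        printf 'Patch%s: %s\n' "$n" "$f"
        n=$((n + 1))
    done
}

# Print the matching ApplyPatch lines for the %prep section.
spec_apply_lines() {
    for f in "$@"; do
        printf 'ApplyPatch %s\n' "$f"
    done
}
```

Running spec_patch_lines 16000 aw-i915-v3-add-vga-arbiter-module-option.patch aw-vgaarb-non-decoded-resources.patch prints the two declarations used in the example, and spec_apply_lines with the same file names prints the lines for the %prep section. It only prints text; editing the spec file is still up to you.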

Now rebuild the kernel with our patches and install it:

rpmbuild -ba ~/rpmbuild/SPECS/kernel.spec --without=perf --without=tools --without=debug --without=debuginfo

It may list some missing build dependencies. If so, install them with yum install foo and then run the rpmbuild command again. When the build is complete, it outputs a list of filenames that you can install. Here's a quick command to do so:

yum reinstall ~/rpmbuild/RPMS/$(uname -m)/kernel-{,headers,devel}*.rpm

Reboot and your system will be fully patched! I suggest you add a line exclude=kernel* to /etc/yum.conf to prevent the patched packages from being upgraded.
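If you would rather script that change, the sketch below appends the line only when it is not already there. It assumes [main] is the last (or only) section in the file and that there is no existing exclude= line you would need to merge with; add_kernel_exclude is a name I invented for this example.

```shell
# Append "exclude=kernel*" to the yum configuration file given in $1
# unless an exclude line already mentions kernel*.
# Assumes [main] is the last section in the file.
add_kernel_exclude() {
    conf=$1
    if grep -q '^exclude=.*kernel\*' "$conf" 2>/dev/null; then
        return 0
    fi
    printf 'exclude=kernel*\n' >> "$conf"
}
```

As root, add_kernel_exclude /etc/yum.conf would apply it; running it twice leaves a single line.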

Installing KVM

Because this is all experimental stuff, install fedora-virt-preview to get the latest and greatest software set:

wget http://fedorapeople.org/groups/virt/virt-preview/fedora-virt-preview.repo -O /etc/yum.repos.d/fedora-virt-preview.repo

yum install @virtualization

Next, let's be nice to the VMs and give them some time to perform a graceful shutdown before the host powers off:



sed -i 's/#ON_SHUTDOWN=.*/ON_SHUTDOWN=shutdown/' /etc/sysconfig/libvirt-guests

systemctl enable libvirt-guests

systemctl enable libvirtd

Edit the default kernel boot arguments (specified in GRUB_CMDLINE_LINUX of /etc/default/grub ) and add intel_iommu=on for Intel CPUs or iommu=pt iommu=1 for AMD CPUs to turn on IOMMU functionality.

As well, the host initializes certain devices at boot (e.g. graphics cards, USB controllers, audio chipsets), so these devices need to be manually assigned to the PCI stub driver to prevent the host from using them during host boot. This allows the VFIO driver to bind to the devices later and pass them to a VM. Add pci-stub.ids=PCI_IDs where PCI_IDs is a comma-separated list of PCI IDs as given by lspci -nn . For example, my GRUB_CMDLINE_LINUX looks like:

GRUB_CMDLINE_LINUX="vconsole.font=latarcyrheb-sun16 rd.lvm.lv=fedora/root $([ -x /usr/sbin/rhcrashkernel-param ] && /usr/sbin/rhcrashkernel-param || :) rhgb quiet pci-stub.ids=1002:6810,1002:aab0,8086:8c20,8086:153a,8086:8c31 intel_iommu=on systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M enforcing=0"
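Picking the vendor:device pairs out of lspci -nn by eye is easy to get wrong, so here is a rough sketch that extracts them. It assumes each input line ends with the ID in square brackets (optionally followed by a revision), which is how lspci -nn prints on my system; pci_stub_ids is a made-up name.

```shell
# Read `lspci -nn` lines on stdin and print the [vendor:device] IDs
# as a comma-separated list suitable for pci-stub.ids=.
# The greedy leading .* means the last bracketed ID on a line wins.
pci_stub_ids() {
    sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' | paste -sd, -
}
```

Feed it only the lines for the devices you intend to pass through, for example lspci -nn | grep '^01:00' | pci_stub_ids for both functions of a card at 01:00.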

With the configuration changes in place, regenerate the GRUB configuration:

grub2-mkconfig -o /boot/grub2/grub.cfg
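After rebooting with the regenerated configuration, it can be worth confirming that the arguments actually reached the kernel. A minimal sketch, assuming you compare against the contents of /proc/cmdline; cmdline_has is a hypothetical helper that only matches whole, space-separated arguments.

```shell
# Succeed if the exact argument $1 appears as a whole word in the
# kernel command line string $2.
cmdline_has() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example: cmdline_has intel_iommu=on "$(cat /proc/cmdline)" || echo "IOMMU argument missing".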

Lastly, we will now need to have VFIO bind to any device you want passed to a VM. This will include devices stubbed on boot and possibly others:



cat << EOF > /etc/sysconfig/vfio-bind

DEVICES="FULL_PCI_IDs"

EOF

cat << EOF > /usr/local/bin/vfio-bind

#!/bin/sh

modprobe vfio-pci

for dev in "\$@"; do

vendor=\$(cat /sys/bus/pci/devices/\$dev/vendor)

device=\$(cat /sys/bus/pci/devices/\$dev/device)

if [ -e /sys/bus/pci/devices/\$dev/driver ]; then

echo \$dev > /sys/bus/pci/devices/\$dev/driver/unbind

fi

echo \$vendor \$device > /sys/bus/pci/drivers/vfio-pci/new_id

done

EOF

chmod +x /usr/local/bin/vfio-bind

cat << EOF > /etc/systemd/system/vfio-bind.service

[Unit]

Description=Binds devices to vfio-pci

After=syslog.target

[Service]

EnvironmentFile=-/etc/sysconfig/vfio-bind

Type=oneshot

RemainAfterExit=yes

ExecStart=-/usr/local/bin/vfio-bind \$DEVICES

[Install]

WantedBy=multi-user.target

EOF

systemctl enable vfio-bind.service

systemctl start vfio-bind.service

The system will now automatically attempt to bind the devices listed in /etc/sysconfig/vfio-bind to VFIO at bootup. The format of FULL_PCI_IDs is a little different from the earlier list: it is space-separated and uses full bus addresses, including the domain prefix, as shown by ls /sys/bus/pci/devices . You can use lspci -nn to identify a device, and then the output from the file listing to find its full address. Here's an example of my configuration:

DEVICES="0000:01:00.0 0000:01:00.1 0000:00:1b.0 0000:00:19.0 0000:06:00.0 0000:00:14.0"
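Because lspci prints short addresses like 01:00.0 while the sysfs listing uses the full form, a small converter can save some squinting. This sketch assumes all devices live in PCI domain 0000, which is the usual case on a desktop board but not guaranteed; full_pci_addr and devices_line are names I made up.

```shell
# Print the full sysfs-style address for a PCI address.
# Short addresses (bus:slot.func) get the assumed domain 0000.
full_pci_addr() {
    case "$1" in
        ????:??:??.?) printf '%s\n' "$1" ;;
        ??:??.?)      printf '0000:%s\n' "$1" ;;
        *)            return 1 ;;
    esac
}

# Build a DEVICES="..." line for /etc/sysconfig/vfio-bind from a
# list of short or full addresses.
devices_line() {
    line=
    for a in "$@"; do
        full=$(full_pci_addr "$a") || return 1
        line="${line:+$line }$full"
    done
    printf 'DEVICES="%s"\n' "$line"
}
```

For instance, devices_line 01:00.0 01:00.1 00:1b.0 prints a line in the same format as my configuration above. It is still worth checking each result against ls /sys/bus/pci/devices before relying on it.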

Because QEMU normally runs sandboxed, we need to 'unhinge' it and give it root privileges so it can control hardware. In /etc/libvirt/qemu.conf , add:

user = "root"

group = "root"

clear_emulator_capabilities = 0

As well, we need to provide QEMU with access to the VFIO device files. List all available VFIO groups like this:

ls -1 /dev/vfio

For every number that appears, ensure its full path appears in the cgroup_device_acl configuration parameter. For example, mine looks like so:

cgroup_device_acl = [

"/dev/null", "/dev/full", "/dev/zero",

"/dev/random", "/dev/urandom",

"/dev/ptmx", "/dev/kvm", "/dev/kqemu",

"/dev/rtc","/dev/hpet", "/dev/vfio/vfio",

"/dev/vfio/1", "/dev/vfio/4", "/dev/vfio/5",

"/dev/vfio/6", "/dev/vfio/7", "/dev/vfio/8",

"/dev/vfio/9"

]
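To avoid missing a group, the entries can be generated from the directory listing. A minimal sketch: vfio_acl_entries is a hypothetical helper that prints one quoted path per file, ready to paste into the list (remember to drop the trailing comma on the last entry if you put it at the end).

```shell
# Print a quoted, comma-terminated cgroup_device_acl entry for each
# file in the given directory (default: /dev/vfio).
vfio_acl_entries() {
    dir=${1:-/dev/vfio}
    for path in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$path" ] || continue
        printf '"%s",\n' "$path"
    done
}
```

Running vfio_acl_entries with no arguments on the host lists /dev/vfio/vfio along with every numbered group that currently exists.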

That's it! The last step is to actually create your virtual machine. The following snippet creates a sample reference file in your home directory:

cat << EOF > ~/gaming-vm-sample.xml

YOUR_VM_NAME

07478ac0-6a99-11e3-8266-00259086c7d9

MEMORY_KB

MEMORY_KB

NUM_CORES

/machine

hvm

Haswell

Intel

destroy

restart

destroy

/usr/bin/qemu-kvm

EOF