Difference between revisions of "PCI passthrough"

From InstallGentoo Wiki

Revision as of 19:07, 27 June 2016

PCI passthrough is a technology that lets you present an internal PCI device directly to a virtual machine. The device acts as if it were directly driven by the VM, and the VM detects the PCI device as if it were physically connected. PCI passthrough is often referred to as IOMMU, although this is a bit of a misnomer: the IOMMU is the hardware technology that makes passthrough possible, but it also provides other features, such as some protection from DMA attacks or the ability to let devices limited to 32-bit addresses reach 64-bit memory spaces.

As you can imagine, the most common application for PCI passthrough at least on the chansphere is vidya, since PCI passthrough allows a VM direct access to your graphics card with the end result of being able to play games with nearly the same performance as if you were running your game directly on your computer.

Why PCI passthrough?

  1. It lets you ditch Windows while letting you play vidya safely. PCI passthrough is your big fat middle finger against Macrohard and its recent quest to aggressively strong-arm the entire world into surrendering to the privacy-raping botnet that is Windows 10. A lot of free software enthusiasts have felt the need to leave videogames behind on account of not being libre and try as hard as possible to justify themselves with "I've outgrown them" or "Videogames are empty entertainment". Others have to begrudgingly dual boot between Windows for gaming and Linux for everything else. With PCI passthrough, you don't need to do any of that -- you will be running Windows 10 safely sealed in a VM with simulated hardware and no access (in theory) to the rest of your system, keeping your Linux and your hardware protected from NSA backdoors and from Microsoft's indiscriminate mass surveillance campaign.
  2. It protects you from GPU malware. Guess what? It is possible to run nearly undetectable malware on your graphics card. But if your graphics card is facing a VM, the compromise will be limited to your VM, keeping your physical OS safe.
  3. You can actually encrypt your Windows with it. The old days of safe encrypted Windows installations with Truecrypt ended with 64-bit Windows 7, which requires one boot partition and one system partition; now, you have to entrust yourself to Microsoft BitLocker, which is shit crypto by virtue of being tightly closed source and inauditable. But... if you use Linux as physical OS and keep your Windows on a VM, you will be able to profit from the tried-and-known-good solutions that are VeraCrypt and cryptsetup, and Microsoft BitLocker is not going to be a problem as your physical Linux is going to be the one that provides full disk encryption.
  4. It helps you keep a completely libre operating system. Device drivers are usually the very first place where a strictly libre Linux starts getting tainted with commercial code. But if you delegate the one or two devices that have no libre drivers to a VM through PCI passthrough, that's a whole 'nother story.
  5. Some niche applications require it. For example, if you want to use a MIDI sequencer or other kinds of advanced audio production hardware, you only have two choices: you dual boot between Windows and Linux, or you present this hardware to your VM through PCI passthrough. Or if you want to run an Asterisk phone exchange with Digium land-line cards, it is a very good idea to run it on a VM with PCI passthrough in order to isolate your server from the rest of your system.
  6. You can block Microsoft's botnet servers without having to spend on a separate router. Simply block them on your host system and your Windows guest will never be able to send anything to Microsoft.

Prerequisites

  • A CPU that supports Intel VT-d or AMD-Vi. Check your CPU datasheet to confirm this.
  • A motherboard that supports the aforementioned technologies. To find this out, check in your motherboard's BIOS configuration for an option to enable IOMMU or something similar.
  • At least two GPUs: one for your physical OS, another for your VM. (You can in theory run your computer headless through SSH or a serial console, but you risk locking yourself away from your computer if you do so).
  • A Linux distribution with recent-ish packages. This means Debian and CentOS Stable will probably not work, as they run very old kernels and even older versions of QEMU that might have buggy, partial or broken PCI passthrough support. If you have a "stable" distro and PCI passthrough doesn't work for you, try switching your package manager to the "testing" branch before you try anything else.

Step 1: Check for IOMMU support, enable if you don't have it

Start out by modifying the kernel command line on your bootloader to enable IOMMU. For this, you need two parameters: iommu=on, and then amd_iommu=on or intel_iommu=on depending on whether you have an AMD or Intel CPU. Your kernel command line should look a bit like this:

linux /vmlinuz-4.6.0-1-amd64 root=UUID=XYZUVWIJK ro quiet iommu=on amd_iommu=on

Reboot your system, and check whether AMD-Vi or Intel VT-d was enabled by looking at your kernel log. On AMD, use grep AMD-Vi; on Intel, use grep -e DMAR -e IOMMU:

# dmesg | grep AMD-Vi
[    0.953668] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    0.953669] AMD-Vi:  Extended features:  PreF PPR GT IA
[    0.953672] AMD-Vi: Interrupt remapping enabled
[    0.953768] AMD-Vi: Lazy IO/TLB flushing enabled
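
A quick way to double-check the result without depending on the exact dmesg wording is to see whether the kernel created any IOMMU groups under /sys/kernel/iommu_groups, the same directory used in Step 2. The following is only a minimal sketch: the function name iommu_active and its optional sysfs-root argument are made up for this example and are not part of any tool.

```shell
#!/bin/bash
# Sketch: succeed if the kernel has created at least one IOMMU group.
# The optional argument (a sysfs root) is hypothetical and only exists so the
# function can be pointed at a test directory; normally call it with no args.
iommu_active() {
    local sysfs_root="${1:-/sys}"
    local groups_dir="$sysfs_root/kernel/iommu_groups"
    # With the IOMMU off, this directory is missing or empty.
    [ -d "$groups_dir" ] && [ -n "$(ls -A "$groups_dir" 2>/dev/null)" ]
}
```

Typical use would be something like: iommu_active && echo "IOMMU groups found" || echo "no IOMMU groups, check BIOS and kernel command line".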

Step 2: Find out your IOMMU groups

Your devices will be organized into IOMMU groups, which are the smallest sets of physical devices that can be passed to a VM and depend on how your motherboard is wired and organized. For example, if you want direct access to your motherboard's audio chip, but it's on the same IOMMU group as your IDE controller and your SMBus (the one that provides access to thermometers, voltage sensors, fan speedometers and so on), you're going to have to give up your audio chip, your IDE controller and your SMBus all combined.

To check this info, use this command:

$ for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d); do echo "IOMMU group $(basename "$iommu_group")"; for device in $(ls -1 "$iommu_group"/devices/); do echo -n $'\t'; lspci -nns "$device"; done; done
IOMMU group 1
        00:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Root Port [1022:1412]
        01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
        01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98]
IOMMU group 7
        00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:780b] (rev 14)
        00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH IDE Controller [1022:780c]
        00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller [1022:780d] (rev 01)
        00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:780e] (rev 11)

As shown here, IOMMU group 7 is a problem: it contains several devices besides the audio chip, which means that if we want direct audio access on a VM, our physical OS has to give up the audio chip, the IDE controller (i.e. no more old DVD-RW drive for you), the SMBus sensor access chip, and the ISA bridge (fortunately a minor nuisance, since ISA has been obsolete since around 1998). IOMMU group 1 is more or less fine. As explained in the Arch Linux guide on PCI passthrough, having a PCI bridge in the same IOMMU group as our graphics card and HDMI audio output means our PCI slot is provided by both the PCH (the successor of the north/south bridge) and the CPU, which in turn means other devices apart from our GPU could end up in that IOMMU group. This is OK in our case because we were lucky to have only the GPU, its HDMI audio and the PCI bridge in that group, but be careful not to tell the VM to grab your PCI bridge.
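
If the one-liner above is hard to read, the same walk over /sys/kernel/iommu_groups can be written as a small function. This is a sketch under assumptions: the name list_iommu_groups and the optional sysfs-root argument are invented for illustration, and it prints only PCI addresses rather than the lspci descriptions.

```shell
#!/bin/bash
# Sketch: print each IOMMU group followed by the PCI addresses it contains.
# sysfs_root defaults to /sys; the parameter is hypothetical, for testing only.
list_iommu_groups() {
    local sysfs_root="${1:-/sys}"
    local group device
    for group in "$sysfs_root"/kernel/iommu_groups/*; do
        [ -d "$group" ] || continue      # no groups: the glob did not match
        echo "IOMMU group ${group##*/}"
        for device in "$group"/devices/*; do
            [ -e "$device" ] || continue
            printf '\t%s\n' "${device##*/}"
        done
    done
}
```

Groups come out in glob order (so group 10 sorts before group 2). To get the human-readable description of any address it prints, pass that address to lspci -nns as in the original command.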

Step 3: Block access on your physical OS to the GPU

Now that we have identified the IOMMU group where our GPU lives, we will keep the display driver on our physical OS away from the GPU by binding the GPU to a placeholder driver. By far the easiest way to do so is with vfio-pci, which is a modern PCI passthrough driver designed to Just Work out of the box with straightforward configuration. You can use pci-stub if you want to (or if your kernel is older than 4.1), but you might have a hard time getting it to work.

Start out by checking if you have vfio-pci on your system:

# modinfo vfio-pci
filename:       /lib/modules/4.6.0-1-amd64/kernel/drivers/vfio/pci/vfio-pci.ko
description:    VFIO PCI - User Level meta-driver
author:         Alex Williamson <[email protected]>
license:        GPL v2
version:        0.2
srcversion:     E7D052C136278ABB60D003E
depends:        vfio,irqbypass,vfio_virqfd
intree:         Y
vermagic:       4.6.0-1-amd64 SMP mod_unload modversions
parm:           ids:Initial PCI IDs to add to the vfio driver, format is "vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]" and multiple comma separated entries can be specified (string)
parm:           nointxmask:Disable support for PCI 2.3 style INTx masking.  If this resolves problems for specific devices, report lspci -vvvxxx to [email protected] so the device can be fixed automatically via the broken_intx_masking flag. (bool)
parm:           disable_vga:Disable VGA resource access through vfio-pci (bool)
parm:           disable_idle_d3:Disable using the PCI D3 low power state for idle, unused devices (bool)

We do have vfio-pci, so now we will tell it to isolate our GPU and HDMI audio from our physical OS. You can use configuration file /etc/modprobe.d/vfio.conf, but in my case I prefer to use a little script I found on an Arch Linux Forum post. Save this script under /usr/bin/vfio-bind, and make it executable with chmod 755 /usr/bin/vfio-bind:

/usr/bin/vfio-bind
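#!/bin/bash
# Bind each PCI device given as an argument (e.g. 0000:01:00.0) to vfio-pci.
modprobe vfio-pci
for dev in "$@"; do
        vendor=$(cat "/sys/bus/pci/devices/$dev/vendor")
        device=$(cat "/sys/bus/pci/devices/$dev/device")
        # Detach the device from whatever driver currently holds it.
        if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
                echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"
        fi
        # Register the vendor/device ID pair so vfio-pci claims the device.
        echo "$vendor" "$device" > /sys/bus/pci/drivers/vfio-pci/new_id
done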

Return to your IOMMU groups and take note of the device number of your GPU and HDMI audio, which in this case are 01:00.0 (the GPU) and 01:00.1 (the HDMI audio), and call that script as follows:

vfio-bind 0000:[device number]

In this case:

vfio-bind 0000:01:00.0 0000:01:00.1
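
To confirm that the binding took effect, you can look at the driver symlink that sysfs keeps for each PCI device. The helper below is a rough sketch, not part of the script above: the name bound_driver and its optional sysfs-root argument are assumptions made for this example.

```shell
#!/bin/bash
# Sketch: print the name of the driver currently bound to a PCI device,
# or "none" if no driver is bound.
# sysfs_root defaults to /sys; the parameter is hypothetical, for testing only.
bound_driver() {
    local dev="$1" sysfs_root="${2:-/sys}"
    local link="$sysfs_root/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # The symlink points at the driver's directory; keep only its name.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

After running vfio-bind, calling bound_driver 0000:01:00.0 and bound_driver 0000:01:00.1 should print vfio-pci for both devices; any other driver name means the host still owns the device.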

Step 4: Setting up your VM

We're finally done configuring our physical OS, so the next step is to set up a VM that takes the GPU. Depending on your GPU you might have to use the SeaBIOS firmware (good ol' IBM PC BIOS) or the OVMF firmware (a brand new and shiny libre UEFI BIOS).

For that, the best suggestion is to be a man, break away from the coziness of virt-manager and libvirt, and call QEMU directly from the command line, because some commands are not supported by libvirt or are difficult or complex to set up as a libvirt VM definition. Don't worry, QEMU's parameters are rather straightforward and you pretty much just need to properly indent and space them to make sense out of them.

Start out by plugging your monitor to your passed-through GPU and summoning a VM as follows:

qemu-system-x86_64 -enable-kvm -m 512 -cpu host,kvm=off \
-smp <number of virtual CPUs>,sockets=1,cores=<number of CPU cores>,threads=<2 if your Intel CPU has Hyper-Threading, 1 otherwise> \
-device vfio-pci,host=<device number of your GPU>,x-vga=on -device vfio-pci,host=<device number of your HDMI output, omit this entire section if you don't have it> \
-vga none

Filling in the values for our system:

qemu-system-x86_64 -enable-kvm -m 1024 -cpu host,kvm=off \
-smp 2,sockets=1,cores=2,threads=1 \
-device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
-vga none

Or if you want to use OVMF:

qemu-system-x86_64 -enable-kvm -m 1024 -cpu host,kvm=off \
-smp 2,sockets=1,cores=2,threads=1 \
-device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
-drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/ovmf_code_x64.bin \
-drive if=pflash,format=raw,file=/usr/share/ovmf/x64/ovmf_vars_x64.bin \
-vga none


Explanation:

  • kvm=off: This parameter hides the KVM hypervisor signature. Nvidia's display drivers don't play nicely with hypervisors and can cause weird, cryptic errors if they find out they're running on a VM, because Nvidia builds its consumer cards for ordinary setups running Windows 10. However, it turns out that these drivers, for now, just check for a hypervisor signature and that's it, so hiding it should do the trick.
  • x-vga=on: Required for VGA assignment.
  • -vga none: Also required for VGA assignment. This disables QEMU's simulated VGA device, the one that receives graphical output and passes it on to a VNC/Spice server so you can open it with virt-manager.

If you see BIOS output on your GPU and a message that says the BIOS couldn't find any bootable media (normal since we're not currently passing any storage media), you just finished setting up GPU passthrough on your VM and that BIOS output you're seeing is your VM assuming direct control over your GPU.
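
Since the command line only gets longer from here, it can help to keep it in a bash array, which allows one option per line with a comment next to it. This is just one possible layout of the SeaBIOS example above; the variable names GPU, GPU_AUDIO and qemu_args are made up for this sketch, and the addresses are the example values from Step 2.

```shell
#!/bin/bash
# Sketch: the SeaBIOS example command line, built as a bash array.
# GPU and GPU_AUDIO hold the example addresses from Step 2; adjust them.
GPU="01:00.0"
GPU_AUDIO="01:00.1"

qemu_args=(
    -enable-kvm
    -m 1024
    -cpu host,kvm=off                       # hide the KVM signature
    -smp 2,sockets=1,cores=2,threads=1
    -device "vfio-pci,host=$GPU,x-vga=on"   # the passed-through GPU
    -device "vfio-pci,host=$GPU_AUDIO"      # its HDMI audio function
    -vga none                               # no emulated VGA device
)

# Print the resulting command for inspection instead of running it.
echo qemu-system-x86_64 "${qemu_args[@]}"
```

Once the printed command looks right, replacing the echo line with qemu-system-x86_64 "${qemu_args[@]}" launches the VM with exactly those options.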

Step 5: Piecing together a VM suitable for vidya

Now that we have a VM with a functional GPU over PCI passthrough, we will build a complete gaming VM around it.

Start out by creating the HD image. When creating your image, consider the disk space your games will take, add at least 10 GB to account for temporary installation files, and then add another 10 GB for Windows alone. For the sake of this tutorial we will be building a minimal testing environment with only Windows 7 and League of Legends, which combined take up about 20 GB of storage.

You have two choices when creating a disk image. A raw image is a simple bit-by-bit representation of an actual HD's contents. A qcow2 image is a compact format that only stores non-blank sectors (blank here means containing all binary zeroes) and therefore takes up less disk space; it also offers some cool features like snapshots and compression. The price is some CPU overhead when performing disk I/O, as the system must figure out at which byte of the file a given sector actually lives.

Create your qcow2 disk image as follows:

qemu-img create -f qcow2 /root/IOMMU.qcow2 20G

Or if you want to create a raw image:

dd if=/dev/zero of=/root/IOMMU.img bs=1G count=0 seek=20
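
The dd command above writes no data (count=0) and only seeks 20 GB into the file, so the raw image starts out sparse: it reports its full size but occupies almost no disk space until the guest writes to it. The sketch below shows an equivalent way to create such a file with truncate and a way to compare both sizes with GNU stat; the function names create_sparse_image and image_sizes are invented for this example.

```shell
#!/bin/bash
# Sketch: create a sparse raw image and compare its apparent size with the
# space it actually occupies. Function names are hypothetical.
create_sparse_image() {
    local path="$1" size="$2"   # size as accepted by truncate, e.g. 20G
    truncate -s "$size" "$path"
}

image_sizes() {
    local path="$1"
    # %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block
    stat -c 'apparent=%s allocated_blocks=%b block_size=%B' "$path"
}
```

For instance, create_sparse_image /root/IOMMU.img 20G followed by image_sizes /root/IOMMU.img should report an apparent size of 20 GB with few or no allocated blocks on a filesystem that supports sparse files.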

Now we piece together a VM with some basic hardware:

qemu-system-x86_64 -enable-kvm -m <MB of RAM you want to use> -cpu host,kvm=off \
-smp 2,sockets=1,cores=2,threads=1 \
 \
-device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
-vga none \
 \
-drive file=/root/IOMMU.qcow2,id=disk,format=qcow2,if=none \
-device ide-hd,bus=ide.1,drive=disk \
-boot order=dc \
 \
-soundhw hda \
 \
-drive file=/home/niconiconii/torrents/WINDOWS-7-SDLG-EDITION.100%REALNOFAKE.FULL.CRACK-MEDICINA-PIRATA-KEYGEN.1ZELDA-INTERCAMBIABLE.FULL-NO-RIP.+TUTORIAL-LOQUENDO.iso,id=virtiocd,if=none \
-device ide-cd,bus=ide.1,drive=virtiocd

Explanation:

  • -m: Make sure you choose the right amount of RAM for your VM. Windows Vista and later eat at least 1 GB of it. However, if you plan to divide your computer usage between vidya and a Linux desktop session, you won't be able to give the VM all of your system's RAM.
  • -drive file=...,id=XYZ,format=qcow2: To define a logical storage device, you must first give the location of the drive's image file and its format, and assign it a name.
  • -device ide-hd,bus=ide.1,drive=XYZ: This is where you attach the image file defined above to a piece of storage hardware. On Windows it is a good idea to use IDE HDs, because Windows doesn't really like SCSI storage hardware and might do anything from refusing to install without a virtio SCSI driver disc to throwing a BSOD while booting due to the lack of a virtio SCSI driver.
  • -device ide-cd: Same as before, but with a CD drive.
  • -boot order=dc: Indicates the boot order via a nomenclature based on traditional Windows drive letters. As configured here, we're scanning for bootable media first on the CD drive, which is drive "d" (in Windows it's usually D:\), then on the HD, which is drive "c" (in Windows it's usually C:\). This boot order has the advantage that if you attach a virtual CD, e.g. to install Windows or use a partitioning tool, the VM will boot from the CD instead of the HD, but if you remove the CD only the HD is left and your system will therefore boot from it. Of course, you can replace this predefined boot order with a menu that defaults to booting from the hard drive, in which case you use -boot menu=on instead.
  • -soundhw hda: Adds a simulated sound chip that outputs your VM's sound through your physical OS's audio system. If you have issues with it (which is rather likely given just how difficult it is to deal with audio on Linux), consider also giving your VM access to your sound chip over PCI passthrough.

Now that we have our gaming VM, you can now begin installing Windows. We will assume that if you've gotten this far through the tutorial you have the skills required to install Windows and therefore don't need any explanation. Just don't panic if your display appears in funky 16 color 640x480 mode: if it does so, it's most likely because Windows doesn't even have a VESA-like driver for your video card and BIOS, and chances are it will work normally once you install your video driver.

Once you have Windows installed, get a DirectX game with 3D graphics (such as League of Legends) and test how it works. Your game should work without a single hitch and with little or no performance difference between your setup and a plain ordinary physically installed Windows. If it does, then congratulations -- you have finally broken your vidya free from the botnet.

Do note that it is a good idea to gradually sunset your games on Windows instead of suddenly migrating everything to your VM, e.g. start out by playing League only once a day on your VM, then gradually start playing more on your VM and less on your physical OS, as the extra layers of complexity involved in a VM PCI passthrough scenario greatly increase the number of points of failure and can cause unexpected issues.

Tuning

After you're done configuring your KVM gaming VM, you can try some system tweaks and adjustments to see if you get better performance.

Memory hugepages

Your x86 CPU doesn't access your memory directly. Instead, it addresses memory in pages of 4 KB each, looked up through a page table. This makes memory lookups slower, and when you do that on an x86 VM that has yet another paged memory model on top, it can slow down your VM's performance quite a bit in some scenarios. To mitigate this, you can use a technology called hugepages, which replaces some of your 4 KB pages with large 2 MB pages and increases performance by cutting down on page lookups and on the amount of memory set aside for your system's page table. The downside is that you'll have to permanently set aside the RAM you intend to use on your VM, because hugepages are always marked as claimed memory.

To do so, start out by setting up your hugepages filesystem:

# mount -t hugetlbfs hugetlbfs /dev/hugepages
# sysctl vm.nr_hugepages=1024 # 1024 pages * 2 MB/page = 2 GB of hugepages

Then use your hugepages FS as memory location by adding this parameter to your KVM command line:

-mem-path /dev/hugepages
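
The number of hugepages to reserve follows from the RAM you give the VM with -m: divide the VM's memory by the hugepage size and round up. The helper below is a small sketch of that arithmetic; the name hugepages_needed and its optional meminfo argument are assumptions made for this example, and it reads the page size from the Hugepagesize line of /proc/meminfo instead of hard-coding 2 MB.

```shell
#!/bin/bash
# Sketch: how many hugepages are needed to back a VM with vm_mb MB of RAM.
# meminfo defaults to /proc/meminfo; the parameter is hypothetical, for testing.
hugepages_needed() {
    local vm_mb="$1" meminfo="${2:-/proc/meminfo}"
    local page_kb
    # The kernel reports the hugepage size as e.g. "Hugepagesize:    2048 kB".
    page_kb=$(awk '/^Hugepagesize:/ {print $2}' "$meminfo")
    # Round up so the VM's memory always fits.
    echo $(( (vm_mb * 1024 + page_kb - 1) / page_kb ))
}
```

With 2 MB hugepages, hugepages_needed 2048 gives the 1024 pages used in the sysctl line above, so the reservation could also be written as sysctl vm.nr_hugepages="$(hugepages_needed 2048)".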

Troubleshooting

If your setup doesn't work, do yourself a favor and probably save a lot of time and effort by checking these common points of failure:

  1. Are your kernel and KVM actually up to date? This means more than just issuing a system update on your package manager: you need to actually check your kernel and KVM versions and see if they match the most up-to-date versions advertised on the Linux kernel's and QEMU's web sites. This is important, because "stable" or "long-term support" distros like Debian, CentOS or Ubuntu LTS run Jurassic-age packages that can be literally two years behind the latest version in the name of overall stability. If you don't have the latest kernel or QEMU and your system is already fully up to date, you'll have to either switch your package manager to track your distro's "testing"/"unstable"/"UAT" release train and then upgrade your entire package tree (e.g. Debian Stretch as of June 2016), or migrate your entire system to a more cutting-edge distro like Arch.
  2. Is the IOMMU enabled in your BIOS configuration? Some CPU features like IOMMU or even hardware-accelerated virtualization are disabled by default in your motherboard's BIOS just because most users never need them. Look in your BIOS configuration for something like "virtualization", "virtual I/O", "IOMMU" or similar terms and make sure you have them all enabled.
  3. Is your system accessing your GPU during boot before assigning it to your VM? Some strange issues may arise if your GPU receives video output before getting bound to vfio-pci. If this is the case, try adding video=efifb:off to your kernel command line.
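
When working through this list, it is easy to edit the bootloader configuration and then boot a kernel that never received the new parameters. A quick sanity check is to look for each parameter in /proc/cmdline, which holds the command line of the running kernel. The function below is a minimal sketch; the name cmdline_has and its optional file argument are made up for this example.

```shell
#!/bin/bash
# Sketch: succeed if a given parameter appears on the kernel command line.
# cmdline_file defaults to /proc/cmdline; the parameter is hypothetical and
# only exists so the function can be tested against a sample file.
cmdline_has() {
    local param="$1" cmdline_file="${2:-/proc/cmdline}"
    # One parameter per line, then match whole lines only, so that
    # "iommu=on" is not satisfied by "amd_iommu=on".
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

For example, cmdline_has video=efifb:off || echo "efifb is still enabled" tells you whether the change from point 3 actually reached the running kernel.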