
PCI passthrough

From InstallGentoo Wiki
Revision as of 10:24, 25 June 2016 by Echelon1 (talk | contribs) (Created page with "'''[https://en.wikipedia.org/wiki/X86_virtualization#I.2FO_MMU_virtualization_.28AMD-Vi_and_Intel_VT-d.29 PCI passthrough]''' is a technology that allows you to directly prese...")

PCI passthrough is a technology that allows you to directly present an internal PCI device to a virtual machine. The device acts as if it were directly driven by the VM, and the VM detects the PCI device as if it were physically connected. PCI passthrough is also often known as IOMMU, although this is a bit of a misnomer: the IOMMU is the hardware that makes this feature possible, and it also provides other features, such as some protection from DMA attacks and the ability to let devices limited to 32-bit addresses reach memory in a 64-bit address space.

As you can imagine, the most common application for PCI passthrough at least on the chansphere is vidya, since PCI passthrough allows a VM direct access to your graphics card with the end result of being able to play games with nearly the same performance as if you were running your game directly on your computer.

Why PCI passthrough?

  1. It lets you ditch Windows while keeping your vidya. PCI passthrough is basically your big fat middle finger against Macrohard and its recent quest to aggressively strong-arm the entire world into surrendering to the privacy-raping botnet that is Windows 10. Yes, you will be running Windows 10 anyway -- but safely sealed in a VM with no access to the rest of your system.
  2. It protects you from GPU malware. Guess what? It is possible to run nearly undetectable malware on your graphics card. But if your graphics card is facing a VM, the compromise will be limited to your VM, keeping your physical OS safe.
  3. Some niche applications require it. For example, if you want to use a MIDI sequencer or other kinds of advanced audio production hardware, you only have two choices: you dual boot between Windows and Linux, or you present this hardware to your VM through PCI passthrough. Or if you want to run an Asterisk phone exchange with Digium land-line cards, it is a very good idea to run it on a VM with PCI passthrough in order to isolate your server from the rest of your system.

Prerequisites

  • A CPU that supports Intel VT-d or AMD-Vi. Check your CPU datasheet to confirm this.
  • A motherboard that supports the aforementioned technologies. To find this out, check in your motherboard's BIOS configuration for an option to enable IOMMU or something similar.
  • At least two GPUs: one for your physical OS, another for your VM. (You can in theory run your computer headless through SSH or a serial console, but you risk locking yourself away from your computer if you do so).
  • A Linux distribution with recent-ish packages. This means Debian and CentOS Stable will probably not work, as they run very old kernels and even older versions of QEMU that might have buggy, partial or broken PCI passthrough support. If you have a "stable" distro and PCI passthrough doesn't work for you, try switching your package manager to the "testing" branch before you try anything else.
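
To check the two-GPU requirement quickly, a minimal sketch along these lines counts the VGA controllers that lspci reports. The helper name count_gpus is made up for this example, and it only looks at PCI class 0300 (VGA compatible controller), so cards that show up under a different class are not counted.

```shell
#!/bin/bash
# Sketch: count VGA controllers in `lspci -nn` output read from stdin.
# count_gpus is a hypothetical helper name, not part of any package.
count_gpus() {
    # lspci -nn prints the PCI class in brackets; 0300 is
    # "VGA compatible controller". grep -c prints the number of matching lines.
    grep -c '\[0300\]'
}

# Usage on a live system:
#   lspci -nn | count_gpus
```

If this prints less than 2, you are in headless territory as described above.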

Step 1: Check for IOMMU support, enable if you don't have it

Start out by modifying the kernel command line on your bootloader to enable IOMMU. For this, you need two parameters: iommu=on, and then amd_iommu=on or intel_iommu=on depending on whether you have an AMD or Intel CPU. Your kernel command line should look a bit like this:

linux /vmlinuz-4.6.0-1-amd64 root=UUID=XYZUVWIJK ro quiet iommu=on amd_iommu=on

Reboot your system, and check whether AMD-Vi or Intel VT-d was enabled by checking your kernel log. On AMD, use grep AMD-Vi; on Intel, use grep -e DMAR -e IOMMU:

# dmesg | grep AMD-Vi
[    0.953668] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    0.953669] AMD-Vi:  Extended features:  PreF PPR GT IA
[    0.953672] AMD-Vi: Interrupt remapping enabled
[    0.953768] AMD-Vi: Lazy IO/TLB flushing enabled
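
If dmesg shows nothing, it may be worth confirming that the parameter actually made it onto the running kernel's command line. The following is a rough sketch, assuming the command line is read from /proc/cmdline; has_iommu_param is a made-up helper name.

```shell
#!/bin/bash
# Sketch: report which vendor IOMMU parameter a kernel command line carries.
# has_iommu_param is a hypothetical helper name, not part of any package.
has_iommu_param() {
    # $1: a kernel command line, e.g. the contents of /proc/cmdline
    local -a words
    local word found=1
    read -ra words <<< "$1"
    for word in "${words[@]}"; do
        case "$word" in
            amd_iommu=on|intel_iommu=on)
                echo "found: $word"
                found=0
                ;;
        esac
    done
    # Exit status 0 if a parameter was found, 1 otherwise.
    return "$found"
}

# Usage on a live system:
#   has_iommu_param "$(cat /proc/cmdline)" || echo "no IOMMU parameter found"
```

A missing parameter usually means the bootloader configuration was edited but not regenerated, or the wrong boot entry was changed.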

Step 2: Find out your IOMMU groups

Your devices will be organized into IOMMU groups, which are the smallest sets of physical devices that can be passed to a VM and depend on how your motherboard is wired and organized. For example, if you want direct access to your motherboard's audio chip, but it's on the same IOMMU group as your IDE controller and your SMBus (the one that provides access to thermometers, voltage sensors, fan speedometers and so on), you're going to have to give up your audio chip, your IDE controller and your SMBus all combined.

To check this info, use this command:

$ for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d); do echo "IOMMU group $(basename "$iommu_group")"; for device in $(ls -1 "$iommu_group"/devices/); do echo -n $'\t'; lspci -nns "$device"; done; done
IOMMU group 1
        00:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Root Port [1022:1412]
        01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
        01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98]
IOMMU group 7
        00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:780b] (rev 14)
        00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH IDE Controller [1022:780c]
        00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller [1022:780d] (rev 01)
        00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:780e] (rev 11)

As shown here, IOMMU group 7 is bad news: it holds plenty of devices besides the audio chip, which means that if we want direct audio access on a VM, our physical OS is going to have to give up the audio chip, the IDE controller (i.e. no more old DVD-RW drive for you), the SMBus sensor access chip, and the ISA bridge (that one is fortunately a minor nuisance, since ISA has been obsolete since like 1998).

As for IOMMU group 1, we're more or less fine. As described in the Arch Linux guide on PCI passthrough, we have a PCI bridge in the same IOMMU group as our graphics card and HDMI audio output. This means our PCI slot is provided by both the PCH (the successor of the north/south bridge) and the CPU, which in turn means it is possible for other devices apart from our GPU to be in that IOMMU group. This is OK in our case because we were lucky to have only the GPU, its HDMI audio and the PCI bridge in that group, but you're going to have to be careful not to tell the VM to grab your PCI bridge.
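
If you only care about one device, you can skip the full listing and ask sysfs which devices share its group. This is a minimal sketch: group_mates is a made-up helper, and the optional second argument (a sysfs root other than /sys) exists only so it can be tried against a fake directory tree.

```shell
#!/bin/bash
# Sketch: list every device that shares an IOMMU group with a given PCI device.
# group_mates is a hypothetical helper name, not part of any package.
group_mates() {
    local dev=$1 sysfs=${2:-/sys} link group
    # Each PCI device directory has an iommu_group symlink to its group.
    link="$sysfs/bus/pci/devices/$dev/iommu_group"
    [ -e "$link" ] || { echo "no IOMMU group for $dev" >&2; return 1; }
    group=$(readlink -f "$link")
    echo "IOMMU group $(basename "$group")"
    # The group's devices/ directory holds one entry per member device.
    ls -1 "$group/devices/"
}

# Usage on a live system:
#   group_mates 0000:01:00.0
```

Anything listed besides your GPU, its HDMI audio and a PCI bridge is something your physical OS would have to give up as well.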

Step 3: Block access on your physical OS to the GPU

Now that we have identified the IOMMU group where our GPU lives, we will now prevent your OS from letting the display driver gain access to the GPU by binding it to a placeholder driver. By far the easiest way to do so is with vfio-pci, which is a modern PCI passthrough driver designed to Just Work out of the box with straightforward configuration. You can use pci-stub if you want to (or if your kernel is older than 4.1), but you might have a hard time getting it to work.

Start out by checking if you have vfio-pci on your system:

# modinfo vfio-pci
filename:       /lib/modules/4.6.0-1-amd64/kernel/drivers/vfio/pci/vfio-pci.ko
description:    VFIO PCI - User Level meta-driver
author:         Alex Williamson <[email protected]>
license:        GPL v2
version:        0.2
srcversion:     E7D052C136278ABB60D003E
depends:        vfio,irqbypass,vfio_virqfd
intree:         Y
vermagic:       4.6.0-1-amd64 SMP mod_unload modversions
parm:           ids:Initial PCI IDs to add to the vfio driver, format is "vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]" and multiple comma separated entries can be specified (string)
parm:           nointxmask:Disable support for PCI 2.3 style INTx masking.  If this resolves problems for specific devices, report lspci -vvvxxx to [email protected] so the device can be fixed automatically via the broken_intx_masking flag. (bool)
parm:           disable_vga:Disable VGA resource access through vfio-pci (bool)
parm:           disable_idle_d3:Disable using the PCI D3 low power state for idle, unused devices (bool)

We do have vfio-pci, so now we will tell it to isolate our GPU and HDMI audio from our physical OS. You can use the configuration file /etc/modprobe.d/vfio.conf, but in my case I prefer to use a little script I found in an Arch Linux Forum post. Save this script under /usr/bin/vfio-bind, and make it executable with chmod 755 /usr/bin/vfio-bind:

/usr/bin/vfio-bind
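#!/bin/bash
# Bind each PCI device given as an argument (e.g. 0000:01:00.0) to vfio-pci.
modprobe vfio-pci
for dev in "$@"; do
        # Read the vendor and device IDs of this device from sysfs.
        vendor=$(cat "/sys/bus/pci/devices/$dev/vendor")
        device=$(cat "/sys/bus/pci/devices/$dev/device")
        # Detach the device from whatever driver currently holds it.
        if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
                echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"
        fi
        # Register the ID pair with vfio-pci so it claims the device.
        echo "$vendor $device" > /sys/bus/pci/drivers/vfio-pci/new_id
done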

Return to your IOMMU groups and take note of the device number of your GPU and HDMI audio, which in this case are 01:00.0 (the GPU) and 01:00.1 (the HDMI audio), and call that script as follows:

vfio-bind 0000:[device number]

In this case:

vfio-bind 0000:01:00.0 0000:01:00.1
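
If you would rather go the /etc/modprobe.d/vfio.conf route mentioned earlier, the ids parameter from the modinfo output takes the vendor:device pairs that lspci -nn prints in square brackets. Here is a rough sketch that builds the line for you. vfio_options_line is a made-up helper, and depending on your distro you may also need vfio-pci to load before your graphics driver (for example from the initramfs) for this to take effect.

```shell
#!/bin/bash
# Sketch: print an "options" line suitable for /etc/modprobe.d/vfio.conf.
# vfio_options_line is a hypothetical helper name, not part of any package.
vfio_options_line() {
    # Arguments are vendor:device IDs, e.g. 1002:6779.
    local IFS=,
    # "$*" joins the arguments with the first character of IFS (a comma),
    # which matches the comma-separated format of the ids parameter.
    echo "options vfio-pci ids=$*"
}

# Usage, run as root:
#   vfio_options_line 1002:6779 1002:aa98 > /etc/modprobe.d/vfio.conf
```

Unlike the script, this matches by ID and not by slot, so two identical cards would both be claimed.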

Step 4: Setting up your VM

We're finally done configuring our physical OS, so the next step is to set up a VM that takes the GPU. Depending on your GPU you might have to use the SeaBIOS firmware (good ol' IBM PC BIOS) or the OVMF firmware (a brand new and shiny libre UEFI firmware).

For that, the best suggestion is to be a man, break away from the coziness of virt-manager and libvirt, and call QEMU directly from the command line, because some commands are not supported by libvirt or are difficult or complex to set up as a libvirt VM definition. Don't worry, QEMU's parameters are rather straightforward and you pretty much just need to properly indent and space them to make sense out of them.

Start out by plugging your monitor into your passed-through GPU and summoning a VM as follows:

qemu-system-x86_64 -enable-kvm -m 512 -cpu host,kvm=off \
-smp <number of virtual CPUs>,sockets=1,cores=<number of CPU cores>,threads=<2 if your Intel CPU has Hyper-Threading, 1 otherwise> \
-device vfio-pci,host=<device number of your GPU>,x-vga=on -device vfio-pci,host=<device number of your HDMI output, omit this entire section if you don't have it> \
-vga none

Filling in the values for our system:

qemu-system-x86_64 -enable-kvm -m 1024 -cpu host,kvm=off \
-smp 2,sockets=1,cores=2,threads=1 \
-device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
-vga none

Or if you want to use OVMF:

qemu-system-x86_64 -enable-kvm -m 1024 -cpu host,kvm=off \
-smp 2,sockets=1,cores=2,threads=1 \
-device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
-drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/ovmf_code_x64.bin \
-drive if=pflash,format=raw,file=/usr/share/ovmf/x64/ovmf_vars_x64.bin \
-vga none


Explanation:

  • kvm=off: This parameter hides the KVM hypervisor signature. Nvidia's display drivers don't play nicely with hypervisors and can cause weird, cryptic errors if they find out they're running on a VM, because Nvidia makes normalfag cards for normalfags with normalfag setups and Windows 10. However, it turns out that these drivers, for now, just check for a hypervisor signature and that's it, so hiding it should do the trick.
  • x-vga=on: Required for VGA assignment.
  • -vga none: Also required for VGA assignment. This disables QEMU's simulated VGA device, the one that receives graphical output and passes it on to a VNC/Spice server so you can open it with virt-manager.
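
The list above leaves out -smp. If you don't know your core and thread counts offhand, a minimal sketch like this one derives the argument from lscpu output. smp_arg is a made-up helper, and it assumes you want to hand every core of a single socket to the VM, which is more than the example system above does, so scale the numbers down if you want to keep some for your physical OS.

```shell
#!/bin/bash
# Sketch: turn `lscpu` output (read from stdin) into a -smp argument.
# smp_arg is a hypothetical helper name, not part of any package.
smp_arg() {
    # Splitting on ":" leaves the number in the second field;
    # adding 0 makes awk treat it as a number and drop the padding.
    awk -F: '
        /Thread\(s\) per core/ { threads = $2 + 0 }
        /Core\(s\) per socket/ { cores = $2 + 0 }
        END {
            printf "-smp %d,sockets=1,cores=%d,threads=%d\n",
                   cores * threads, cores, threads
        }
    '
}

# Usage on a live system (LC_ALL=C keeps the field names in English):
#   LC_ALL=C lscpu | smp_arg
```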

If you see BIOS output on your GPU and a message that says the BIOS couldn't find any bootable media (normal since we're not currently passing any storage media), you just finished setting up GPU passthrough on your VM and that BIOS output you're seeing is your VM assuming direct control over your GPU.
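
If you went the OVMF route, keep in mind that the second pflash drive is where the firmware saves its UEFI settings, so it is common to give each VM its own writable copy and leave the packaged file alone. A rough sketch of that, where prepare_ovmf_vars and the destination path /root/IOMMU_vars.bin are made up for this example:

```shell
#!/bin/bash
# Sketch: give a VM its own writable copy of the OVMF variable store.
# prepare_ovmf_vars is a hypothetical helper name, not part of any package.
prepare_ovmf_vars() {
    local template=$1 copy=$2
    # Keep an existing copy so saved UEFI settings survive between runs.
    [ -e "$copy" ] || cp -- "$template" "$copy"
}

# Usage (the destination path is only an example):
#   prepare_ovmf_vars /usr/share/ovmf/x64/ovmf_vars_x64.bin /root/IOMMU_vars.bin
# and then point the second -drive if=pflash at /root/IOMMU_vars.bin.
```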

Step 5: Piecing together a VM suitable for vidya

Now that we have a VM with a functional GPU over PCI passthrough, we will now piece together a VM. Start out by creating the HD image. Make sure you consider at least 20 GB since Windows 7 alone eats about 10 GB. I personally prefer qcow2 images because you can vacuum them if you run out of disk space and have them take up only whatever your taken sectors amount to, with the downside of having some overhead on disk I/O as the system must figure out on which byte of the file a given sector actually is.

qemu-img create -f qcow2 /root/IOMMU.qcow2 20G

Now we piece together a VM with some basic hardware:

qemu-system-x86_64 -enable-kvm -m <MB of RAM you want to use> -cpu host,kvm=off \
-smp 2,sockets=1,cores=2,threads=1 \
 \
-device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
-vga none \
 \
-drive file=/root/IOMMU.qcow2,id=disk,format=qcow2,if=none \
-device ide-hd,bus=ide.1,drive=disk \
-boot order=dc \
 \
-soundhw hda \
 \
-drive file=/home/niconiconii/torrents/WINDOWS-7-SDLG-EDITION.100%PIRAÑA.FULL.CRACK.MEDICINA.1ZELDA.iso,id=virtiocd,if=none \
-device ide-cd,bus=ide.1,drive=virtiocd

Explanation:

  • -m: Worth mentioning to make sure you choose the right amount of RAM for your VM. Windows Vista and later eat at least 1 GB of it. However, if you plan to divide your computer usage between vidya and a Linux desktop session, you won't be able to use all your system's RAM.
  • -drive file=...,id=XYZ,format=qcow2: To define a logical storage you must first define the location of the drive's image file, its format, and assign it a name.
  • -device ide-hd,bus=ide.1,drive=XYZ: This is where you assign the image file defined before to a storage device. On Windows it is a good idea to use IDE HDs, because Windows doesn't really like SCSI storage hardware and might do anything from refusing to install without a virtio SCSI driver disc to throwing a BSOD while booting due to the lack of a virtio SCSI driver.
  • -device ide-cd: Same as before, but with a CD drive.
  • -boot order=dc: This means the CD will take priority above the HD when starting the OS, i.e. if you assign a virtual CD drive to install Windows it will load the Windows installation CD, but if you remove the CD you'll only have the HD and your system will therefore start the HD.
  • -soundhw hda: Adds a simulated sound chip that outputs your VM's sound through your physical OS's audio system. Newer QEMU releases have dropped -soundhw; there, -device intel-hda -device hda-duplex provides the same simulated chip. If you have issues with it (which is rather likely given just how difficult it is to deal with audio on Linux), consider also giving your VM access to your sound chip over PCI passthrough.
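
Once Windows is installed you will want to launch the same VM without the installation CD, so it can be handy to build the storage arguments in a script. This is a minimal sketch under that assumption: storage_args is a made-up helper, the drive IDs are the ones used in the command above, and /path/to/windows.iso is a placeholder.

```shell
#!/bin/bash
# Sketch: print the storage arguments for the VM, one per line.
# storage_args is a hypothetical helper name, not part of any package.
storage_args() {
    local disk=$1 iso=${2:-}
    printf '%s\n' "-drive" "file=$disk,id=disk,format=qcow2,if=none" \
                  "-device" "ide-hd,bus=ide.1,drive=disk"
    # The ISO is optional: without it no CD drive is defined,
    # so -boot order=dc falls through to the HD.
    if [ -n "$iso" ]; then
        printf '%s\n' "-drive" "file=$iso,id=virtiocd,if=none" \
                      "-device" "ide-cd,bus=ide.1,drive=virtiocd"
    fi
    printf '%s\n' "-boot" "order=dc"
}

# Usage:
#   mapfile -t args < <(storage_args /root/IOMMU.qcow2 /path/to/windows.iso)
#   qemu-system-x86_64 -enable-kvm ... "${args[@]}"
```

Printing one argument per line and reading them back with mapfile keeps file names with spaces in one piece.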