I've always been happy to just let PCIe sit in the background of my servers and connect my devices to the rest of the machine without bothering to know much about its inner workings. But recently I've been trying to work on assigning PCIe card functions into KVM virtual machines and have come to realise that not all PCIe systems are born equal.
I have limited access to a remote HP DL380 Gen8 server, and working with this and RHEL 6.4 I've easily been able to assign the functions of its built-in network card or a plug-in PCI card to a running VM.
Ensure that both the main Intel VT and Intel VT-d features are enabled in the RBSU, and add the kernel option "intel_iommu=on".
A little snippet of XML like
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source><address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/></source>
</hostdev>
and a "virsh attach-device vm1 pci.xml" and I get the PCI device active in my VM. Nice and easy.
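For anyone wanting to script this step, here is a minimal sketch that builds the same kind of hostdev XML from a PCI address as printed by lspci -D. The function name hostdev_xml is my own invention, not part of libvirt; the only libvirt-specific assumption is that the address element sits inside a source element within hostdev.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper: turns "0000:04:00.0" into a libvirt <hostdev> fragment.
_PCI_ADDR = re.compile(
    r"([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])"
)


def hostdev_xml(pci_addr: str) -> str:
    """Return <hostdev> XML for a PCI address of the form dddd:bb:ss.f."""
    match = _PCI_ADDR.fullmatch(pci_addr)
    if match is None:
        raise ValueError(f"not a PCI address: {pci_addr!r}")
    domain, bus, slot, function = match.groups()

    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain}",
        bus=f"0x{bus}",
        slot=f"0x{slot}",
        function=f"0x{function}",
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    # Write the output to pci.xml, then: virsh attach-device vm1 pci.xml
    print(hostdev_xml("0000:04:00.0"))
```

The address is validated before any XML is produced, so a typo in the bus number fails in the script rather than in virsh.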
But I also have some test machines at home, and these aren't proving so easy: an HP ML110 G7 and an ML310e Gen8, both of which have a BIOS/RBSU option to enable the Intel VT-d feature.
With the ML110 G7, adding "intel_iommu=on" to the kernel command line results in errors from the kernel about my BIOS
(this is from RHEL 7.0; I've tried 6.4 and 6.5 and Ubuntu 12.04 with the same results, predictably)
[root@ml110c ~]# dmesg | grep -ie iommu -e dmar
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-121.el7.x86_64 root=UUID=15d0146f-02fb-4110-8d56-520ed00ab1d1 ro console=ttyS1 rd.lvm.lv=rhel_ml110c/swap vconsole.font=latarcyrheb-sun16 crashkernel=auto rd.lvm.lv=rhel_ml110c/root vconsole.keymap=uk intel_iommu=on
[ 0.000000] ACPI: DMAR 00000000e9e34a00 00070 (v01 HP ProLiant 00000001 \xffffffd2? 0000162E)
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-121.el7.x86_64 root=UUID=15d0146f-02fb-4110-8d56-520ed00ab1d1 ro console=ttyS1 rd.lvm.lv=rhel_ml110c/swap vconsole.font=latarcyrheb-sun16 crashkernel=auto rd.lvm.lv=rhel_ml110c/root vconsole.keymap=uk intel_iommu=on
[ 0.000000] Intel-IOMMU: enabled
[ 0.000000] [Firmware Warn]: drivers/iommu/dmar.c at 484: Your BIOS is broken; DMAR reported at address fed90000 returns all ones!
[ 0.059106] dmar: Host address width 39
[ 0.060735] dmar: DRHD base: 0x000000fed90000 flags: 0x1
[ 0.062852] dmar: IOMMU: failed to map dmar0
[ 0.064530] dmar: parse DMAR table failure.
With the ML310e Gen8 I don't even get this much output.
[root@ml110d ~]# dmesg | grep -ie iommu -e dmar
Command line: ro root=/dev/mapper/vg_ml110d-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD KEYTABLE=us SYSFONT=latarcyrheb-sun16 console=ttyS1 crashkernel=auto rd_LVM_LV=vg_ml110d/lv_root rd_LVM_LV=vg_ml110d/lv_swap rd_NO_DM rd_NO_PLYMOUTH intel_iommu=on
Kernel command line: ro root=/dev/mapper/vg_ml110d-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD KEYTABLE=us SYSFONT=latarcyrheb-sun16 console=ttyS1 crashkernel=130M@0M rd_LVM_LV=vg_ml110d/lv_root rd_LVM_LV=vg_ml110d/lv_swap rd_NO_DM rd_NO_PLYMOUTH intel_iommu=on
The same grep on the DL380 Gen8 produces a lot of output.
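To compare the three machines side by side, the grep output can be reduced to a rough three-way verdict. This is only a sketch: the function name classify_dmar and the labels it returns are made up, and the failure strings are simply the ones that appear in the ML110 G7 output above, so other kernel versions may word them differently.

```python
import re

# Failure messages copied from the ML110 G7 dmesg output; other kernels may differ.
_FAILURE = re.compile(
    r"failed to map dmar|parse DMAR table failure|BIOS is broken",
    re.IGNORECASE,
)
# The line the kernel prints when the firmware supplies an ACPI DMAR table.
_TABLE = re.compile(r"ACPI: DMAR\b")


def classify_dmar(dmesg_text: str) -> str:
    """Return 'no-dmar-table', 'dmar-broken' or 'dmar-present' (hypothetical labels)."""
    lines = dmesg_text.splitlines()
    if not any(_TABLE.search(line) for line in lines):
        # Nothing from the firmware at all, as on the ML310e Gen8.
        return "no-dmar-table"
    if any(_FAILURE.search(line) for line in lines):
        # A table exists but the kernel could not use it, as on the ML110 G7.
        return "dmar-broken"
    return "dmar-present"


if __name__ == "__main__":
    import subprocess

    # Reading the kernel log may require root on some systems.
    text = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    print(classify_dmar(text))
```

A result of "dmar-present" only means the table was parsed without the errors listed here; it says nothing about whether device assignment will actually work.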
Both the ML110 and the ML310e have a quad-port Intel 82576 network board
(from lspci -vv)
0f:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
Subsystem: Intel Corporation Gigabit ET Quad Port Server Adapter
(and from dmesg)
[ 20.146106] igb 0000:10:00.0: added PHC on eth0
[ 20.146110] igb 0000:10:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:1b:21:72:28:ea
[ 20.146114] igb 0000:10:00.0: eth0: PBA No: Unknown
[ 21.963020] igb 0000:10:00.1: added PHC on eth3
[ 22.517657] igb 0000:10:00.1: eth3: (PCIe:2.5Gb/s:Width x4) 00:1b:21:72:28:eb
[ 22.864739] igb 0000:10:00.1: eth3: PBA No: Unknown
(they are plugged into fast PCIe slots in the motherboard, but the cards only have the small connector)
But even just attempting to create SR-IOV virtual functions locally is failing.
[root@ml110d ~]# modprobe -r igb ; modprobe igb max_vfs=2
gives errors like
[root@ml110d ~]# dmesg | grep -i sr-iov
igb 0000:0d:00.0: SR-IOV: bus number out of range
igb 0000:0d:00.1: SR-IOV: bus number out of range
igb 0000:0e:00.0: SR-IOV: bus number out of range
igb 0000:0e:00.1: SR-IOV: bus number out of range
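Another angle on the same question is to ask sysfs what each port claims to support. The sketch below assumes the kernel exposes the sriov_totalvfs and sriov_numvfs attributes under /sys/bus/pci/devices; as far as I know these exist on newer kernels such as the RHEL 7 one, but may be missing on the older RHEL 6 kernel, in which case the function reports None. The name sriov_limits is hypothetical.

```python
from pathlib import Path


def sriov_limits(pci_addr: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Read the SR-IOV VF counters for one PCI function.

    Returns a dict with 'sriov_totalvfs' and 'sriov_numvfs'; a value is None
    when the kernel or device does not expose that attribute.
    """
    device_dir = Path(sysfs_root) / pci_addr
    result = {}
    for name in ("sriov_totalvfs", "sriov_numvfs"):
        attr = device_dir / name
        result[name] = int(attr.read_text().strip()) if attr.is_file() else None
    return result


if __name__ == "__main__":
    # Addresses taken from the dmesg output above.
    for addr in ("0000:0d:00.0", "0000:0d:00.1", "0000:0e:00.0", "0000:0e:00.1"):
        print(addr, sriov_limits(addr))
```

The sysfs root is a parameter so the function can be pointed at a copy of the tree taken from another machine.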
Now I've managed to find a whitepaper from HP about SR-IOV and Linux KVM which says that the DL380 Gen8 is supported, but there's no mention of the ML310e, so I presume that it isn't supported.
But what I'm looking for is some way to tell whether a system is likely to support direct device assignments and the SR-IOV functionality, or are these two going to be totally unrelated?
I find references to ATS (Address Translation Services), ACS (Access Control Services) and ARI (Alternative Routing-ID Interpretation), and Intel have a very useful white paper that explains what these things mean. I also find comments about whether cards are plugged into PCIe x4 (2.5GT/s) slots or PCIe x8 (5GT/s) slots.
I guess I'm trying to find out what the minimum requirements are.
- What type of PCIe slot / card is the minimum requirement? Can it work in PCIe x4 (2.5GT/s) ports, or does it need a PCIe x8 (5GT/s) slot and card?
- Does it need Address Translation Services (ATS) ?
- Does it need Access Control Services (ACS) ?
- Does it need Alternative Routing-ID Interpretation (ARI) ?
Is there any way to tell from lspci -vv output whether things are likely to work? I realise I'm likely to need some BIOS support too, but this would be a nice start.
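To make that last question concrete, this is roughly the kind of scan of lspci -vv output I have in mind. It is a sketch under assumptions: the capability strings are the ones I believe lspci prints for SR-IOV, ARI, ACS and ATS, the name caps_by_device is made up, and lspci generally needs to run as root to show extended capabilities at all. Whether the presence of these capabilities is sufficient is exactly what I don't know.

```python
# Short name -> text assumed to appear on the lspci "Capabilities:" line.
CAPS = {
    "SR-IOV": "Single Root I/O Virtualization",
    "ARI": "Alternative Routing-ID Interpretation",
    "ACS": "Access Control Services",
    "ATS": "Address Translation Service",
}


def caps_by_device(lspci_vv: str) -> dict:
    """Map each device address in lspci -vv output to the capabilities found."""
    found = {}
    current = None
    for line in lspci_vv.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device, e.g. "0f:00.0 Ethernet controller: ...".
            current = line.split()[0]
            found[current] = set()
        elif current is not None and "Capabilities:" in line:
            for short, needle in CAPS.items():
                if needle in line:
                    found[current].add(short)
    return found


if __name__ == "__main__":
    import subprocess

    text = subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout
    for addr, caps in caps_by_device(text).items():
        if caps:
            print(addr, ", ".join(sorted(caps)))
```

Run against both the NIC functions and the bridges or root ports above them, this would at least show where ARI or ACS is missing along the path.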
Thanks in advance