Overview
PCIem is a Linux framework that lets software developers write PCIe card drivers on the host that target PCIe cards which do not physically exist on the bus.
This lets developers model a driver without a physical PCIe card to test it on, effectively enabling pre-silicon engineering on the software side for a prototype card.
PCIem was originally conceived to enable communication to and from a QEMU instance, but it has since evolved to support a wide variety of use cases.
You can think of the framework as a MITM (Man-in-the-Middle) that sits between the untouched production drivers (which are unaware of PCIem’s existence) and the Linux kernel.
┌──────────────────────────────────────────┐ ┌──────────────────────────────────────────────────┐
│ │ │ │
│ ┌─────────►Host Linux Kernel │ │ Linux Userspace │
│ │ │ │ │
│ │ │ │ │
│ │ ┌────────────────────────────┐ │ │ ┌────────────────────────────────────────┐ │
│ │ │ PCIem Framework ◄──────┼────────────►/dev/pciem◄───────────┼────► Userspace PCI shim │ │
│ │ │ │ │ │ │ │ │
│ │ │ - PCI Config Space │ │ │ │ - Emulates PCIe device logic │ │
│ │ │ │ │ │ │ │ │
│ │ │ - BAR Mappings │ │ │ └────────────────────────────────────────┘ │
│ │ │ │ │ │ │
│ │◄───┤ - INT/MSI/MSI-X Interrupts │ │ │ │
│ │ │ │ │ └──────────────────────────────────────────────────┘
│ │ │ - DMA (With/without IOMMU) │ │ Userspace
│ │ │ │ │
│ │ │ - P2P DMA │ │
│ │ │ │ │
│ │ └────────────────────────────┘ │
│ │ │
│ │ │
│ │ PCIe driver is unaware of PCIem │
│ │ │
│ │ │
│ │ ┌──────────────────────────────────┐ │
│ │ │ Real PCIe Driver │ │
│ │ │ │ │
│ └─┤ - Untouched logic from production│ │
│ │ │ │
│ └──────────────────────────────────┘ │
│ │
└──────────────────────────────────────────┘
Kernel Space
Getting started
Cloning the repository
In order to use PCIem, we’ll first clone the repository:
git clone https://github.com/cakehonolulu/pciem
With the repository already cloned, we’ll enter it:
cd pciem/
Compiling the code
To compile PCIem, it should be enough to have make, a C compiler & the Linux headers for your currently-running kernel.
Depending on the features of your userspace PCI shim, you may need additional dependencies (for example, SDL3 or similar to display a framebuffer the driver writes to); for a simple shim, the ones above will suffice.
Issuing a compilation is as simple as:
make all
Using PCIem
After we compile everything, we’ll end up with a bunch of object files, an executable and a kernel module.
In short, the framework works in three steps: first you load the pciem.ko kernel module (with a set of parameters discussed below), then you start the userspace shim, which creates the device through the functionality PCIem exports, and finally you load the actual (untouched!) PCIe driver you want to test.
┌───────┐ ┌──────────┐ ┌────────────┐
│ Linux ├─────► pciem.ko ├─────► User-space │
└───────┘ └──────────┘ │ shim │
└────────────┘
PCIem parameters
pciem_phys_regions
We load pciem.ko with insmod, and the arguments we pass to it change the behaviour of the framework.
Starting from PCIem 0.1, kernel module arguments can be used to configure parts of its logic.
pciem_phys_regions: Specifies which physically contiguous memory regions are reserved and free for PCIem to use. The memory itself is reserved by passing the memmap= argument on Linux’s cmdline.
As per kernel.org’s memmap:
memmap=nn[KMG]$ss[KMG]
[KNL,ACPI,EARLY] Mark specific memory as reserved.
Region of memory to be reserved is from ss to ss+nn.
Example: Exclude memory from 0x18690000-0x1869ffff
memmap=64K$0x18690000
or
memmap=0x10000$0x18690000
Some bootloaders may need an escape character before '$',
like Grub2, otherwise '$' and the following number
will be eaten.
This effectively means that, if we append the following to the cmdline:
memmap=128M$0x1bf000000
Linux is going to carve 128M starting from physical address 0x1bf000000 out of the System RAM and mark it as Reserved so PCIem can use it.
With that exact setup, we’d then load PCIem as follows:
sudo insmod kernel/pciem.ko pciem_phys_regions="bar0:0x1bf000000:0x10000,bar2:0x1bf100000:0x100000"
This tells PCIem to lay out the BARs inside the reserved memory region as follows:
memmap=128M$0x1bf000000
┌────────────┬─────────────────────────────────┐ Start of reserved
│0x1bf000000 │ BAR0 │
│0x1bf010000 │ │
└────────────┼─────────────────────────────────┤
. │ Free │
┌────────────┼─────────────────────────────────┤
│0x1bf100000 │ │
│ │ BAR2 │
│0x1bf200000 │ │
└────────────┼─────────────────────────────────┤
. │ │
│ │
. │ │
│ Free │
. │ │
│ │
0x1c7000000 │ │
└─────────────────────────────────┘ End of reserved
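If you want to double-check a layout like the one above before loading the module, the arithmetic can be sketched in a few lines of C. This is only a sketch under assumptions, not PCIem’s own parser: struct bar_region, parse_regions() and regions_fit() are hypothetical helpers, and the barN:start:size syntax is inferred from the insmod example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: one "barN:start:size" entry. */
struct bar_region {
    unsigned int index;
    uint64_t start;
    uint64_t size;
};

/*
 * Parse e.g. "bar0:0x1bf000000:0x10000,bar2:0x1bf100000:0x100000".
 * Returns the number of entries parsed, or -1 on malformed input.
 */
static int parse_regions(const char *spec, struct bar_region *out, int max)
{
    int n = 0;

    while (*spec != '\0') {
        unsigned int idx;
        unsigned long long start, size;
        int used = 0;

        if (n == max)
            return -1;
        if (sscanf(spec, "bar%u:%llx:%llx%n", &idx, &start, &size, &used) != 3)
            return -1;
        out[n].index = idx;
        out[n].start = start;
        out[n].size = size;
        n++;
        spec += used;
        if (*spec == ',')
            spec++;
        else if (*spec != '\0')
            return -1;
    }
    return n;
}

/* Returns 1 if every region lies inside the reserved window and none overlap. */
static int regions_fit(const struct bar_region *r, int n,
                       uint64_t res_start, uint64_t res_size)
{
    for (int i = 0; i < n; i++) {
        if (r[i].size == 0 || r[i].start < res_start ||
            r[i].start - res_start > res_size ||
            r[i].size > res_size - (r[i].start - res_start))
            return 0;
        for (int j = 0; j < i; j++) {
            if (r[i].start < r[j].start + r[j].size &&
                r[j].start < r[i].start + r[i].size)
                return 0;
        }
    }
    return 1;
}
```

With the example string and a 128M window starting at 0x1bf000000, both BARs fit; shrinking the window to 1M makes BAR2 fall outside of it.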
p2p_regions
The p2p_regions argument is a bit more complex; think of it as a whitelist for DMA accesses within PCIem.
The framework supports Peer-to-Peer DMA but, to add a little bit of security, you have to manually whitelist the target BAR region of the device you want to do P2P from/to.
To know what to “share”, you need the start address and size of the device’s desired BAR:
$ lspci -vvv
...
f147:00:00.0 System peripheral: Red Hat, Inc. Virtio 1.0 file system (rev 01)
Subsystem: Red Hat, Inc. Device 0040
Physical Slot: 2113372925
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64
NUMA node: 0
Region 0: Memory at e00000000 (64-bit, non-prefetchable) [size=4K]
Region 2: Memory at e00001000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at c00000000 (64-bit, non-prefetchable) [size=8G]
Capabilities: <access denied>
Kernel driver in use: virtio-pci
If we were to share BAR0 of this particular device, we’d instruct PCIem as follows:
sudo insmod kernel/pciem.ko p2p_regions="0xe00000000:0x1000"
This will, in turn, give PCIem access to that BAR so PCIe shims can do P2P DMA.
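To illustrate what “whitelisting” means here, the sketch below checks whether a P2P target range falls entirely inside a whitelisted region. It is an illustration only: struct p2p_region and p2p_target_allowed() are hypothetical names, and the real check happens inside pciem.ko and may well differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one "base:size" entry of p2p_regions. */
struct p2p_region {
    uint64_t base;
    uint64_t size;
};

/* Returns 1 if [addr, addr + len) lies fully inside one whitelisted region. */
static int p2p_target_allowed(const struct p2p_region *wl, size_t n,
                              uint64_t addr, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (len != 0 && addr >= wl[i].base &&
            addr - wl[i].base < wl[i].size &&
            len <= wl[i].size - (addr - wl[i].base))
            return 1;
    }
    return 0;
}
```

For the 0xe00000000:0x1000 entry above, a 4K access starting at 0xe00000000 would pass, while anything touching 0xe00001000 (the neighbouring BAR2) would be rejected.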
General functionality
PCIem exposes an interface at /dev/pciem which you can freely call into to construct your PCIe shim from within the userspace.
One can see the documented interface on the API page.
Dummy Device Walkthrough
This walkthrough demonstrates how to create a minimal PCIe device using PCIem.
We’ll build a simple “clock/counting device” with basic MMIO registers and MSI interrupt support.
The dummy PCIe
Our dummy PCIe has:
- One BAR (BAR0, 4KB) with three 32-bit registers:
  - REG_CONTROL (0x00): Write 1 to increment counter
  - REG_STATUS (0x04): Status flags (IRQ pending bit)
  - REG_COUNTER (0x08): Current counter value (read-only)
- MSI: One interrupt vector
- Interrupts: Every 10 counts, an MSI is fired
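Before touching PCIem at all, the register behaviour described above can be modelled as plain C. This is a sketch under assumptions: struct counter_model, model_write() and model_read() are hypothetical names, writes other than 1 are ignored, unknown offsets read as 0, and the pending bit is never cleared here because the description doesn’t say how the driver acknowledges it.

```c
#include <stdint.h>

#define REG_CONTROL 0x00
#define REG_STATUS 0x04
#define REG_COUNTER 0x08
#define STATUS_IRQ_PENDING (1u << 0)
#define COUNTS_PER_IRQ 10u /* one MSI every 10 counts, as described above */

/* Hypothetical device model, independent of PCIem. */
struct counter_model {
    uint32_t counter;
    uint32_t status;
};

/* Apply a register write. Returns 1 when an MSI should be fired afterwards. */
static int model_write(struct counter_model *m, uint32_t offset, uint32_t value)
{
    if (offset != REG_CONTROL || value != 1)
        return 0; /* assumption: everything else is ignored */
    m->counter++;
    if (m->counter % COUNTS_PER_IRQ != 0)
        return 0;
    m->status |= STATUS_IRQ_PENDING;
    return 1;
}

/* Return the value a register read would observe. */
static uint32_t model_read(const struct counter_model *m, uint32_t offset)
{
    switch (offset) {
    case REG_STATUS:
        return m->status;
    case REG_COUNTER:
        return m->counter;
    default:
        return 0; /* assumption: unimplemented registers read as zero */
    }
}
```

Keeping the device logic separate like this makes it easy to unit-test the shim’s behaviour without loading any kernel module.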
Prerequisites
Before starting, make sure you have:
- PCIem kernel module loaded (see Getting Started)
- A C compiler and kernel headers installed
- Basic understanding of PCIe concepts (BARs, config space, interrupts)
The Code
Userspace Device Emulator
The userspace program creates the PCIe device and handles MMIO operations:
/*
* DummyClockPCIe userspace shim
*/
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include "pciem_userspace.h"
/* Defines for the BAR0 register offsets */
#define REG_CONTROL 0x00
#define REG_STATUS 0x04
#define REG_COUNTER 0x08
#define STATUS_IRQ_PENDING (1 << 0)
/* DummyClockPCIe state */
struct counter_device {
int fd;
uint32_t counter;
uint32_t status;
};
1. Opening the PCIem Device
fd = open("/dev/pciem", O_RDWR);
if (fd < 0) {
perror("Failed to open /dev/pciem");
return 1;
}
This opens the PCIem control device. All subsequent operations use this file descriptor.
2. Creating the Device
struct pciem_create_device create = {0};
ret = ioctl(fd, PCIEM_IOCTL_CREATE_DEVICE, &create);
This tells PCIem we’re starting to configure a new virtual PCIe device.
3. Adding BAR0
struct pciem_bar_config bar = {
.bar_index = 0,
.size = 4096,
.flags = PCI_BASE_ADDRESS_SPACE_MEMORY |
PCI_BASE_ADDRESS_MEM_TYPE_32,
};
ret = ioctl(fd, PCIEM_IOCTL_ADD_BAR, &bar);
This creates a 4KB memory BAR at index 0. The flags specify:
- PCI_BASE_ADDRESS_SPACE_MEMORY: This is a memory BAR (not I/O)
- PCI_BASE_ADDRESS_MEM_TYPE_32: 32-bit addressable
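A small sanity check on the BAR description can catch mistakes before the ioctl is issued. The sketch below applies general PCI rules (memory BAR sizes are powers of two of at least 16 bytes, and a 64-bit BAR occupies two consecutive slots); whether PCIem enforces the same rules is not confirmed here. bar_config_is_sane() is a hypothetical helper, and the struct is copied from the API section so the snippet compiles on its own (real code would include pciem_userspace.h instead).

```c
#include <stdint.h>
#include <linux/pci_regs.h>

/* Copied from the API section so this compiles standalone. */
struct pciem_bar_config {
    uint32_t bar_index;
    uint32_t flags;
    uint64_t size;
    uint32_t reserved;
};

/* Hypothetical helper: returns 1 if the config looks sane for a memory BAR. */
static int bar_config_is_sane(const struct pciem_bar_config *bar)
{
    int is_64bit = (bar->flags & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
                   PCI_BASE_ADDRESS_MEM_TYPE_64;

    if (bar->flags & PCI_BASE_ADDRESS_SPACE_IO)
        return 0; /* only memory BARs are considered in this sketch */
    if (bar->bar_index > 5)
        return 0; /* type 0 headers have BAR0..BAR5 */
    if (is_64bit && bar->bar_index == 5)
        return 0; /* a 64-bit BAR needs the next slot too */
    if (bar->size < 16 || (bar->size & (bar->size - 1)) != 0)
        return 0; /* power of two, at least 16 bytes */
    return 1;
}
```

The 4KB, 32-bit BAR0 from the walkthrough passes this check; a 3000-byte BAR or a 64-bit BAR5 would not.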
4. Adding MSI Capability
struct pciem_cap_msi_userspace msi = {
.num_vectors_log2 = 0, /* 2^0 = 1 vector */
.has_64bit = 1,
.has_masking = 0,
};
struct pciem_cap_config cap = {
.cap_type = PCIEM_CAP_MSI,
.cap_size = sizeof(msi),
};
memcpy(cap.cap_data, &msi, sizeof(msi));
ret = ioctl(fd, PCIEM_IOCTL_ADD_CAPABILITY, &cap);
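Since num_vectors_log2 is a log2 value, it can be handy to derive it from a plain vector count. The helper below is a hypothetical convenience, not part of PCIem; it relies on the fact that MSI only allows 1, 2, 4, 8, 16 or 32 vectors.

```c
/*
 * Hypothetical helper: convert an MSI vector count into the log2 value
 * expected by num_vectors_log2. Returns -1 for counts MSI cannot express.
 */
static int msi_vectors_to_log2(unsigned int vectors)
{
    int shift = 0;

    if (vectors == 0 || vectors > 32 || (vectors & (vectors - 1)) != 0)
        return -1;
    while ((1u << shift) < vectors)
        shift++;
    return shift;
}
```

For the single-vector device in this walkthrough, msi_vectors_to_log2(1) yields 0, matching the value used above.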
5. Setting Config Space
struct pciem_config_space cfg = {
.vendor_id = 0x1234,
.device_id = 0x5678,
.subsys_vendor_id = 0x1234,
.subsys_device_id = 0x5678,
.revision = 0x01,
.class_code = {0x00, 0x00, 0xFF},
.header_type = 0x00,
};
ret = ioctl(fd, PCIEM_IOCTL_SET_CONFIG, &cfg);
This sets how the device identifies itself. One part becomes important later:
- Vendor/Device ID pair: Used by the kernel to look for drivers
6. Setting Up Watchpoints
Watchpoints allow you to get immediate hardware-level notifications when the driver accesses specific registers:
struct pciem_watchpoint_config wp = {
.bar_index = 0,
.offset = REG_CONTROL, /* 0x00 */
.width = 4, /* 4 bytes (32-bit register) */
.flags = PCIEM_WP_FLAG_BAR_KPROBES,
};
ret = ioctl(fd, PCIEM_IOCTL_SET_WATCHPOINT, &wp);
if (ret < 0) {
perror("Failed to set watchpoint");
}
Watchpoint flags:
- PCIEM_WP_FLAG_BAR_KPROBES: PCIem automatically locates the BAR mapping (recommended)
- PCIEM_WP_FLAG_BAR_MANUAL: You provide the BAR mapping through your own heuristic (for instance, if the driver stores the BAR address somewhere in its private data structure, you can query it in a hacky way using some offset magic)
Note: Watchpoints use hardware debug registers, so only a limited number can be active at once (x86-64, for example, has four debug address registers).
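Because debug registers are picky, it can help to validate a watchpoint request before issuing the ioctl. The sketch below assumes x86-style rules (widths of 1, 2, 4 or 8 bytes and naturally aligned addresses) and that the watched range must stay inside the BAR; wp_is_valid() is a hypothetical helper, and PCIem may apply different or additional checks.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if a watchpoint request looks plausible. */
static int wp_is_valid(uint32_t offset, uint32_t width, uint64_t bar_size)
{
    if (width != 1 && width != 2 && width != 4 && width != 8)
        return 0;
    if (offset % width != 0)
        return 0; /* assumption: natural alignment, as x86 debug registers need */
    if (offset >= bar_size || width > bar_size - offset)
        return 0; /* the watched range must stay inside the BAR */
    return 1;
}
```

The REG_CONTROL watchpoint from this walkthrough (offset 0x00, width 4, 4KB BAR) passes; a 4-byte watchpoint at offset 2 would not.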
7. Registering the Device
ret = ioctl(fd, PCIEM_IOCTL_REGISTER, 0);
This is the moment your device shim becomes visible to the kernel and the local PCIe bus.
It’s worth mentioning that every PCI shim has its own private fd you can interact with (to map BARs the shim can then alter, for fault injection and similar).
8. Handling Events
Once registered, the kernel will send events when a driver accesses your card:
static void handle_mmio_read(struct counter_device *dev, struct pciem_event *evt)
{
struct pciem_response resp = {
.seq = evt->seq,
.status = 0,
};
switch (evt->offset) {
case REG_COUNTER:
resp.data = dev->counter;
break;
default:
printf("Unimplemented read offset!: %llu\n", (unsigned long long)evt->offset);
/* You would gracefully exit here by calling the dtors and whatnot, but for brevity */
exit(1);
}
if (write(dev->fd, &resp, sizeof(resp)) != (ssize_t)sizeof(resp)) {
perror("Failed to write response");
}
}
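The read handler above covers only one event type. A possible way to dispatch on evt->type and also handle writes is sketched below. Several things here are assumptions: dispatch_event() is a hypothetical name, only reads are assumed to need a response, and a negative status is assumed to signal an error (the source only shows status = 0). The structs and constants are copied from the API section so the snippet compiles standalone.

```c
#include <stdint.h>
#include <string.h>

#define PCIEM_EVENT_MMIO_READ 1
#define PCIEM_EVENT_MMIO_WRITE 2
#define REG_CONTROL 0x00
#define REG_COUNTER 0x08

/* Copied from the API section. */
struct pciem_event {
    uint64_t seq;
    uint32_t type;
    uint32_t bar;
    uint64_t offset;
    uint32_t size;
    uint32_t reserved;
    uint64_t data;
    uint64_t timestamp;
};

struct pciem_response {
    uint64_t seq;
    uint64_t data;
    int32_t status;
    uint32_t reserved;
};

/*
 * Hypothetical dispatcher. Returns 1 if resp was filled in and should be
 * written back to the PCIem fd, 0 if no response is needed.
 */
static int dispatch_event(uint32_t *counter, const struct pciem_event *evt,
                          struct pciem_response *resp)
{
    switch (evt->type) {
    case PCIEM_EVENT_MMIO_READ:
        memset(resp, 0, sizeof(*resp));
        resp->seq = evt->seq; /* echo the sequence number, as in the handler above */
        if (evt->offset == REG_COUNTER)
            resp->data = *counter;
        else
            resp->status = -1; /* assumption: non-zero status means failure */
        return 1;
    case PCIEM_EVENT_MMIO_WRITE:
        if (evt->offset == REG_CONTROL && evt->data == 1)
            (*counter)++;
        return 0; /* assumption: writes need no response */
    default:
        return 0;
    }
}
```

Returning a flag instead of calling write() directly keeps the function free of I/O, so it can be exercised with hand-built events.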
9. Injecting Interrupts
When you want to signal the driver:
struct pciem_irq_inject irq = { .vector = 0 };
ioctl(fd, PCIEM_IOCTL_INJECT_IRQ, &irq);
This triggers an MSI interrupt. In our DummyClockPCIe example, we fire one every 10 counter increments.
Kernel Driver
The driver is a standard PCIe driver; it is unaware that PCIem exists at all.
static int dummyclockpcie_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
...
ret = pci_enable_device(pdev);
/* We map BAR0 */
bar0 = pci_iomap(pdev, 0, 0);
/* We enable MSIs */
ret = pci_enable_msi(pdev);
}
Accesses like the following would then generate events:
iowrite32(1, bar0 + REG_CONTROL);
val = ioread32(bar0 + REG_COUNTER);
Any access that hits a watchpoint triggers an event that your shim can consume and respond to.
When the driver calls iowrite32() or ioread32(), PCIem intercepts the access and adds an event to its per-device ring buffer, which you can either poll manually or wait on through the eventfd API to avoid busy-waiting.
Eventfd
Instead of busy-polling the device for events, you can use Linux’s eventfd mechanism for efficient notification. Support is built into the pciem_userspace module of the framework:
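/* Needs <sys/eventfd.h>; assumes struct counter_device has gained an `int event_fd` member */
static int setup_eventfd(struct counter_device *dev)
{
struct pciem_eventfd_config efd_cfg;
...
dev->event_fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);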
if (dev->event_fd < 0)
{
perror("Failed to create eventfd");
...
}
efd_cfg.eventfd = dev->event_fd;
efd_cfg.reserved = 0;
if (ioctl(dev->fd, PCIEM_IOCTL_SET_EVENTFD, &efd_cfg) < 0)
{
perror("Failed to set eventfd");
close(dev->event_fd);
dev->event_fd = -1;
...
}
...
}
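Once the eventfd is registered, waiting on it is plain Linux: poll() the descriptor and read its 8-byte counter. The sketch below uses only standard eventfd(2)/poll(2) behaviour. wait_for_events() is a hypothetical helper, and how the counter value relates to the number of queued PCIem events is not specified by the source, so treat it as “something arrived”.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until the eventfd is signalled or timeout_ms
 * elapses. Returns the eventfd counter (> 0), 0 on timeout, -1 on error.
 */
static int64_t wait_for_events(int event_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = event_fd, .events = POLLIN };
    uint64_t count;
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    /* Reading an eventfd returns its 8-byte counter and resets it to zero. */
    if (read(event_fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

A shim’s main loop could call wait_for_events(dev->event_fd, -1) and then drain whatever events are pending before sleeping again.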
API
PCIem exposes a userspace-facing API that enables the developer to do PCIe configuration and handling in an easy manner.
One of the main benefits of doing this entirely in userspace is that iterating on the shim card is virtually free.
The ioctl and struct definitions PCIem uses to let you configure your shim are detailed below.
Structures
pciem_create_device
struct pciem_create_device
{
uint32_t flags;
uint32_t mode;
};
pciem_bar_config
struct pciem_bar_config
{
uint32_t bar_index;
uint32_t flags;
uint64_t size;
uint32_t reserved;
};
pciem_cap_config
struct pciem_cap_config
{
uint32_t cap_type;
uint32_t cap_size;
uint8_t cap_data[256];
};
pciem_cap_msi_userspace
struct pciem_cap_msi_userspace
{
uint8_t num_vectors_log2;
uint8_t has_64bit;
uint8_t has_masking;
uint8_t reserved;
};
pciem_cap_msix_userspace
struct pciem_cap_msix_userspace
{
uint8_t bar_index;
uint8_t reserved[3];
uint32_t table_offset;
uint32_t pba_offset;
uint16_t table_size;
uint16_t reserved2;
};
pciem_config_space
struct pciem_config_space
{
uint16_t vendor_id;
uint16_t device_id;
uint16_t subsys_vendor_id;
uint16_t subsys_device_id;
uint8_t revision;
uint8_t class_code[3];
uint8_t header_type;
uint8_t reserved[7];
};
pciem_event
#define PCIEM_EVENT_MMIO_READ 1
#define PCIEM_EVENT_MMIO_WRITE 2
#define PCIEM_EVENT_CONFIG_READ 3
#define PCIEM_EVENT_CONFIG_WRITE 4
#define PCIEM_EVENT_MSI_ACK 5
#define PCIEM_EVENT_RESET 6
struct pciem_event
{
uint64_t seq;
uint32_t type;
uint32_t bar;
uint64_t offset;
uint32_t size;
uint32_t reserved;
uint64_t data;
uint64_t timestamp;
};
pciem_response
struct pciem_response
{
uint64_t seq;
uint64_t data;
int32_t status;
uint32_t reserved;
};
pciem_irq_inject
struct pciem_irq_inject
{
uint32_t vector;
uint32_t reserved;
};
pciem_dma_op
#define PCIEM_DMA_FLAG_READ 0x1
#define PCIEM_DMA_FLAG_WRITE 0x2
struct pciem_dma_op
{
uint64_t guest_iova;
uint64_t user_addr;
uint32_t length;
uint32_t pasid;
uint32_t flags;
uint32_t reserved;
};
pciem_dma_atomic
#define PCIEM_ATOMIC_FETCH_ADD 1
#define PCIEM_ATOMIC_FETCH_SUB 2
#define PCIEM_ATOMIC_SWAP 3
#define PCIEM_ATOMIC_CAS 4
#define PCIEM_ATOMIC_FETCH_AND 5
#define PCIEM_ATOMIC_FETCH_OR 6
#define PCIEM_ATOMIC_FETCH_XOR 7
struct pciem_dma_atomic
{
uint64_t guest_iova;
uint64_t operand;
uint64_t compare;
uint32_t op_type;
uint32_t pasid;
uint64_t result;
};
pciem_p2p_op_user
struct pciem_p2p_op_user
{
uint64_t target_phys_addr;
uint64_t user_addr;
uint32_t length;
uint32_t flags;
};
pciem_bar_info_query
struct pciem_bar_info_query
{
uint32_t bar_index;
uint64_t phys_addr;
uint64_t size;
uint32_t flags;
};
pciem_watchpoint_config
#define PCIEM_WP_FLAG_BAR_KPROBES (1 << 0)
#define PCIEM_WP_FLAG_BAR_MANUAL (1 << 1)
struct pciem_watchpoint_config
{
uint32_t bar_index;
uint32_t offset;
uint32_t width;
uint32_t flags;
};
pciem_eventfd_config
struct pciem_eventfd_config
{
int32_t eventfd;
uint32_t reserved;
};
IOCTLs
#define PCIEM_IOCTL_MAGIC 0xAF
create_device()
#define PCIEM_IOCTL_CREATE_DEVICE _IOWR(PCIEM_IOCTL_MAGIC, 10, struct pciem_create_device)
This is the first ioctl you should issue. Assuming you’ve already called open() on /dev/pciem, issue create_device to notify PCIem that a new PCIe shim is being crafted from userspace.
It takes the pciem_create_device struct as param; a zero-filled one is enough.
add_bar()
#define PCIEM_IOCTL_ADD_BAR _IOW(PCIEM_IOCTL_MAGIC, 11, struct pciem_bar_config)
Whenever you want to add BAR definitions, you should issue this ioctl.
It takes the pciem_bar_config struct as param. You can initialize it in a myriad of ways, but the main fields you need to fill properly are:
bar_index: Which BAR number it’ll represent within PCIem
size: The size of this BAR
flags: What PCI flags this BAR will hold (Ex. PCI_BASE_ADDRESS_SPACE_MEMORY, PCI_BASE_ADDRESS_MEM_PREFETCH, PCI_BASE_ADDRESS_MEM_TYPE_64…).
Issue add_bar when you are ready to push a new BAR to your PCIem shim device fd.
add_capability()
#define PCIEM_IOCTL_ADD_CAPABILITY _IOW(PCIEM_IOCTL_MAGIC, 12, struct pciem_cap_config)
Use this ioctl to add PCIe capabilities to your shim device.
It takes the pciem_cap_config struct as param. The main fields you’ll care about are:
cap_type: What type of capability you’re adding (Ex. PCIEM_CAP_MSI, PCIEM_CAP_MSIX, PCIEM_CAP_PCIE…)
cap_size: How many bytes of cap_data you’re providing
cap_data: The actual capability data itself. For MSI/MSI-X, you’d typically fill this with pciem_cap_msi_userspace or pciem_cap_msix_userspace structs.
set_config()
#define PCIEM_IOCTL_SET_CONFIG _IOW(PCIEM_IOCTL_MAGIC, 13, struct pciem_config_space)
Sets the PCI configuration space for your device.
This is where you define how your device identifies itself to the system.
It takes the pciem_config_space struct as param.
You’ll want to set:
vendor_id: Your vendor ID
device_id: The device ID
class_code: 3-byte PCI class code that tells the system what type of device this is
Issue this after you’ve added all your BARs and capabilities, but before calling register().
register()
#define PCIEM_IOCTL_REGISTER _IO(PCIEM_IOCTL_MAGIC, 14)
This is the final step in setting up your PCIem device.
Once you call register(), your device configuration is locked and the device becomes visible to the system.
Returns an instance file descriptor that you can use to mmap() your BARs into userspace for direct access.
inject_irq()
#define PCIEM_IOCTL_INJECT_IRQ _IOW(PCIEM_IOCTL_MAGIC, 15, struct pciem_irq_inject)
Issue this to inject an interrupt (After your shim completes some work, for instance).
It takes the pciem_irq_inject struct as param:
vector: Which MSI/MSI-X vector to fire. If you’ve got MSI configured for 4 vectors, valid values are 0-3.
dma()
#define PCIEM_IOCTL_DMA _IOWR(PCIEM_IOCTL_MAGIC, 16, struct pciem_dma_op)
This lets you read from or write to guest physical addresses.
It takes the pciem_dma_op struct as param:
guest_iova: The guest address you want to access (an IOVA if an IOMMU is translating, a physical address otherwise)
user_addr: Your userspace buffer address
length: How many bytes to transfer
flags: Either PCIEM_DMA_FLAG_READ or PCIEM_DMA_FLAG_WRITE
This is IOMMU-aware: if your system has an IOMMU, PCIem handles the address translations.
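Filling in pciem_dma_op by hand every time gets repetitive, so a thin wrapper can help. The sketch below is an assumption-laden convenience: shim_dma() is a hypothetical name, the struct and ioctl number are copied from this page so it compiles standalone (real code would include pciem_userspace.h), and which direction READ and WRITE refer to should be checked against the PCIem sources.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

#define PCIEM_IOCTL_MAGIC 0xAF
#define PCIEM_DMA_FLAG_READ 0x1
#define PCIEM_DMA_FLAG_WRITE 0x2

/* Copied from the Structures section. */
struct pciem_dma_op {
    uint64_t guest_iova;
    uint64_t user_addr;
    uint32_t length;
    uint32_t pasid;
    uint32_t flags;
    uint32_t reserved;
};

#define PCIEM_IOCTL_DMA _IOWR(PCIEM_IOCTL_MAGIC, 16, struct pciem_dma_op)

/*
 * Hypothetical wrapper: issue one DMA transfer between guest_iova and buf.
 * Returns the ioctl result, or -1 if flags is not exactly READ or WRITE.
 */
static int shim_dma(int fd, uint64_t iova, void *buf, uint32_t len,
                    uint32_t flags)
{
    struct pciem_dma_op op;

    if (flags != PCIEM_DMA_FLAG_READ && flags != PCIEM_DMA_FLAG_WRITE)
        return -1;
    memset(&op, 0, sizeof(op)); /* zeroes pasid and reserved */
    op.guest_iova = iova;
    op.user_addr = (uint64_t)(uintptr_t)buf;
    op.length = len;
    op.flags = flags;
    return ioctl(fd, PCIEM_IOCTL_DMA, &op);
}
```

Zeroing the whole struct first keeps pasid and reserved at 0, which seems the safest default when those fields are not needed.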
dma_atomic()
#define PCIEM_IOCTL_DMA_ATOMIC _IOWR(PCIEM_IOCTL_MAGIC, 17, struct pciem_dma_atomic)
It takes the pciem_dma_atomic struct as param:
guest_iova: Target address in guest memory
op_type: What atomic operation (Ex. PCIEM_ATOMIC_CAS, PCIEM_ATOMIC_FETCH_ADD…)
operand: The value to use for the operation
compare: The value to compare against (for PCIEM_ATOMIC_CAS)
result: Where the previous value gets returned
Useful if you’re building something that needs lock-free synchronization with the guest since it implements what’s needed to do atomic operations on guest memory.
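As a reference for what each op_type conventionally means, the operations can be modelled in software. This is a sketch of the usual semantics suggested by the names, not a statement of what PCIem does internally: atomic_model() is a hypothetical function, and for CAS it assumes operand is the new value and compare the expected one. In each case the previous value (old) is what would come back in result.

```c
#include <stdint.h>

/* Copied from the Structures section. */
#define PCIEM_ATOMIC_FETCH_ADD 1
#define PCIEM_ATOMIC_FETCH_SUB 2
#define PCIEM_ATOMIC_SWAP 3
#define PCIEM_ATOMIC_CAS 4
#define PCIEM_ATOMIC_FETCH_AND 5
#define PCIEM_ATOMIC_FETCH_OR 6
#define PCIEM_ATOMIC_FETCH_XOR 7

/*
 * Hypothetical reference model: given the value currently in memory (old),
 * return the value that would be stored after the operation.
 */
static uint64_t atomic_model(uint32_t op_type, uint64_t old,
                             uint64_t operand, uint64_t compare)
{
    switch (op_type) {
    case PCIEM_ATOMIC_FETCH_ADD:
        return old + operand;
    case PCIEM_ATOMIC_FETCH_SUB:
        return old - operand;
    case PCIEM_ATOMIC_SWAP:
        return operand;
    case PCIEM_ATOMIC_CAS:
        return old == compare ? operand : old; /* assumption: swap on match */
    case PCIEM_ATOMIC_FETCH_AND:
        return old & operand;
    case PCIEM_ATOMIC_FETCH_OR:
        return old | operand;
    case PCIEM_ATOMIC_FETCH_XOR:
        return old ^ operand;
    default:
        return old; /* unknown op: leave memory untouched */
    }
}
```

A model like this is handy in shim unit tests, where the expected memory contents can be computed without issuing the ioctl.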
p2p()
#define PCIEM_IOCTL_P2P _IOWR(PCIEM_IOCTL_MAGIC, 18, struct pciem_p2p_op_user)
Peer-to-peer DMA between PCIe devices. Your PCIem device can directly access another physical device’s memory space.
It takes the pciem_p2p_op_user struct as param:
target_phys_addr: Physical address of the target device (Another device’s BAR, see the p2p_regions module argument!)
user_addr: Your buffer in userspace
length: How many bytes to transfer
flags: Similar to regular DMA flags
get_bar_info()
#define PCIEM_IOCTL_GET_BAR_INFO _IOWR(PCIEM_IOCTL_MAGIC, 19, struct pciem_bar_info_query)
Call this after register() if you need to know the actual physical addresses.
It takes the pciem_bar_info_query struct as param. Set the bar_index going in, and it’ll fill in:
phys_addr: The physical address where this BAR lives
size: The BAR size
flags: The BAR flags
set_watchpoint()
#define PCIEM_IOCTL_SET_WATCHPOINT _IOW(PCIEM_IOCTL_MAGIC, 20, struct pciem_watchpoint_config)
Sets up hardware watchpoints on specific BAR offsets. When the guest writes to these locations, you’ll get notified immediately via the event mechanism (Or your hand-rolled one, if you have done it).
It takes the pciem_watchpoint_config struct as param:
bar_index: Which BAR to watch
offset: Offset within the BAR
width: How many bytes (1, 2, 4, or 8)
flags: Either PCIEM_WP_FLAG_BAR_KPROBES or PCIEM_WP_FLAG_BAR_MANUAL to control how PCIem locates the BAR mapping.
set_eventfd()
#define PCIEM_IOCTL_SET_EVENTFD _IOW(PCIEM_IOCTL_MAGIC, 21, struct pciem_eventfd_config)
This sets up a userspace eventfd which gets notified whenever events arrive from PCIem.
It takes the pciem_eventfd_config struct as param:
eventfd: Your eventfd file descriptor
Once set up, the kernel will signal your eventfd whenever new events land in the ring buffer.