Overview
PCIem is a Linux framework that lets software developers write drivers on the host for PCIe cards that do not (yet) exist on the bus.
This makes it possible to develop and test a driver without a physical card, effectively enabling pre-silicon software engineering for a prototype card.
PCIem was initially designed to enable communication to and from a QEMU instance, but it has since evolved to support a wide variety of use cases.
You can think of the framework as a MITM (Man-in-the-Middle) that sits between untouched production drivers (which are unaware of PCIem’s existence) and the Linux kernel.
                   Kernel Space
┌────────────────────────────────────────────────┐
│                                                │
│  ┌──────────────────────────────────────┐      │
│  │ Real PCIe Driver                     │      │
│  │  - Untouched logic from production   │      │
│  └──────────────────┬───────────────────┘      │
│                     │  PCIe driver is unaware  │
│                     │  of PCIem                │
│                     ▼                          │
│             Host Linux Kernel                  │
│                     ▲                          │
│                     │                          │
│  ┌──────────────────┴───────────────────┐      │
│  │ PCIem Framework                      │      │
│  │  - PCI Config Space                  │      │
│  │  - BAR Mappings                      │      │
│  │  - INT/MSI/MSI-X Interrupts          │      │
│  │  - DMA (With/without IOMMU)          │      │
│  │  - P2P DMA                           │      │
│  └──────────────────┬───────────────────┘      │
│                     ▲                          │
└─────────────────────┼──────────────────────────┘
                      │ /dev/pciem
┌─────────────────────┼──────────────────────────┐
│ Linux Userspace     ▼                          │
│  ┌──────────────────┴───────────────────┐      │
│  │ Userspace PCI shim                   │      │
│  │  - Emulates PCIe device logic        │      │
│  └──────────────────────────────────────┘      │
│                                                │
└────────────────────────────────────────────────┘
                     Userspace
Getting started
Cloning the repository
In order to use PCIem, we’ll first clone the repository:
git clone https://github.com/cakehonolulu/pciem
With the repository already cloned, we’ll enter it:
cd pciem/
Compiling the code
To compile PCIem, you need make, a C compiler, and the Linux headers for your currently running kernel.
Depending on the features your userspace PCI shim implements, you may need additional dependencies (for instance, SDL3 or similar to display a framebuffer the driver writes to); for a simple shim, the tools above suffice.
Issuing a compilation is as simple as:
make all
Using PCIem
After we compile everything, we’ll end up with a bunch of object files, an executable and a kernel module.
In short, using the framework takes three steps: load the pciem.ko kernel module (with the parameters discussed below), start the userspace shim, which creates the device through the functionality PCIem exports, and finally load the actual (untouched!) PCIe driver you want to test.
┌───────┐     ┌──────────┐     ┌────────────┐
│ Linux ├─────► pciem.ko ├─────► User-space │
└───────┘     └──────────┘     │    shim    │
                               └────────────┘
PCIem parameters
pciem_phys_regions
We load pciem.ko with insmod as usual, but the arguments we pass change how the framework behaves.
Starting with PCIem 0.1, you can pass kernel module arguments to control parts of its logic.
pciem_phys_regions: specifies which physically contiguous memory regions are reserved for PCIem’s use. Those regions must first be taken away from the kernel by passing the memmap= argument on the Linux command line.
As per kernel.org’s documentation for memmap:
memmap=nn[KMG]$ss[KMG]
[KNL,ACPI,EARLY] Mark specific memory as reserved.
Region of memory to be reserved is from ss to ss+nn.
Example: Exclude memory from 0x18690000-0x1869ffff
memmap=64K$0x18690000
or
memmap=0x10000$0x18690000
Some bootloaders may need an escape character before '$',
like Grub2, otherwise '$' and the following number
will be eaten.
For example, if we append the following to the kernel command line (escaping the '$' if your bootloader requires it, as noted above):
memmap=128M$0x1bf000000
Linux carves 128M (0x8000000 bytes) starting at physical address 0x1bf000000, that is, the range 0x1bf000000-0x1c6ffffff, out of System RAM and marks it as reserved so PCIem can use it.
With that exact setup, we’d then load PCIem as follows:
sudo insmod kernel/pciem.ko pciem_phys_regions="bar0:0x1bf000000:0x10000,bar2:0x1bf100000:0x100000"
This tells PCIem how to lay BARs out inside the reserved region. Each entry has the form barN:start:size, so BAR0 gets 0x10000 bytes (64K) at 0x1bf000000 and BAR2 gets 0x100000 bytes (1M) at 0x1bf100000:
memmap=128M$0x1bf000000
  0x1bf000000  ┌─────────────────────────────────┐  Start of reserved
               │              BAR0               │
  0x1bf010000  ├─────────────────────────────────┤
               │              Free               │
  0x1bf100000  ├─────────────────────────────────┤
               │                                 │
               │              BAR2               │
               │                                 │
  0x1bf200000  ├─────────────────────────────────┤
               │                                 │
               │              Free               │
               │                                 │
  0x1c7000000  └─────────────────────────────────┘  End of reserved
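Before rebooting with a new memmap= line, it can help to sanity-check the pciem_phys_regions string. The sketch below is a hypothetical helper, not PCIem's own parser: it reads the barN:start:size entries shown above and checks that each region fits inside the reservation and does not overlap another. It also applies the PCI rule that BARs are power-of-two sized and naturally aligned (the example values satisfy it); whether PCIem itself enforces that rule is an assumption here.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One "barN:start:size" entry of pciem_phys_regions (hypothetical type). */
struct phys_region {
    unsigned bar;
    uint64_t start;
    uint64_t size;
};

/* Parses up to `max` comma-separated entries. Returns the count, or -1. */
static int parse_phys_regions(const char *spec, struct phys_region *out, int max)
{
    int n = 0;
    while (*spec) {
        int used = 0;
        if (n == max ||
            sscanf(spec, "bar%u:%" SCNx64 ":%" SCNx64 "%n",
                   &out[n].bar, &out[n].start, &out[n].size, &used) != 3)
            return -1;
        spec += used;
        if (*spec == ',')
            spec++;                    /* more entries follow */
        else if (*spec != '\0')
            return -1;                 /* trailing garbage */
        n++;
    }
    return n;
}

/* Checks each region: BAR index 0-5, power-of-two size, natural alignment,
 * fully inside [res_start, res_start + res_size), no overlap with others. */
static bool phys_regions_valid(const struct phys_region *r, int n,
                               uint64_t res_start, uint64_t res_size)
{
    for (int i = 0; i < n; i++) {
        uint64_t size = r[i].size, off;
        if (r[i].bar > 5 || size == 0 || (size & (size - 1)))
            return false;
        if (r[i].start & (size - 1))
            return false;
        if (r[i].start < res_start)
            return false;
        off = r[i].start - res_start;
        if (off > res_size || size > res_size - off)
            return false;
        for (int j = 0; j < i; j++)
            if (r[i].start < r[j].start + r[j].size &&
                r[j].start < r[i].start + size)
                return false;
    }
    return true;
}
```

With the insmod string above and the 128M reservation at 0x1bf000000, both entries parse and pass; against a 1M reservation at the same address, BAR2 no longer fits and the check fails.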
p2p_regions
The p2p_regions argument is a bit more complex; think of it as a whitelist for DMA accesses within PCIem.
The framework supports Peer-to-Peer DMA but, as a basic safety measure, you have to explicitly whitelist the BAR region of the device you want to do P2P to or from.
To know what to “share”, look up the start address and size of the desired BAR of that device:
$ lspci -vvv
...
f147:00:00.0 System peripheral: Red Hat, Inc. Virtio 1.0 file system (rev 01)
Subsystem: Red Hat, Inc. Device 0040
Physical Slot: 2113372925
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64
NUMA node: 0
Region 0: Memory at e00000000 (64-bit, non-prefetchable) [size=4K]
Region 2: Memory at e00001000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at c00000000 (64-bit, non-prefetchable) [size=8G]
Capabilities: <access denied>
Kernel driver in use: virtio-pci
If we were to share BAR0 of this particular device (start 0xe00000000, size 4K, i.e. 0x1000), we’d instruct PCIem as follows:
sudo insmod kernel/pciem.ko p2p_regions="0xe00000000:0x1000"
This will, in turn, give PCIem access to that BAR so PCIe shims can do P2P DMA.
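Copying addresses out of lspci by hand is error-prone, so here is a hypothetical helper (not part of PCIem) that turns one Region line of the output above into a p2p_regions entry. It assumes the lspci -vvv line format shown above, with sizes suffixed by K, M, or G or given as a plain byte count.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: converts one "Region N: Memory at ADDR ... [size=S]"
 * line from `lspci -vvv` into a p2p_regions entry "0xADDR:0xSIZE".
 * Returns 0 on success, -1 if the line does not look like a memory BAR. */
static int lspci_region_to_p2p(const char *line, char *out, size_t outlen)
{
    unsigned idx;
    uint64_t base, size;
    char unit = '\0';
    const char *sz = strstr(line, "[size=");

    if (!sz || sscanf(line, " Region %u: Memory at %" SCNx64, &idx, &base) != 2)
        return -1;
    if (sscanf(sz, "[size=%" SCNu64 "%c", &size, &unit) != 2)
        return -1;
    switch (unit) {            /* lspci prints sizes with a binary suffix */
    case 'K': size <<= 10; break;
    case 'M': size <<= 20; break;
    case 'G': size <<= 30; break;
    case ']': break;           /* plain byte count */
    default:  return -1;
    }
    int n = snprintf(out, outlen, "0x%" PRIx64 ":0x%" PRIx64, base, size);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For the Region 0 line above it produces 0xe00000000:0x1000, matching the insmod example. lspci output is meant for humans and may change between versions; if you automate this, the per-device resource file under /sys/bus/pci/devices/ is a sturdier source.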
General functionality
PCIem exposes an interface at /dev/pciem that you use to build your PCIe shim from userspace.
One can see the documented interface on the API page.
Dummy Device Walkthrough
This walkthrough demonstrates how to create a minimal PCIe device using PCIem.
We’ll build a simple “clock/counting device” with basic MMIO registers and MSI interrupt support.
The dummy PCIe
Our dummy PCIe has:
- One BAR (BAR0, 4KB) with three 32-bit registers:
  - REG_CONTROL (0x00): Write 1 to increment the counter
  - REG_STATUS (0x04): Status flags (IRQ pending bit)
  - REG_COUNTER (0x08): Current counter value (read-only)
- MSI: One interrupt vector
- Interrupts: Every 10 counts, an MSI is fired
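Because the device semantics are this small, one option is to keep them in a pure function that knows nothing about PCIem, so the logic can be unit-tested without loading the module. The sketch below is illustrative: clock_model and IRQ_EVERY are hypothetical names, reads of REG_CONTROL returning 0 is an assumption, and since the walkthrough does not say how the driver clears the IRQ pending bit, the model simply leaves it set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register offsets match the walkthrough; the rest is illustrative. */
#define REG_CONTROL        0x00
#define REG_STATUS         0x04
#define REG_COUNTER        0x08
#define STATUS_IRQ_PENDING (1u << 0)
#define IRQ_EVERY          10u    /* hypothetical name for "every 10 counts" */

struct clock_model {
    uint32_t counter;
    uint32_t status;
};

/* Applies a 32-bit MMIO write. Returns true when an MSI should be raised. */
static bool clock_model_write(struct clock_model *m, uint32_t offset, uint32_t value)
{
    if (offset != REG_CONTROL || !(value & 1u))
        return false;              /* only "write 1 to REG_CONTROL" has an effect */
    m->counter++;
    if (m->counter % IRQ_EVERY != 0)
        return false;
    m->status |= STATUS_IRQ_PENDING;
    return true;
}

/* Returns the value a 32-bit MMIO read at `offset` should observe. */
static uint32_t clock_model_read(const struct clock_model *m, uint32_t offset)
{
    switch (offset) {
    case REG_STATUS:  return m->status;
    case REG_COUNTER: return m->counter;
    default:          return 0;    /* assumption: REG_CONTROL reads as 0 */
    }
}
```

The shim's MMIO write handler then only has to call clock_model_write, mirror the results into the BAR, and inject an IRQ when it returns true.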
Prerequisites
Before starting, make sure you have:
- PCIem kernel module loaded (see Getting Started)
- A C compiler and kernel headers installed
- Basic understanding of PCIe concepts (BARs, config space, interrupts)
The Code
Userspace Device Emulator
The userspace program creates the PCIe device, establishes the event channel, and handles MMIO operations.
/*
* DummyClockPCIe userspace shim
*/
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <stdatomic.h>
#include <poll.h>
#include <linux/pci_regs.h> /* PCI_BASE_ADDRESS_* flags used for BAR0 */
#include "pciem_userspace.h"
/* Defines for the BAR0 register offsets */
#define REG_CONTROL 0x00
#define REG_STATUS 0x04
#define REG_COUNTER 0x08
#define STATUS_IRQ_PENDING (1 << 0)
/* DummyClockPCIe state */
struct counter_device {
int fd;
int instance_fd;
uint32_t counter;
uint32_t status;
struct pciem_shared_ring *ring;
};
1. Opening the PCIem Device
int fd = open("/dev/pciem", O_RDWR);
if (fd < 0) {
perror("Failed to open /dev/pciem");
return 1;
}
This opens the PCIem control device. All configuration happens on this file descriptor.
2. Creating the Device
struct pciem_create_device create = {0};
int ret = ioctl(fd, PCIEM_IOCTL_CREATE_DEVICE, &create);
if (ret < 0) {
perror("Failed to create device");
return 1;
}
This tells PCIem we’re starting to configure a new virtual PCIe device.
3. Adding BAR0
struct pciem_bar_config bar = {
.bar_index = 0,
.size = 4096,
.flags = PCI_BASE_ADDRESS_SPACE_MEMORY |
PCI_BASE_ADDRESS_MEM_TYPE_32,
};
ret = ioctl(fd, PCIEM_IOCTL_ADD_BAR, &bar);
if (ret < 0) {
    perror("Failed to add BAR0");
    return 1;
}
This creates a 4KB memory BAR at index 0.
4. Adding MSI Capability
struct pciem_cap_msi_userspace msi = {
.num_vectors_log2 = 0, /* 2^0 = 1 vector */
.has_64bit = 1,
.has_masking = 0,
};
struct pciem_cap_config cap = {
.cap_type = PCIEM_CAP_MSI,
.cap_size = sizeof(msi),
};
memcpy(cap.cap_data, &msi, sizeof(msi));
ret = ioctl(fd, PCIEM_IOCTL_ADD_CAPABILITY, &cap);
if (ret < 0) {
    perror("Failed to add MSI capability");
    return 1;
}
5. Setting Config Space
struct pciem_config_space cfg = {
.vendor_id = 0x1234,
.device_id = 0x5678,
.subsys_vendor_id = 0x1234,
.subsys_device_id = 0x5678,
.revision = 0x01,
.class_code = {0x00, 0x00, 0xFF},
.header_type = 0x00,
};
ret = ioctl(fd, PCIEM_IOCTL_SET_CONFIG, &cfg);
if (ret < 0) {
    perror("Failed to set config space");
    return 1;
}
This defines how the device identifies itself. The field that matters later is:
- Vendor/Device ID pair: used by the kernel to match the device with a driver
6. Setting Up Watchpoints
Watchpoints allow you to get immediate hardware-level notifications when the driver accesses specific registers:
struct pciem_watchpoint_config wp = {
.bar_index = 0,
.offset = REG_CONTROL, /* 0x00 */
.width = 4, /* 4 bytes (32-bit register) */
.flags = PCIEM_WP_FLAG_BAR_KPROBES,
};
ret = ioctl(fd, PCIEM_IOCTL_SET_WATCHPOINT, &wp);
if (ret < 0) {
perror("Failed to set watchpoint");
}
Watchpoint flags:
- PCIEM_WP_FLAG_BAR_KPROBES: PCIem automatically locates the BAR mapping (recommended).
- PCIEM_WP_FLAG_BAR_MANUAL: You provide the virtual address manually (advanced usage).
Note: watchpoints use hardware debug registers, so only a few can be active at once (x86, for instance, has just four breakpoint address registers); don’t assume many are available.
7. Registering the Device
int instance_fd = ioctl(fd, PCIEM_IOCTL_REGISTER, 0);
if (instance_fd < 0) {
    perror("Failed to register device");
    return 1;
}
printf("Device registered! Instance FD: %d\n", instance_fd);
This makes the device visible to the kernel. It returns a new instance_fd which represents the synthetic device on the bus.
8. Setting up the Event Ring
To receive events efficiently, we map a shared atomic ring buffer and set up an eventfd for notifications. This avoids busy polling.
int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
if (efd < 0) {
    perror("Failed to create eventfd");
    return 1;
}
struct pciem_eventfd_config efd_cfg = { .eventfd = efd };
if (ioctl(fd, PCIEM_IOCTL_SET_EVENTFD, &efd_cfg) < 0) {
    perror("Failed to set eventfd");
    return 1;
}
struct pciem_shared_ring *ring = mmap(NULL, sizeof(struct pciem_shared_ring),
                                      PROT_READ | PROT_WRITE, MAP_SHARED,
                                      fd, 0);
if (ring == MAP_FAILED) {
    perror("Failed to map event ring");
    return 1;
}
uint32_t head = atomic_load(&ring->head);
struct pciem_bar_info_query bar0_info = { .bar_index = 0 };
if (ioctl(fd, PCIEM_IOCTL_GET_BAR_INFO, &bar0_info) < 0) {
    perror("Failed to query BAR0");
    return 1;
}
void *bar0_map = mmap(NULL, bar0_info.size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, instance_fd, 0);
if (bar0_map == MAP_FAILED) {
    perror("Failed to map BAR0");
    return 1;
}
volatile uint32_t *bar0 = bar0_map;
9. The Event Loop
Now we enter the main loop. If the ring is empty, we poll() on the eventfd to sleep until the kernel wakes us up.
struct pollfd pfd = {
    .fd = efd,
    .events = POLLIN
};
struct counter_device dev = { .fd = fd, .counter = 0 };
while (1) {
    uint32_t tail = atomic_load(&ring->tail);
    if (head == tail) {
        /* Ring empty: sleep until the kernel signals the eventfd */
        if (poll(&pfd, 1, -1) > 0) {
            uint64_t count;
            read(efd, &count, sizeof(count)); /* resets the eventfd counter */
        }
        continue;
    }
    while (head != tail) {
        /* Make the kernel's event writes visible before reading them */
        atomic_thread_fence(memory_order_acquire);
        struct pciem_event *evt = &ring->events[head];
        switch (evt->type) {
        case PCIEM_EVENT_MMIO_WRITE:
            handle_mmio_write(fd, evt, &dev, bar0);
            break;
        case PCIEM_EVENT_MMIO_READ:
            printf("Driver read offset 0x%llx\n", (unsigned long long)evt->offset);
            break;
        default:
            break;
        }
        head = (head + 1) % PCIEM_RING_SIZE;
        atomic_store(&ring->head, head); /* hand the slot back to the kernel */
    }
}
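The head/tail protocol used by this loop can be tried out in isolation. Below is a hypothetical single-producer/single-consumer ring (demo_ring and DEMO_RING_SIZE are stand-ins; the real struct pciem_shared_ring lives in pciem_userspace.h and may be laid out differently), written with explicit acquire/release orderings instead of a standalone fence. The producer plays the kernel's role and the consumer the shim's; keeping one slot empty to tell “full” from “empty” is a choice of this sketch, not necessarily PCIem's.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u   /* hypothetical; PCIem defines PCIEM_RING_SIZE */

struct demo_event { uint32_t type; uint64_t offset; };

struct demo_ring {
    _Atomic uint32_t head;  /* next slot the consumer (shim) reads   */
    _Atomic uint32_t tail;  /* next slot the producer (kernel) fills */
    struct demo_event events[DEMO_RING_SIZE];
};

/* Producer side: returns false when the ring is full. */
static bool demo_push(struct demo_ring *r, struct demo_event ev)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t next = (tail + 1) % DEMO_RING_SIZE;
    if (next == atomic_load_explicit(&r->head, memory_order_acquire))
        return false;                      /* one slot is always kept empty */
    r->events[tail] = ev;
    /* Release: the event contents become visible before the new tail. */
    atomic_store_explicit(&r->tail, next, memory_order_release);
    return true;
}

/* Consumer side: handles every available event, returns how many. */
static unsigned demo_drain(struct demo_ring *r,
                           void (*handle)(const struct demo_event *))
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Acquire: pairs with the producer's release store of tail. */
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    unsigned n = 0;
    while (head != tail) {
        handle(&r->events[head]);
        head = (head + 1) % DEMO_RING_SIZE;
        /* Release: the slot is fully read before the producer may reuse it. */
        atomic_store_explicit(&r->head, head, memory_order_release);
        n++;
    }
    return n;
}
```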
10. Processing Events
Since PCIem uses a shared memory model, you do not send a response to the kernel for events.
- For Reads: The driver reads directly from the BAR memory you mapped.
- For Writes: The driver writes to memory and posts a notification event. You update your internal state and the BAR memory immediately.
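static void handle_mmio_write(int fd, struct pciem_event *evt,
                              struct counter_device *dev,
                              volatile uint32_t *bar0)
{
    /* Only a write of 1 to BAR0's REG_CONTROL has an effect */
    if (evt->bar == 0 && evt->offset == REG_CONTROL && (evt->data & 1)) {
        dev->counter++;
        bar0[REG_COUNTER / 4] = dev->counter;
        if (dev->counter % 10 == 0) {
            /* Flag the pending IRQ in REG_STATUS, then fire MSI vector 0 */
            dev->status |= STATUS_IRQ_PENDING;
            bar0[REG_STATUS / 4] = dev->status;
            struct pciem_irq_inject irq = { .vector = 0 };
            if (ioctl(fd, PCIEM_IOCTL_INJECT_IRQ, &irq) < 0)
                perror("Failed to inject IRQ");
        }
    }
}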
11. Injecting Interrupts
As shown in the handler above, when you want to signal the driver asynchronously (like when the counter reaches a threshold), you use the IRQ injection IOCTL:
struct pciem_irq_inject irq = { .vector = 0 };
ioctl(fd, PCIEM_IOCTL_INJECT_IRQ, &irq);
API
PCIem exposes a userspace-facing API that lets developers configure a PCIe device and handle its traffic with little effort.
One of the main benefits of doing this entirely in userspace is that iterating on the shim is nearly free: changing its behaviour typically only requires rebuilding and restarting a userspace program.
The ioctls and structs PCIem uses to let you configure your shim are detailed below.
Structures
pciem_create_device
struct pciem_create_device
{
uint32_t flags;
uint32_t mode;
};
pciem_bar_config
struct pciem_bar_config
{
uint32_t bar_index;
uint32_t flags;
uint64_t size;
uint32_t reserved;
};
pciem_cap_config
struct pciem_cap_config
{
uint32_t cap_type;
uint32_t cap_size;
uint8_t cap_data[256];
};
pciem_cap_msi_userspace
struct pciem_cap_msi_userspace
{
uint8_t num_vectors_log2;
uint8_t has_64bit;
uint8_t has_masking;
uint8_t reserved;
};
pciem_cap_msix_userspace
struct pciem_cap_msix_userspace
{
uint8_t bar_index;
uint8_t reserved[3];
uint32_t table_offset;
uint32_t pba_offset;
uint16_t table_size;
uint16_t reserved2;
};
pciem_config_space
struct pciem_config_space
{
uint16_t vendor_id;
uint16_t device_id;
uint16_t subsys_vendor_id;
uint16_t subsys_device_id;
uint8_t revision;
uint8_t class_code[3];
uint8_t header_type;
uint8_t reserved[7];
};
pciem_event
#define PCIEM_EVENT_MMIO_READ 1
#define PCIEM_EVENT_MMIO_WRITE 2
#define PCIEM_EVENT_CONFIG_READ 3
#define PCIEM_EVENT_CONFIG_WRITE 4
#define PCIEM_EVENT_MSI_ACK 5
#define PCIEM_EVENT_RESET 6
struct pciem_event
{
uint64_t seq;
uint32_t type;
uint32_t bar;
uint64_t offset;
uint32_t size;
uint32_t reserved;
uint64_t data;
uint64_t timestamp;
};
pciem_response
struct pciem_response
{
uint64_t seq;
uint64_t data;
int32_t status;
uint32_t reserved;
};
pciem_irq_inject
struct pciem_irq_inject
{
uint32_t vector;
uint32_t reserved;
};
pciem_dma_op
#define PCIEM_DMA_FLAG_READ 0x1
#define PCIEM_DMA_FLAG_WRITE 0x2
struct pciem_dma_op
{
uint64_t guest_iova;
uint64_t user_addr;
uint32_t length;
uint32_t pasid;
uint32_t flags;
uint32_t reserved;
};
pciem_dma_atomic
#define PCIEM_ATOMIC_FETCH_ADD 1
#define PCIEM_ATOMIC_FETCH_SUB 2
#define PCIEM_ATOMIC_SWAP 3
#define PCIEM_ATOMIC_CAS 4
#define PCIEM_ATOMIC_FETCH_AND 5
#define PCIEM_ATOMIC_FETCH_OR 6
#define PCIEM_ATOMIC_FETCH_XOR 7
struct pciem_dma_atomic
{
uint64_t guest_iova;
uint64_t operand;
uint64_t compare;
uint32_t op_type;
uint32_t pasid;
uint64_t result;
};
pciem_p2p_op_user
struct pciem_p2p_op_user
{
uint64_t target_phys_addr;
uint64_t user_addr;
uint32_t length;
uint32_t flags;
};
pciem_bar_info_query
struct pciem_bar_info_query
{
uint32_t bar_index;
uint64_t phys_addr;
uint64_t size;
uint32_t flags;
};
pciem_watchpoint_config
#define PCIEM_WP_FLAG_BAR_KPROBES (1 << 0)
#define PCIEM_WP_FLAG_BAR_MANUAL (1 << 1)
struct pciem_watchpoint_config
{
uint32_t bar_index;
uint32_t offset;
uint32_t width;
uint32_t flags;
};
pciem_eventfd_config
struct pciem_eventfd_config
{
int32_t eventfd;
uint32_t reserved;
};
IOCTLs
#define PCIEM_IOCTL_MAGIC 0xAF
create_device()
#define PCIEM_IOCTL_CREATE_DEVICE _IOWR(PCIEM_IOCTL_MAGIC, 10, struct pciem_create_device)
This is the first ioctl you should issue. Once you have open()ed /dev/pciem, issue create_device to notify PCIem that a new PCIe shim is being built from userspace.
It takes the pciem_create_device struct as param; a zero-filled one is enough for it to work.
add_bar()
#define PCIEM_IOCTL_ADD_BAR _IOW(PCIEM_IOCTL_MAGIC, 11, struct pciem_bar_config)
Whenever you want to add BAR definitions, you should issue this ioctl.
It takes the pciem_bar_config struct as param, which you can initialize in several ways; the main fields you need to fill properly are:
bar_index: The BAR number (0-5) this entry represents within PCIem
size: The size of the BAR in bytes
flags: What PCI flags this BAR will hold (Ex. PCI_BASE_ADDRESS_SPACE_MEMORY, PCI_BASE_ADDRESS_MEM_PREFETCH, PCI_BASE_ADDRESS_MEM_TYPE_64…).
Issue add_bar when you are ready to push a new BAR to your PCIem shim device fd.
add_capability()
#define PCIEM_IOCTL_ADD_CAPABILITY _IOW(PCIEM_IOCTL_MAGIC, 12, struct pciem_cap_config)
Use this ioctl to add PCIe capabilities to your shim device.
It takes the pciem_cap_config struct as param. The main fields you’ll care about are:
cap_type: What type of capability you’re adding (Ex. PCIEM_CAP_MSI, PCIEM_CAP_MSIX, PCIEM_CAP_PCIE…)
cap_size: How many bytes of cap_data you’re providing
cap_data: The actual capability data itself. For MSI/MSI-X, you’d typically fill this with pciem_cap_msi_userspace or pciem_cap_msix_userspace structs.
set_config()
#define PCIEM_IOCTL_SET_CONFIG _IOW(PCIEM_IOCTL_MAGIC, 13, struct pciem_config_space)
Sets the PCI configuration space for your device.
This is where you define how your device identifies itself to the system.
It takes the pciem_config_space struct as param.
You’ll want to set:
vendor_id: Your vendor ID
device_id: The device ID
class_code: 3-byte PCI class code that tells the system what type of device this is
Issue this after you’ve added all your BARs and capabilities, but before calling register().
register()
#define PCIEM_IOCTL_REGISTER _IO(PCIEM_IOCTL_MAGIC, 14)
This is the final step in setting up your PCIem device.
Once you call register(), your device configuration is locked and the device becomes visible to the system.
Returns an instance file descriptor that you can use to mmap() your BARs into userspace for direct access.
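The ordering rules above (create_device, then BARs and capabilities, then set_config, then register) can be wrapped in one helper. This is a sketch under stated assumptions: the struct and ioctl definitions are copied from this page so the block compiles on its own (a real shim should include pciem_userspace.h instead), pciem_build_device is a hypothetical name, and capabilities are left out because the value of PCIEM_CAP_MSI is not listed here.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Copied from the API section; prefer pciem_userspace.h in real code. */
#define PCIEM_IOCTL_MAGIC 0xAF
struct pciem_create_device { uint32_t flags; uint32_t mode; };
struct pciem_bar_config {
    uint32_t bar_index; uint32_t flags; uint64_t size; uint32_t reserved;
};
struct pciem_config_space {
    uint16_t vendor_id, device_id, subsys_vendor_id, subsys_device_id;
    uint8_t revision, class_code[3], header_type, reserved[7];
};
#define PCIEM_IOCTL_CREATE_DEVICE _IOWR(PCIEM_IOCTL_MAGIC, 10, struct pciem_create_device)
#define PCIEM_IOCTL_ADD_BAR       _IOW(PCIEM_IOCTL_MAGIC, 11, struct pciem_bar_config)
#define PCIEM_IOCTL_SET_CONFIG    _IOW(PCIEM_IOCTL_MAGIC, 13, struct pciem_config_space)
#define PCIEM_IOCTL_REGISTER      _IO(PCIEM_IOCTL_MAGIC, 14)

/* Hypothetical helper: create_device -> add_bar (x nbars) -> set_config ->
 * register. Returns the instance fd, or -1 on the first failing step. */
static int pciem_build_device(int fd, const struct pciem_bar_config *bars, int nbars,
                              const struct pciem_config_space *cfg)
{
    struct pciem_create_device create = {0};    /* zero-filled is enough */
    if (ioctl(fd, PCIEM_IOCTL_CREATE_DEVICE, &create) < 0) {
        perror("PCIEM_IOCTL_CREATE_DEVICE");
        return -1;
    }
    for (int i = 0; i < nbars; i++) {
        struct pciem_bar_config bar = bars[i];  /* ioctl gets a mutable copy */
        if (ioctl(fd, PCIEM_IOCTL_ADD_BAR, &bar) < 0) {
            perror("PCIEM_IOCTL_ADD_BAR");
            return -1;
        }
    }
    /* PCIEM_IOCTL_ADD_CAPABILITY calls would go here, before set_config. */
    struct pciem_config_space c = *cfg;
    if (ioctl(fd, PCIEM_IOCTL_SET_CONFIG, &c) < 0) {
        perror("PCIEM_IOCTL_SET_CONFIG");
        return -1;
    }
    int instance_fd = ioctl(fd, PCIEM_IOCTL_REGISTER, 0);
    if (instance_fd < 0)
        perror("PCIEM_IOCTL_REGISTER");
    return instance_fd;
}
```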
inject_irq()
#define PCIEM_IOCTL_INJECT_IRQ _IOW(PCIEM_IOCTL_MAGIC, 15, struct pciem_irq_inject)
Issue this to inject an interrupt (After your shim completes some work, for instance).
It takes the pciem_irq_inject struct as param:
vector: Which MSI/MSI-X vector to fire. If you’ve got MSI configured for 4 vectors, valid values are 0-3.
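Since an MSI capability advertises 1 << num_vectors_log2 vectors (and MSI allows at most 32, i.e. a log2 of 5), a shim can reject out-of-range vectors before calling into PCIem. A minimal sketch; inject_msi is a hypothetical wrapper and the definitions are copied from this page. For MSI-X, the bound would come from table_size instead.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Copied from the API section; prefer pciem_userspace.h in real code. */
#define PCIEM_IOCTL_MAGIC 0xAF
struct pciem_irq_inject { uint32_t vector; uint32_t reserved; };
#define PCIEM_IOCTL_INJECT_IRQ _IOW(PCIEM_IOCTL_MAGIC, 15, struct pciem_irq_inject)

/* Hypothetical wrapper: with num_vectors_log2 = 2 the device advertises
 * 4 vectors, so only 0-3 are accepted; anything else fails with EINVAL. */
static int inject_msi(int fd, uint8_t num_vectors_log2, uint32_t vector)
{
    if (num_vectors_log2 > 5 || vector >= (1u << num_vectors_log2)) {
        errno = EINVAL;
        return -1;
    }
    struct pciem_irq_inject irq = { .vector = vector };
    return ioctl(fd, PCIEM_IOCTL_INJECT_IRQ, &irq);
}
```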
dma()
#define PCIEM_IOCTL_DMA _IOWR(PCIEM_IOCTL_MAGIC, 16, struct pciem_dma_op)
This lets you read from or write to guest physical addresses.
It takes the pciem_dma_op struct as param:
guest_iova: The guest physical address you want to access
user_addr: Your userspace buffer address
length: How many bytes to transfer
flags: Either PCIEM_DMA_FLAG_READ or PCIEM_DMA_FLAG_WRITE
This is IOMMU-aware: if your system has an IOMMU, PCIem handles the address translation for you.
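A minimal DMA wrapper, assuming the struct and ioctl definitions listed on this page (copied so the block compiles on its own). shim_dma is a hypothetical name, and pasid is left at 0 because its use is not described here.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Copied from the API section; prefer pciem_userspace.h in real code. */
#define PCIEM_IOCTL_MAGIC    0xAF
#define PCIEM_DMA_FLAG_READ  0x1
#define PCIEM_DMA_FLAG_WRITE 0x2
struct pciem_dma_op {
    uint64_t guest_iova;
    uint64_t user_addr;
    uint32_t length;
    uint32_t pasid;
    uint32_t flags;
    uint32_t reserved;
};
#define PCIEM_IOCTL_DMA _IOWR(PCIEM_IOCTL_MAGIC, 16, struct pciem_dma_op)

/* Hypothetical helper: PCIEM_DMA_FLAG_READ copies `len` bytes from `iova`
 * into `buf`; PCIEM_DMA_FLAG_WRITE copies `buf` out to `iova`. */
static int shim_dma(int fd, uint64_t iova, void *buf, uint32_t len, uint32_t flags)
{
    struct pciem_dma_op op = {
        .guest_iova = iova,
        .user_addr  = (uint64_t)(uintptr_t)buf,
        .length     = len,
        .pasid      = 0,   /* assumption: not needed for this example */
        .flags      = flags,
    };
    return ioctl(fd, PCIEM_IOCTL_DMA, &op);
}
```

For instance, a shim whose driver writes a descriptor address into a BAR register (a hypothetical design) could fetch that descriptor with shim_dma(fd, desc_addr, &desc, sizeof(desc), PCIEM_DMA_FLAG_READ).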
dma_atomic()
#define PCIEM_IOCTL_DMA_ATOMIC _IOWR(PCIEM_IOCTL_MAGIC, 17, struct pciem_dma_atomic)
It performs atomic operations on guest memory, which is useful if you’re building something that needs lock-free synchronization with the guest.
It takes the pciem_dma_atomic struct as param:
guest_iova: Target address in guest memory
op_type: What atomic operation (Ex. PCIEM_ATOMIC_CAS, PCIEM_ATOMIC_FETCH_ADD…)
operand: The value to use for the operation
result: Where the previous value gets returned
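A sketch of a fetch-and-add through dma_atomic, with the definitions copied from this page. The helper name is hypothetical, and treating the target as a 64-bit value is an assumption based on the 64-bit operand and result fields; check pciem_userspace.h for the actual operand width.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Copied from the API section; prefer pciem_userspace.h in real code. */
#define PCIEM_IOCTL_MAGIC      0xAF
#define PCIEM_ATOMIC_FETCH_ADD 1
struct pciem_dma_atomic {
    uint64_t guest_iova;
    uint64_t operand;
    uint64_t compare;
    uint32_t op_type;
    uint32_t pasid;
    uint64_t result;
};
#define PCIEM_IOCTL_DMA_ATOMIC _IOWR(PCIEM_IOCTL_MAGIC, 17, struct pciem_dma_atomic)

/* Hypothetical helper: atomically adds `delta` to the value at `iova` and
 * stores the previous value in *old. *old is untouched on failure. */
static int shim_fetch_add(int fd, uint64_t iova, uint64_t delta, uint64_t *old)
{
    struct pciem_dma_atomic op = {
        .guest_iova = iova,
        .operand    = delta,
        .op_type    = PCIEM_ATOMIC_FETCH_ADD,
    };
    int ret = ioctl(fd, PCIEM_IOCTL_DMA_ATOMIC, &op);
    if (ret >= 0)
        *old = op.result;   /* previous value, as returned by PCIem */
    return ret;
}
```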
p2p()
#define PCIEM_IOCTL_P2P _IOWR(PCIEM_IOCTL_MAGIC, 18, struct pciem_p2p_op_user)
Peer-to-peer DMA between PCIe devices. Your PCIem device can directly access another physical device’s memory space.
It takes the pciem_p2p_op_user struct as param:
target_phys_addr: Physical address of the target device (Another device’s BAR, see the p2p_regions module argument!)
user_addr: Your buffer in userspace
length: How many bytes to transfer
flags: Similar to regular DMA flags
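Because PCIem only allows P2P to whitelisted regions, a shim can check a transfer against the window it passed in p2p_regions before issuing the ioctl, so mistakes fail early and predictably. A hypothetical sketch: shim_p2p and p2p_in_window are illustrative names, the definitions are copied from this page, and using PCIEM_DMA_FLAG_READ/WRITE for flags is an assumption based on the note above. PCIem's own whitelist check still applies.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Copied from the API section; prefer pciem_userspace.h in real code. */
#define PCIEM_IOCTL_MAGIC    0xAF
#define PCIEM_DMA_FLAG_READ  0x1
#define PCIEM_DMA_FLAG_WRITE 0x2
struct pciem_p2p_op_user {
    uint64_t target_phys_addr;
    uint64_t user_addr;
    uint32_t length;
    uint32_t flags;
};
#define PCIEM_IOCTL_P2P _IOWR(PCIEM_IOCTL_MAGIC, 18, struct pciem_p2p_op_user)

/* True if [addr, addr + len) lies inside [base, base + size), overflow-safe. */
static bool p2p_in_window(uint64_t base, uint64_t size, uint64_t addr, uint32_t len)
{
    return len != 0 && addr >= base && addr - base <= size &&
           len <= size - (addr - base);
}

/* Hypothetical wrapper: rejects out-of-window transfers with EINVAL. */
static int shim_p2p(int fd, uint64_t base, uint64_t size,
                    uint64_t target, void *buf, uint32_t len, uint32_t flags)
{
    if (!p2p_in_window(base, size, target, len)) {
        errno = EINVAL;
        return -1;
    }
    struct pciem_p2p_op_user op = {
        .target_phys_addr = target,
        .user_addr = (uint64_t)(uintptr_t)buf,
        .length = len,
        .flags = flags,
    };
    return ioctl(fd, PCIEM_IOCTL_P2P, &op);
}
```

With p2p_regions="0xe00000000:0x1000" from the example above, a 16-byte transfer at 0xe00000ff0 is accepted, while one at 0xe00000ff8 would run past the end of BAR0 and is rejected before reaching the kernel.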
get_bar_info()
#define PCIEM_IOCTL_GET_BAR_INFO _IOWR(PCIEM_IOCTL_MAGIC, 19, struct pciem_bar_info_query)
Call this after register() if you need to know the actual physical addresses.
It takes the pciem_bar_info_query struct as param. Set the bar_index going in, and it’ll fill in:
phys_addr: The physical address where this BAR lives
size: The BAR size
flags: The BAR flags
set_watchpoint()
#define PCIEM_IOCTL_SET_WATCHPOINT _IOW(PCIEM_IOCTL_MAGIC, 20, struct pciem_watchpoint_config)
Sets up hardware watchpoints on specific BAR offsets. When the guest writes to these locations, you’ll get notified immediately via the event mechanism (or through your own, if you have built one).
It takes the pciem_watchpoint_config struct as param:
bar_index: Which BAR to watch
offset: Offset within the BAR
width: How many bytes (1, 2, 4, or 8)
flags: Either PCIEM_WP_FLAG_BAR_KPROBES or PCIEM_WP_FLAG_BAR_MANUAL to control how PCIem locates the BAR mapping.
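A small pre-check for watchpoint parameters, as a hypothetical helper (not part of PCIem). Widths follow the list above; requiring offset to be aligned to width mirrors how x86 debug registers work (they watch naturally aligned ranges) and is an assumption about what PCIem accepts, as is the bar_size bound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check for pciem_watchpoint_config's offset/width pair. */
static bool watchpoint_ok(uint32_t offset, uint32_t width, uint64_t bar_size)
{
    if (width != 1 && width != 2 && width != 4 && width != 8)
        return false;                      /* only 1, 2, 4 or 8 bytes */
    if (offset % width != 0)
        return false;                      /* naturally aligned range */
    return (uint64_t)offset + width <= bar_size;  /* stays inside the BAR */
}
```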
set_eventfd()
#define PCIEM_IOCTL_SET_EVENTFD _IOW(PCIEM_IOCTL_MAGIC, 21, struct pciem_eventfd_config)
This sets up a userspace eventfd which gets notified whenever events arrive from PCIem.
It takes the pciem_eventfd_config struct as param:
eventfd: Your eventfd file descriptor
Once set up, the kernel will signal your eventfd whenever new events land in the ring buffer.
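A hypothetical helper for the waiting side, assuming a non-blocking eventfd as in the walkthrough. It returns the eventfd counter, which holds the sum of all signals since the last read, so several ring events can arrive with a single wakeup; always drain the ring after waking up rather than expecting one event per signal.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Waits up to timeout_ms (-1 = forever) on the eventfd registered with
 * set_eventfd(). Returns the accumulated count, 0 on timeout, -1 on error. */
static int64_t wait_for_events(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int r = poll(&pfd, 1, timeout_ms);
    if (r <= 0)
        return r;                 /* 0: timeout, -1: poll error */
    uint64_t count;
    ssize_t n = read(efd, &count, sizeof(count));  /* read resets the counter */
    if (n != (ssize_t)sizeof(count))
        return (n < 0 && errno == EAGAIN) ? 0 : -1;
    return (int64_t)count;
}
```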