Dynamic PAS — Runtime Mode Switching for Varying Workloads
2026-1 Systems Technology

PAS effectively adapts sleep duration on every I/O, but it always uses the "sleep + poll" pattern. In practice, there are scenarios where this pattern is suboptimal:
When queue depth is 1, only one I/O is processed at a time. If PAS's sleep time is shorter than the device processing time, a brief busy-wait after waking detects completion. → PAS works well.
With multiple concurrent I/Os, effective device latency increases and polling CPU cost grows. In this case:
PAS's sleep duration cannot go below the minimum, d_init.
If the device is so fast that PAS's sleep duration keeps hitting this floor, PAS oversleeps again and again even though it is already at the minimum.
This signals that sleeping is fundamentally unnecessary, and Classic Polling is more appropriate.
DPAS (Dynamic PAS) manages 4 modes as a state machine. Every Neval = 100 I/Os, it evaluates observed Queue Depth (QD) and Timer Floor (tf) count to decide mode transitions.
| Mode | ID | Behavior | Suitable For |
|---|---|---|---|
| PAS | 2 | adaptive sleep + poll (default mode) | QD=1, stable I/O latency |
| CP | 1 | classic polling (busy-wait) | QD=1 with very fast device |
| OL | 3 | overload state — evaluates QD to branch to PAS or INT | Transitional when PAS timer hits floor |
| INT | 0 | interrupt-based (sleep + interrupt wake) | Very high QD where polling is inefficient |

| Transition | Condition | Interpretation |
|---|---|---|
| PAS → CP | avg_QD = 1 && param4 ≥ 1 | Low load (QD=1); switching to CP eliminates sleep overhead |
| PAS → OL | tf > param1 | Timer floor exceeded: sleep keeps hitting the minimum and is effectively meaningless |
| CP → PAS | After 1000 I/Os | Return to PAS after a period in CP for re-evaluation |
| OL → PAS | avg_QD ≤ param2 | Load has decreased → PAS is appropriate again |
| OL → INT | avg_QD > param3 | Load is very high → abandon polling, use interrupts |

During each evaluation period, the queue depth sum (qd_sum) and timer floor count (tf) are accumulated and averaged for the decision.
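
The transition table can be read as a small decision function. The following is a minimal user-space sketch, not the kernel implementation. The mode IDs and thresholds come from the tables above; the function name, the struct, the `ios_in_mode` counter, and the order in which the two PAS checks are tested are assumptions. The sketch also assumes that "QD = 1" corresponds to the 10x-scaled value 10, and it leaves INT unchanged because the table lists no transition out of INT.

```c
/* Mode IDs as listed in the mode table. */
enum dpas_mode { DPAS_INT = 0, DPAS_CP = 1, DPAS_PAS = 2, DPAS_OL = 3 };

/* Hypothetical container for the four thresholds (param1..param4). */
struct dpas_params {
    int param1; /* PAS -> OL when tf > param1 */
    int param2; /* OL -> PAS when avg_QD <= param2 */
    int param3; /* OL -> INT when avg_QD > param3 */
    int param4; /* PAS -> CP allowed when >= 1 */
};

#define QD_ONE_X10   10   /* assumption: QD = 1 expressed at 10x scale */
#define CP_DWELL_IOS 1000 /* CP returns to PAS after this many I/Os */

/* Decide the next mode at an evaluation point.
 * avg_qd is the 10x-scaled average QD, tf the timer floor count,
 * ios_in_mode the number of I/Os handled since entering the current mode. */
enum dpas_mode dpas_next_mode(enum dpas_mode cur, int avg_qd, int tf,
                              int ios_in_mode, const struct dpas_params *p)
{
    switch (cur) {
    case DPAS_PAS:
        if (tf > p->param1)              /* assumption: OL check comes first */
            return DPAS_OL;
        if (avg_qd == QD_ONE_X10 && p->param4 >= 1)
            return DPAS_CP;
        return DPAS_PAS;
    case DPAS_CP:
        return ios_in_mode >= CP_DWELL_IOS ? DPAS_PAS : DPAS_CP;
    case DPAS_OL:
        if (avg_qd <= p->param2)
            return DPAS_PAS;
        if (avg_qd > p->param3)
            return DPAS_INT;
        return DPAS_OL;
    case DPAS_INT:
    default:
        return cur;                      /* no outgoing transition listed */
    }
}
```

With the default thresholds (param2 = param3 = 10), OL never persists in this sketch: an average of at most 10 returns to PAS and anything above 10 moves to INT.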
Queue Depth (QD): the number of I/O requests submitted to the device but not yet completed at a given point.
DPAS tracks the average QD during the evaluation period at 10x scale (qd_sum × 10 / count).
avg_QD = 10 → actual QD ≈ 1 (single I/O level)
avg_QD > 10 → multiple concurrent I/Os (high load)
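
The 10x scaling lets integer arithmetic keep one decimal digit of the average. Here is a small sketch of the accumulation, assuming one QD sample is recorded per I/O. Only `qd_sum` and `count` are named in the text; the struct and function names are hypothetical.

```c
/* Hypothetical per-window accumulator. */
struct qd_window {
    unsigned int qd_sum; /* sum of the QD samples seen in this window */
    unsigned int count;  /* number of samples in this window */
};

/* Record the queue depth observed for one I/O. */
void qd_window_add(struct qd_window *w, unsigned int qd)
{
    w->qd_sum += qd;
    w->count++;
}

/* Average QD at 10x scale: qd_sum * 10 / count, using integer division.
 * Returns 0 for an empty window to avoid dividing by zero. */
unsigned int qd_window_avg_x10(const struct qd_window *w)
{
    if (w->count == 0)
        return 0;
    return w->qd_sum * 10 / w->count;
}
```

For example, 100 samples that are all 1 give 100 × 10 / 100 = 10, while 50 samples of 1 mixed with 50 samples of 2 give 150 × 10 / 100 = 15, that is, an actual average QD of 1.5.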
Timer Floor (tf): the number of times PAS's sleep duration reached the minimum value (d_init).
// Kernel code (blk-mq.c)
dur = dur * adj / div;
if (dur < d_init) {
dur = d_init; // clamp to floor
sc->tf++; // increment timer floor counter
}
A high tf means PAS tries to reduce sleep but is already at the floor.
If this persists (tf > param1), PAS is deemed no longer effective and transitions to OL.
param1 = 0 means even a single timer floor event triggers an immediate OL transition.
Increasing this value allows PAS to tolerate the timer floor state longer.
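
To see how param1 changes the tolerance, the clamp from the kernel snippet can be replayed in user space. This is a sketch under stated assumptions: the struct, the function names, and the sample values (a 10% shrink per step, durations of 1500 and 1000 in arbitrary units) are hypothetical, and the sketch does not model when the kernel resets tf.

```c
/* Hypothetical user-space stand-in for the per-queue sleep state. */
struct sleep_ctx {
    unsigned long dur;    /* current sleep duration */
    unsigned long d_init; /* minimum sleep duration (the floor) */
    unsigned int tf;      /* timer floor counter */
};

/* Scale the sleep duration by adj/div, clamp it to the floor,
 * and count every time the clamp is applied. */
void adjust_sleep(struct sleep_ctx *sc, unsigned long adj, unsigned long div)
{
    sc->dur = sc->dur * adj / div;
    if (sc->dur < sc->d_init) {
        sc->dur = sc->d_init; /* clamp to floor */
        sc->tf++;             /* one more timer floor event */
    }
}

/* PAS -> OL condition from the transition table: tf > param1. */
int pas_should_enter_ol(const struct sleep_ctx *sc, unsigned int param1)
{
    return sc->tf > param1;
}
```

Starting from dur = 1500 with d_init = 1000 and five calls to `adjust_sleep(&sc, 9, 10)`, the duration goes 1350, 1215, 1093, then hits the floor twice, so tf ends at 2. With param1 = 0 that is enough to leave PAS; with param1 = 2 it is not.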
DPAS transition thresholds can be adjusted at runtime via sysfs. In the kernel lab, you'll modify these parameters and observe behavior changes.
| Parameter | Default | Role |
|---|---|---|
| switch_param1 | 0 | PAS → OL threshold: transition to OL when tf > param1 |
| switch_param2 | 10 | OL → PAS threshold: return to PAS when avg_QD ≤ param2 |
| switch_param3 | 10 | OL → INT threshold: transition to INT when avg_QD > param3 |
| switch_param4 | 1 | PAS ↔ CP toggle: 1 enables, 0 disables |
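
For scripted experiments it can be handy to set these parameters from a small C helper instead of the shell. The sketch below uses only standard C file I/O. The helper names are hypothetical, and it assumes each parameter file accepts and returns a plain decimal integer, which this section does not confirm.

```c
#include <stdio.h>

/* Write an integer value to an attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int write_param(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read an integer value back from an attribute file.
 * Returns 0 on success, -1 on failure. */
int read_param(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

In the lab VM, a call such as `write_param("/sys/block/nvme0n1/queue/switch_param1", 5)` would be the intended use; writing under /sys normally requires root.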
# sysfs paths (QEMU lab)
/sys/block/nvme0n1/queue/switch_param1
/sys/block/nvme0n1/queue/switch_param2
/sys/block/nvme0n1/queue/switch_param3
/sys/block/nvme0n1/queue/switch_param4
# Check per-mode I/O statistics
cat /sys/block/nvme0n1/queue/switch_stat
# → CPU[ 0] MODE[2] QD[ 1] param1: 0 param2: 10 param3: 10 polled io: 42 pas io: 958 ...
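
When collecting statistics across many runs, the leading fields of such a line can be extracted with `sscanf`. This sketch relies only on the sample line shown above, so the exact field layout is an assumption, and the struct and function names are hypothetical. It ignores everything after the QD field.

```c
#include <stdio.h>

/* Leading fields of one switch_stat line. */
struct stat_head {
    int cpu;  /* CPU index */
    int mode; /* mode ID: 0=INT, 1=CP, 2=PAS, 3=OL */
    int qd;   /* QD value as printed */
};

/* Parse "CPU[ n] MODE[m] QD[ q] ..." and ignore the rest of the line.
 * Returns 0 on success, -1 if the three fields are not all found. */
int parse_stat_head(const char *line, struct stat_head *out)
{
    int n = sscanf(line, " CPU[ %d] MODE[ %d] QD[ %d]",
                   &out->cpu, &out->mode, &out->qd);
    return n == 3 ? 0 : -1;
}

/* Map a mode ID to its name, following the mode table. */
const char *mode_name(int mode)
{
    switch (mode) {
    case 0: return "INT";
    case 1: return "CP";
    case 2: return "PAS";
    case 3: return "OL";
    default: return "unknown";
    }
}
```

Applied to the sample line, this yields CPU 0, mode 2 (PAS), and QD 1.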
MODE[0]=INT, MODE[1]=CP, MODE[2]=PAS, MODE[3]=OL
polled io: I/Os processed in CP mode
pas io: I/Os processed in PAS mode
ol io: I/Os processed in OL mode
int io: I/Os processed in INT mode

| | PAS | DPAS |
|---|---|---|
| Mode | adaptive sleep + poll (single) | PAS / CP / INT auto-switching |
| QD Handling | Cannot adapt to QD changes | Switches mode based on QD |
| High Load | Excessive polling wastes CPU | OL → INT transition protects CPU |
| Ultra-low Latency | Repeated oversleep at timer floor | PAS → CP eliminates sleep overhead |
| Scope | Single workload | Multi-tenant, dynamic workloads |
Now that you understand DPAS's design, install the kernel on a QEMU VM and benchmark performance.
DPAS Kernel Lab → Benchmark INT/CP/PAS/DPAS across 4 modes on a QEMU VM

Paper: DPAS: A Prompt, Accurate and Safe I/O Completion Method for SSDs (USENIX FAST '26, Seo et al.)