Part 4: DPAS Introduction

Dynamic PAS — Runtime Mode Switching for Varying Workloads

2026-1 Systems Technology

1. PAS Limitations: Single Mode

PAS effectively adapts sleep duration on every I/O, but it always uses the "sleep + poll" pattern. In practice, there are scenarios where this pattern is suboptimal:

QD = 1 (Single I/O, Low Load)

When queue depth is 1, only one I/O is processed at a time. If PAS's sleep time is shorter than the device processing time, a brief busy-wait after waking detects completion. → PAS works well.

QD > 1 (Concurrent I/O, High Load)

With multiple concurrent I/Os, effective device latency increases and polling CPU cost grows. In this case, continuing to busy-wait burns CPU cycles that the other outstanding I/Os could use, so interrupt-driven completion becomes the better choice.

Timer Floor

PAS's sleep duration cannot go below d_init (the minimum). If the device is so fast that PAS's sleep keeps hitting this floor, "oversleep despite already being at minimum" repeats. This signals that sleep is fundamentally unnecessary — Classic Polling is more appropriate.

Fundamental Limitation of Single Mode

PAS is optimized for stable I/O latency at QD=1. However, real systems have dynamically changing loads, so no single completion technique can handle every scenario. → DPAS automatically selects the optimal mode at runtime.

2. DPAS State Machine

DPAS (Dynamic PAS) manages 4 modes as a state machine. Every Neval = 100 I/Os, it evaluates observed Queue Depth (QD) and Timer Floor (tf) count to decide mode transitions.

[State machine diagram] States: CP (Polling), PAS (Sleep+Poll), OL (Overloaded), INT (Interrupt). Transitions: PAS → CP (param4 ≥ 1 && avg_qd = 1.0, load decrease), PAS → OL (tf > param1, load increase), OL → PAS (avg_qd ≤ param2), OL → INT (avg_qd > param3), CP → PAS (cp_cnt ≥ 1000, re-evaluate), INT → OL (int_cnt ≥ 10000, re-evaluate).

Four Modes

| Mode | ID | Behavior | Suitable For |
|------|----|----------|--------------|
| PAS | 2 | Adaptive sleep + poll (default mode) | QD=1, stable I/O latency |
| CP | 1 | Classic polling (busy-wait) | QD=1 with a very fast device |
| OL | 3 | Overload state: evaluates QD to branch to PAS or INT | Transitional, when the PAS timer hits its floor |
| INT | 0 | Interrupt-based (sleep + interrupt wake) | Very high QD, where polling is inefficient |

Transition Conditions

| Transition | Condition | Interpretation |
|------------|-----------|----------------|
| PAS → CP | avg_QD = 1 && param4 ≥ 1 | Low load (QD=1); switching to CP eliminates sleep overhead |
| PAS → OL | tf > param1 | Timer floor exceeded: sleep keeps hitting the minimum, so it is effectively meaningless |
| CP → PAS | after 1000 I/Os | Return to PAS after a period in CP for re-evaluation |
| OL → PAS | avg_QD ≤ param2 | Load has decreased → PAS is appropriate again |
| OL → INT | avg_QD > param3 | Load is very high → abandon polling, use interrupts |

Evaluation Period

All transition decisions are made every Neval = 100 I/Os (PAS, OL) or 1000 I/Os (CP). During this period, the QD sum (qd_sum) and timer floor count (tf) are accumulated and averaged for the decision.

3. Key Metrics: QD and Timer Floor

Queue Depth (QD)

The number of I/O requests submitted to the device but not yet completed at a given point. DPAS tracks the average QD during the evaluation period at 10x scale (qd_sum × 10 / count).

Timer Floor (tf)

The number of times PAS's sleep duration reached the minimum value (d_init).

// Kernel code (blk-mq.c)
dur = dur * adj / div;
if (dur < d_init) {
    dur = d_init;       // clamp to floor
    sc->tf++;           // increment timer floor counter
}

A high tf means PAS tries to reduce sleep but is already at the floor. If this persists (tf > param1), PAS is deemed no longer effective and transitions to OL.

What param1 = 0 means

The default param1 = 0 means even a single timer floor event triggers an immediate OL transition. Increasing this value allows PAS to tolerate the timer floor state longer.

4. sysfs Parameters

DPAS transition thresholds can be adjusted at runtime via sysfs. In the kernel lab, you'll modify these parameters and observe behavior changes.

| Parameter | Default | Role |
|-----------|---------|------|
| switch_param1 | 0 | PAS → OL threshold: transition to OL when tf > param1 |
| switch_param2 | 10 | OL → PAS threshold: return to PAS when avg_QD ≤ param2 |
| switch_param3 | 10 | OL → INT threshold: transition to INT when avg_QD > param3 |
| switch_param4 | 1 | PAS ↔ CP toggle: 1 enables, 0 disables |
# sysfs paths (QEMU lab)
/sys/block/nvme0n1/queue/switch_param1
/sys/block/nvme0n1/queue/switch_param2
/sys/block/nvme0n1/queue/switch_param3
/sys/block/nvme0n1/queue/switch_param4

# Check per-mode I/O statistics
cat /sys/block/nvme0n1/queue/switch_stat
# → CPU[ 0] MODE[2] QD[ 1] param1: 0 param2: 10 param3: 10 polled io: 42 pas io: 958 ...
Reading switch_stat
  • MODE[0]=INT, MODE[1]=CP, MODE[2]=PAS, MODE[3]=OL
  • polled io: I/Os processed in CP mode
  • pas io: I/Os processed in PAS mode
  • ol io: I/Os processed in OL mode
  • int io: I/Os processed in INT mode

5. PAS vs DPAS: Summary

| | PAS | DPAS |
|---|-----|------|
| Mode | Adaptive sleep + poll (single) | PAS / CP / INT auto-switching |
| QD handling | Cannot adapt to QD changes | Switches mode based on QD |
| High load | Excessive polling wastes CPU | OL → INT transition protects CPU |
| Ultra-low latency | Repeated oversleep at timer floor | PAS → CP eliminates sleep overhead |
| Scope | Single workload | Multi-tenant, dynamic workloads |

Next Steps

Now that you understand DPAS's design, install the kernel on a QEMU VM and benchmark performance.

DPAS Kernel Lab → Benchmark INT/CP/PAS/DPAS across 4 modes on a QEMU VM

Paper: DPAS: A Prompt, Accurate and Safe I/O Completion Method for SSDs (USENIX FAST '26, Seo et al.)