Part 4: DPAS Introduction — Systems Technology 2026

1. PAS Limitations: Single Mode

PAS effectively adapts sleep duration on every I/O, but it always uses the "sleep + poll" pattern. In practice, there are scenarios where this pattern is suboptimal:

QD = 1 (Single I/O, Low Load)

When queue depth is 1, only one I/O is processed at a time. If PAS's sleep time is shorter than the device processing time, a brief busy-wait after waking detects completion. → PAS works well.

QD > 1 (Concurrent I/O, High Load)

With multiple concurrent I/Os, effective device latency increases and polling CPU cost grows. In this case:

Moderately high QD: Classic Polling (CP) may be more efficient, since it has no sleep overhead and detects completion immediately.
Very high QD: When polling consumes all CPUs, throughput degrades, so interrupt-based completion (INT) is appropriate.

Timer Floor

PAS's sleep duration cannot go below d_init (the minimum). If the device is so fast that the sleep duration keeps hitting this floor, PAS oversleeps repeatedly even though it is already at the minimum. This signals that sleeping is fundamentally unnecessary and that Classic Polling is more appropriate.

Fundamental Limitation of Single Mode: PAS is optimized for stable I/O latency at QD=1. However, real systems have dynamically changing loads, so a single completion technique cannot handle all scenarios. → DPAS automatically selects the optimal mode at runtime.

2. DPAS State Machine

DPAS (Dynamic PAS) manages four modes as a state machine. Every N_eval = 100 I/Os (1000 I/Os while in CP), it evaluates the observed Queue Depth (QD) and the Timer Floor (tf) count to decide mode transitions.

Four Modes

Mode	ID	Behavior	Suitable For
PAS	2	adaptive sleep + poll (default mode)	QD=1, stable I/O latency
CP	1	classic polling (busy-wait)	QD=1 with very fast device
OL	3	overload state — evaluates QD to branch to PAS or INT	Transitional when PAS timer hits floor
INT	0	interrupt-based (sleep + interrupt wake)	Very high QD where polling is inefficient

Transition Conditions

Transition	Condition	Interpretation
PAS → CP	average QD = 1 (`avg_QD = 10` on the 10x scale) && `param4 ≥ 1`	Low load (QD=1); switching to CP eliminates sleep overhead
PAS → OL	`tf > param1`	Timer floor count exceeds the threshold: sleep keeps hitting the minimum and is effectively meaningless
CP → PAS	After 1000 I/Os	Return to PAS after a period in CP for re-evaluation
OL → PAS	`avg_QD ≤ param2`	Load has decreased → PAS is appropriate again
OL → INT	`avg_QD > param3`	Load is very high → abandon polling, use interrupts

Evaluation Period: All transition decisions are made every N_eval = 100 I/Os (PAS, OL) or 1000 I/Os (CP). During this period, the QD sum (qd_sum) and the timer floor count (tf) are accumulated; at the end of the period, the QD sum is averaged and both values feed the decision.
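The transition table can be sketched as a single decision function that runs once at the end of each evaluation period. This is an illustration, not the kernel implementation: the type and function names (`dpas_params`, `dpas_next_mode`, `QD_ONE_X10`) are hypothetical, the order in which the two PAS checks run is an assumption (the table does not say which wins), "QD = 1" is assumed to be the value 10 on the 10x scale, and INT is assumed to stay in INT because the table lists no exit from it.

```c
#include <assert.h>

/* Mode IDs as listed in the Four Modes table. */
enum dpas_mode { MODE_INT = 0, MODE_CP = 1, MODE_PAS = 2, MODE_OL = 3 };

/* Assumption: an average QD of 1 is stored as 10 on the 10x scale. */
#define QD_ONE_X10 10

/* Hypothetical container for switch_param1..switch_param4. */
struct dpas_params {
    int param1; /* PAS -> OL when tf > param1 */
    int param2; /* OL -> PAS when avg_qd_x10 <= param2 */
    int param3; /* OL -> INT when avg_qd_x10 > param3 */
    int param4; /* PAS -> CP allowed when param4 >= 1 */
};

/* Decide the next mode at the end of one evaluation period. */
enum dpas_mode dpas_next_mode(enum dpas_mode mode, int avg_qd_x10, int tf,
                              const struct dpas_params *p)
{
    assert(p);
    switch (mode) {
    case MODE_PAS:
        /* Assumption: the timer floor check runs before the CP check. */
        if (tf > p->param1)
            return MODE_OL;
        if (avg_qd_x10 == QD_ONE_X10 && p->param4 >= 1)
            return MODE_CP;
        return MODE_PAS;
    case MODE_CP:
        /* CP always returns to PAS once its 1000-I/O period ends. */
        return MODE_PAS;
    case MODE_OL:
        if (avg_qd_x10 <= p->param2)
            return MODE_PAS;
        if (avg_qd_x10 > p->param3)
            return MODE_INT;
        return MODE_OL; /* only reachable when param2 < param3 */
    case MODE_INT:
    default:
        /* No exit from INT is listed in the table, so stay put. */
        return mode;
    }
}
```

With the default thresholds (0, 10, 10, 1), a PAS period with average QD 1 and no timer floor hits moves to CP, while a single timer floor hit moves to OL instead.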

3. Key Metrics: QD and Timer Floor

Queue Depth (QD)

The number of I/O requests submitted to the device but not yet completed at a given point. DPAS tracks the average QD during the evaluation period at 10x scale (qd_sum × 10 / count).

avg_QD = 10 → actual QD ≈ 1 (single I/O level)
avg_QD > 10 → multiple concurrent I/Os (high load)
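The 10x scale presumably exists so that one decimal digit of the average survives integer division, which is the norm in kernel code. The sketch below shows the arithmetic with a hypothetical accumulator (`struct qd_window` and its helpers are illustrative names, not kernel symbols).

```c
#include <assert.h>

/* Hypothetical per-period accumulator for queue depth samples. */
struct qd_window {
    unsigned long qd_sum; /* sum of the QD observed at each I/O */
    unsigned long count;  /* number of I/Os in this period */
};

/* Record the queue depth observed for one I/O. */
void qd_window_add(struct qd_window *w, unsigned int inflight)
{
    w->qd_sum += inflight;
    w->count++;
}

/* Average QD at 10x scale: qd_sum * 10 / count, using integer math. */
unsigned long qd_window_avg_x10(const struct qd_window *w)
{
    assert(w->count > 0);
    return w->qd_sum * 10 / w->count;
}
```

For example, 100 I/Os split evenly between QD 1 and QD 2 give qd_sum = 150 and an average of 15 on this scale (actual QD 1.5), whereas plain integer division would round it down to 1.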

Timer Floor (tf)

The number of times PAS's sleep duration reached the minimum value (d_init).

// Kernel code (blk-mq.c)
dur = dur * adj / div;
if (dur < d_init) {
    dur = d_init;       // clamp to floor
    sc->tf++;           // increment timer floor counter
}

A high tf means PAS tries to reduce sleep but is already at the floor. If this persists (tf > param1), PAS is deemed no longer effective and transitions to OL.

What param1 = 0 means: With the default param1 = 0, even a single timer floor event within an evaluation period satisfies `tf > param1` and triggers the OL transition. Increasing this value allows PAS to tolerate the timer floor state longer.
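To see how param1 changes the outcome, the clamp from the kernel excerpt can be wrapped in a small user-space sketch. The names `dur`, `adj`, `div`, `d_init`, and `tf` follow the excerpt; `struct pas_state`, `pas_adjust`, and `pas_should_enter_ol` are hypothetical, and the numbers in the usage note are arbitrary units chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical state: current sleep duration and timer floor counter. */
struct pas_state {
    unsigned long dur; /* current sleep duration */
    unsigned long tf;  /* times the duration was clamped to d_init */
};

/* Scale the sleep duration by adj/div, clamp to the floor, count hits. */
void pas_adjust(struct pas_state *sc, unsigned long adj, unsigned long div,
                unsigned long d_init)
{
    assert(div > 0);
    sc->dur = sc->dur * adj / div;
    if (sc->dur < d_init) {
        sc->dur = d_init; /* clamp to floor */
        sc->tf++;         /* increment timer floor counter */
    }
}

/* PAS -> OL condition from the transition table: tf > param1. */
int pas_should_enter_ol(const struct pas_state *sc, unsigned long param1)
{
    return sc->tf > param1;
}
```

Starting from dur = 1000 with d_init = 500 and halving three times, the first step lands exactly on the floor without counting, and the next two are clamped, so tf = 2. That is enough to enter OL with param1 = 0 but not with param1 = 5.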

4. sysfs Parameters

DPAS transition thresholds can be adjusted at runtime via sysfs. In the kernel lab, you'll modify these parameters and observe behavior changes.

Parameter	Default	Role
`switch_param1`	0	PAS → OL threshold: transition to OL when `tf > param1`
`switch_param2`	10	OL → PAS threshold: return to PAS when `avg_QD ≤ param2`
`switch_param3`	10	OL → INT threshold: transition to INT when `avg_QD > param3`
`switch_param4`	1	PAS ↔ CP toggle: 1 enables, 0 disables

# sysfs paths (QEMU lab)
/sys/block/nvme0n1/queue/switch_param1
/sys/block/nvme0n1/queue/switch_param2
/sys/block/nvme0n1/queue/switch_param3
/sys/block/nvme0n1/queue/switch_param4

# Check per-mode I/O statistics
cat /sys/block/nvme0n1/queue/switch_stat
# → CPU[ 0] MODE[2] QD[ 1] param1: 0 param2: 10 param3: 10 polled io: 42 pas io: 958 ...
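For scripted experiments, the thresholds can also be set from a small C helper instead of the shell. This is a minimal sketch under two assumptions that this page does not confirm: that each `switch_param` file accepts a decimal integer written as text (the usual sysfs convention), and that the caller has permission to write it. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<queue_dir>/switch_param<index>" into buf.
 * Returns 0 on success, -1 on a bad index, empty dir, or truncation. */
int switch_param_path(char *buf, size_t len, const char *queue_dir, int index)
{
    assert(buf && queue_dir);
    if (index < 1 || index > 4 || strlen(queue_dir) == 0)
        return -1;
    int n = snprintf(buf, len, "%s/switch_param%d", queue_dir, index);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write value as decimal text to the parameter file.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int set_switch_param(const char *queue_dir, int index, int value)
{
    char path[256];
    if (switch_param_path(path, sizeof path, queue_dir, index) != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0) /* buffered write errors surface at close */
        ok = 0;
    return ok ? 0 : -1;
}
```

In the lab this would be called as `set_switch_param("/sys/block/nvme0n1/queue", 1, 5)` to raise the PAS → OL threshold to 5.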

Reading switch_stat

MODE[0]=INT, MODE[1]=CP, MODE[2]=PAS, MODE[3]=OL
polled io: I/Os processed in CP mode
pas io: I/Os processed in PAS mode
ol io: I/Os processed in OL mode
int io: I/Os processed in INT mode
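When collecting statistics across many runs, the leading fields of a `switch_stat` line can be parsed programmatically. The sketch below assumes the line format matches the sample output shown above (`CPU[ 0] MODE[2] QD[ 1] ...`); the struct and function names are hypothetical, and the per-mode counters after the parameters are left unparsed because the sample elides part of the line.

```c
#include <assert.h>
#include <stdio.h>

/* Leading fields of one switch_stat line. */
struct switch_stat {
    int cpu;  /* CPU index */
    int mode; /* mode ID: 0=INT, 1=CP, 2=PAS, 3=OL */
    int qd;   /* QD value as printed */
};

/* Parse "CPU[ n] MODE[m] QD[ q] ...". Returns 0 on success, -1 otherwise.
 * %d skips leading whitespace, so the padded "[ 0]" form is accepted. */
int parse_switch_stat(const char *line, struct switch_stat *out)
{
    assert(line && out);
    int n = sscanf(line, "CPU[%d] MODE[%d] QD[%d]",
                   &out->cpu, &out->mode, &out->qd);
    return n == 3 ? 0 : -1;
}

/* Map a mode ID to the name used in this document. */
const char *mode_name(int mode)
{
    switch (mode) {
    case 0: return "INT";
    case 1: return "CP";
    case 2: return "PAS";
    case 3: return "OL";
    default: return "unknown";
    }
}
```

Applied to the sample line, this yields cpu 0, mode 2 (which `mode_name` maps to "PAS"), and qd 1.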

5. PAS vs DPAS: Summary

	PAS	DPAS
Mode	adaptive sleep + poll (single)	PAS / CP / OL / INT auto-switching
QD Handling	Cannot adapt to QD changes	Switches mode based on QD
High Load	Excessive polling wastes CPU	OL → INT transition protects CPU
Ultra-low Latency	Repeated oversleep at timer floor	PAS → CP eliminates sleep overhead
Scope	Single workload	Multi-tenant, dynamic workloads

Next Steps

Now that you understand DPAS's design, install the kernel on a QEMU VM and benchmark performance.

DPAS Kernel Lab → Benchmark the four completion methods (INT, CP, PAS, DPAS) on a QEMU VM

Paper: DPAS: A Prompt, Accurate and Safe I/O Completion Method for SSDs (USENIX FAST '26, Seo et al.)