The Evolution of I/O Completion

From Interrupts to Polling, and Adaptive Techniques

Background Reading

1. The Problem: Why Does I/O Completion Matter?

When an application reads from or writes to a disk, the mechanism by which it is notified that a submitted request has finished is what we call I/O completion.

In the HDD era, a single disk I/O took several milliseconds (ms), so differences of a few microseconds (us) due to the completion notification method were negligible. However, modern NVMe SSDs process a single I/O in 10 to 100 microseconds (us).

Key Numbers
| Device                   | I/O latency | Completion overhead ratio |
|--------------------------|-------------|---------------------------|
| HDD                      | ~5,000 us   | <0.1% (negligible)        |
| SATA SSD                 | ~100 us     | ~5%                       |
| NVMe SSD (TLC)           | ~15 us      | ~30%                      |
| Intel Optane (3D XPoint) | ~10 us      | ~50%                      |

As devices get faster, the proportion of software overhead in the total I/O time increases dramatically. Optimizing the I/O completion method is now synonymous with optimizing overall I/O performance.

2. Three I/O Completion Modes

Interrupt (INT)

- After submitting an I/O request, the CPU is released and the thread sleeps
- The device wakes it up via an interrupt upon completion
- Pro: CPU efficient
- Con: high wake-up cost

Continuous Polling (CP)

- After submitting an I/O request, the thread continuously checks for completion (busy-wait)
- Completion is detected immediately
- Pro: lowest latency
- Con: 100% CPU utilization

Hybrid Polling (PAS/DPAS)

- Sleeps for a period, then starts polling
- Dynamically switches modes based on conditions
- Balance between latency and CPU

Why Can't We Just Use One? Interrupts sacrifice latency on fast devices, while continuous polling burns an entire core per waiting thread. Which trade-off wins depends on device speed and load, which is why hybrid and adaptive techniques exist.

3. History of Polling Support in the Linux Kernel

2015 (Linux 4.4)

Polling support added to the block layer: the blk_poll() path and the io_poll sysfs interface appeared, initially used by the NVMe driver for synchronous Direct I/O.

2017 (Linux 4.10)

Hybrid polling introduced (io_poll_delay). Sleeps for a set duration before starting to poll.

2019 (Linux 5.0)

Introduction of the poll_queues parameter in the NVMe driver. Dedicated hardware queues separated for polling, completed without interrupts.

2019 (Linux 5.1)

io_uring appeared. The new standard for asynchronous I/O. Polling support via the IORING_SETUP_IOPOLL flag.

2021 (Linux 5.13)

The last period when preadv2(RWF_HIPRI)-based synchronous Direct I/O polling worked reliably.

2022 (Linux 5.19)

Synchronous DIO polling removed. Polling transitioned to io_uring only. The sync polling path via RWF_HIPRI was deleted.

2024 (Linux 6.x)

io_uring established as the primary interface for polling. HPE (Hewlett Packard Enterprise) adopted an io_uring-based storage stack in their products, increasing industry interest.

Why Was It Removed in 5.19?

Synchronous DIO polling (preadv2 + RWF_HIPRI) had few users relative to the kernel maintenance burden. As io_uring provided a more efficient polling interface, the redundant path was removed as part of cleanup. However, this decision did not mean "polling is unnecessary"; rather, it meant "the polling interface was consolidated into io_uring."

4. USENIX FAST '26: The Revival of I/O Completion Research

At USENIX FAST '26 (File and Storage Technologies conference), held in February 2026, multiple papers on I/O completion optimization were presented, demonstrating that research in this area is once again thriving.

DPAS (Seo et al.)

"DPAS: A Prompt, Accurate and Safe I/O Completion Method for SSDs"

UnICom (Pan et al.)

"UnICom: A Universally High-Performant I/O Completion Mechanism for Modern Computer Systems"

Key Differences Between the Two Approaches

| Aspect                         | DPAS                                              | UnICom                                                |
|--------------------------------|---------------------------------------------------|-------------------------------------------------------|
| Mode decision signal           | I/O queue depth (qd): direct counting of I/O activity | Run queue task count: indirect estimation of CPU contention |
| Polling entity                 | The thread that issued the I/O itself             | Dedicated completion thread (on another core)         |
| Kernel modification scope      | Block layer + NVMe driver                         | Scheduler + separate kernel module                    |
| Single I/O + multiple C-threads | qd=1 → CP (optimal)                              | task>1 → TagSched-TagPoll (unnecessary overhead)      |
What You Will Experience in This Lab

You will directly observe the mode transitions of DPAS through the switch_stat sysfs interface: with a single job (QD=1), the transition from PAS to CP, and with multiple jobs (QD>1), the transition from PAS to OL to INT.

5. io_uring and the Future of Polling

io_uring, introduced in Linux 5.1, has now established itself as a core Linux I/O interface.

What is io_uring?

io_uring is an asynchronous I/O framework that places shared ring buffers (Submission Queue, Completion Queue) between the kernel and userspace, allowing I/O requests to be submitted and completions to be checked without system calls.

Traditional:  App → syscall → kernel → device → interrupt → kernel → syscall return → App

io_uring:     App → write to SQ → kernel processes → write to CQ → App checks CQ
              (minimized syscalls, batch processing possible)

Industry Adoption Trends

Relationship with Polling

io_uring's IORING_SETUP_IOPOLL mode checks completion via polling. After synchronous DIO polling was removed in 5.19, io_uring became the only official interface for polling.

If adaptive completion techniques like DPAS are integrated into io_uring's polling path, it would become possible at the kernel level to automatically select the optimal completion mode based on the workload. This is being discussed as a promising path toward kernel mainline adoption.

6. Why Kernel 5.18?

The reasons for using kernel 5.18 in this lab are as follows:

  1. Synchronous DIO polling support: This is the last kernel where sync polling via preadv2(RWF_HIPRI) works. Starting from 5.19, this path was removed.
  2. DPAS patch compatibility: DPAS was developed based on the 5.18 block layer, and the patch applies cleanly on this kernel.
  3. QEMU guest independence: Since we use kernel 5.18 inside a VM, the lab can be conducted regardless of the host OS kernel version. The same experimental environment can be set up on macOS, Windows, or the latest Linux.
Guest (kernel 5.18)                    Host (kernel 5.19+, macOS, Windows...)
─────────────────                      ──────────────────────────────────────
App: preadv2(RWF_HIPRI)
  │
Guest NVMe driver: sets REQ_POLLED
  │
Guest: poll CQ (busy-wait)  ◄── completed within guest
  │                                    ▲
Virtual NVMe CQ ◄──── QEMU writes CQ entry ◄── Host I/O completed
Summary

5.18 is both "the last kernel where sync DIO polling works" and "the kernel where the DPAS patch can be applied." Since it runs inside a QEMU VM, polling experiments can be conducted regardless of the host environment.

7. Further Reading

Papers

Kernel Documentation and Code

Background Knowledge