CVE-2026-23294
Publication date:
25/03/2026
In the Linux kernel, the following vulnerability has been resolved:<br />
<br />
bpf: Fix race in devmap on PREEMPT_RT<br />
<br />
On PREEMPT_RT kernels, the per-CPU xdp_dev_bulk_queue (bq) can be<br />
accessed concurrently by multiple preemptible tasks on the same CPU.<br />
<br />
The original code assumes bq_enqueue() and __dev_flush() run atomically<br />
with respect to each other on the same CPU, relying on<br />
local_bh_disable() to prevent preemption. However, on PREEMPT_RT,<br />
local_bh_disable() only calls migrate_disable() (when<br />
PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable<br />
preemption, which allows CFS scheduling to preempt a task during<br />
bq_xmit_all(), enabling another task on the same CPU to enter<br />
bq_enqueue() and operate on the same per-CPU bq concurrently.<br />
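<br />
For reference, a sketch of the per-CPU structure involved (field names follow kernel/bpf/devmap.c, reproduced here from memory, so treat it as illustrative rather than authoritative):<br />
<br />
```c
/* Illustrative sketch of the per-CPU bulk queue (kernel/bpf/devmap.c). */
struct xdp_dev_bulk_queue {
	struct xdp_frame *q[DEV_MAP_BULK_SIZE];	/* frames pending transmit */
	struct list_head flush_node;	/* linkage on the per-CPU flush_list */
	struct net_device *dev;
	struct net_device *dev_rx;	/* non-NULL while queued for flush */
	struct bpf_prog *xdp_prog;
	unsigned int count;		/* number of valid entries in q[] */
};
```
<br />
On !PREEMPT_RT, local_bh_disable() disables softirqs and preemption, so a single task owns this per-CPU instance for the whole critical section; on PREEMPT_RT it only disables migration, so two tasks can interleave on the same CPU's instance.<br />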
<br />
This leads to several races:<br />
<br />
1. Double-free / use-after-free on bq->q[]: bq_xmit_all() snapshots<br />
cnt = bq->count, then iterates bq->q[0..cnt-1] to transmit frames.<br />
If preempted after the snapshot, a second task can call bq_enqueue()<br />
-> bq_xmit_all() on the same bq, transmitting (and freeing) the<br />
same frames. When the first task resumes, it operates on stale<br />
pointers in bq->q[], causing use-after-free.<br />
<br />
2. bq->count and bq->q[] corruption: concurrent bq_enqueue() modifying<br />
bq->count and bq->q[] while bq_xmit_all() is reading them.<br />
<br />
3. dev_rx/xdp_prog teardown race: __dev_flush() clears bq->dev_rx and<br />
bq->xdp_prog after bq_xmit_all(). If preempted between<br />
bq_xmit_all() return and bq->dev_rx = NULL, a preempting<br />
bq_enqueue() sees dev_rx still set (non-NULL), skips adding bq to<br />
the flush_list, and enqueues a frame. When __dev_flush() resumes,<br />
it clears dev_rx and removes bq from the flush_list, orphaning the<br />
newly enqueued frame.<br />
<br />
4. __list_del_clearprev() on flush_node: similar to the cpumap race,<br />
both tasks can call __list_del_clearprev() on the same flush_node;<br />
the second call dereferences a prev pointer already set to NULL.<br />
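<br />
The window in race 3 can be sketched as follows (pre-fix shape of __dev_flush(), names taken from the description above; the exact kernel code may differ):<br />
<br />
```c
/* Sketch of the race-3 window in __dev_flush(), pre-fix. */
static void __dev_flush(struct list_head *flush_list)
{
	struct xdp_dev_bulk_queue *bq, *tmp;

	list_for_each_entry_safe(bq, tmp, flush_list, flush_node) {
		bq_xmit_all(bq, XDP_XMIT_FLUSH);
		/* <-- On PREEMPT_RT a task can be preempted here. A
		 *     preempting bq_enqueue() still sees bq->dev_rx != NULL,
		 *     so it skips re-adding bq to the flush_list and
		 *     enqueues a frame that is orphaned once the lines
		 *     below run. */
		bq->dev_rx = NULL;
		bq->xdp_prog = NULL;
		__list_del_clearprev(&bq->flush_node);
	}
}
```
<br />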
<br />
The race between task A (__dev_flush -> bq_xmit_all) and task B<br />
(bq_enqueue -> bq_xmit_all) on the same CPU:<br />
<br />
Task A (xdp_do_flush)                 Task B (ndo_xdp_xmit redirect)<br />
----------------------                --------------------------------<br />
__dev_flush(flush_list)<br />
&nbsp;&nbsp;bq_xmit_all(bq)<br />
&nbsp;&nbsp;&nbsp;&nbsp;cnt = bq->count /* e.g. 16 */<br />
&nbsp;&nbsp;&nbsp;&nbsp;/* start iterating bq->q[] */<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bq_enqueue(dev, xdpf)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bq->count == DEV_MAP_BULK_SIZE<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bq_xmit_all(bq, 0)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cnt = bq->count /* same 16! */<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ndo_xdp_xmit(bq->q[])<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/* frames freed by driver */<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bq->count = 0<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;ndo_xdp_xmit(bq->q[])<br />
&nbsp;&nbsp;&nbsp;&nbsp;/* use-after-free: frames already freed! */<br />
<br />
Fix this by adding a local_lock_t to xdp_dev_bulk_queue and acquiring<br />
it in bq_enqueue() and __dev_flush(). These paths already run under<br />
local_bh_disable(), so use local_lock_nested_bh() which on non-RT is<br />
a pure annotation with no overhead, and on PREEMPT_RT provides a<br />
per-CPU sleeping lock that serializes access to the bq.
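<br />
The shape of that fix can be sketched as follows (a sketch only, based on the description above and on the analogous cpumap fix; the exact patch may differ):<br />
<br />
```c
/* Sketch of the fix: a local_lock_t in the per-CPU bulk queue,
 * taken in the two paths that touch it. */
struct xdp_dev_bulk_queue {
	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
	struct list_head flush_node;
	struct net_device *dev;
	struct net_device *dev_rx;
	struct bpf_prog *xdp_prog;
	unsigned int count;
	local_lock_t bq_lock;	/* new: serializes access to this bq */
};

static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
		       struct net_device *dev_rx, struct bpf_prog *xdp_prog)
{
	struct xdp_dev_bulk_queue __percpu *bulkq = dev->xdp_bulkq;
	struct xdp_dev_bulk_queue *bq;

	/* Callers already run under local_bh_disable(): on !PREEMPT_RT
	 * this compiles to a pure lockdep annotation; on PREEMPT_RT it
	 * is a per-CPU sleeping lock that serializes bq_enqueue()
	 * against __dev_flush() on the same CPU. */
	local_lock_nested_bh(&bulkq->bq_lock);
	bq = this_cpu_ptr(bulkq);
	/* ... enqueue xdpf into bq->q[], update bq->count,
	 *     flush and/or add bq to the flush_list as before ... */
	local_unlock_nested_bh(&bulkq->bq_lock);
}
```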
Severity CVSS v4.0: Pending analysis
Last modification:
02/04/2026