CVE-2025-21825
Publication date:
06/03/2025
In the Linux kernel, the following vulnerability has been resolved:<br />
<br />
bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT<br />
<br />
During the update procedure, when overwriting an element in a pre-allocated<br />
htab, the freeing of old_element is protected by the bucket lock. The<br />
bucket lock is necessary because the old_element has<br />
already been stashed in htab->extra_elems after alloc_htab_elem()<br />
returns. If the old_element were freed after the bucket lock is released,<br />
the stashed element could be reused by a concurrent update procedure, and the<br />
freeing of old_element would run concurrently with the reuse of the<br />
old_element. However, the invocation of check_and_free_fields() may<br />
acquire a spin-lock, which violates the lockdep rule because its caller<br />
already holds a raw-spin-lock (the bucket lock). The following warning<br />
is reported when such a race happens:<br />
<br />
BUG: scheduling while atomic: test_progs/676/0x00000003<br />
3 locks held by test_progs/676:<br />
#0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830<br />
#1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500<br />
#2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0<br />
Modules linked in: bpf_testmod(O)<br />
Preemption disabled at:<br />
[] htab_map_update_elem+0x293/0x1500<br />
CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11<br />
Tainted: [W]=WARN, [O]=OOT_MODULE<br />
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...<br />
Call Trace:<br />
<br />
dump_stack_lvl+0x57/0x70<br />
dump_stack+0x10/0x20<br />
__schedule_bug+0x120/0x170<br />
__schedule+0x300c/0x4800<br />
schedule_rtlock+0x37/0x60<br />
rtlock_slowlock_locked+0x6d9/0x54c0<br />
rt_spin_lock+0x168/0x230<br />
hrtimer_cancel_wait_running+0xe9/0x1b0<br />
hrtimer_cancel+0x24/0x30<br />
bpf_timer_delete_work+0x1d/0x40<br />
bpf_timer_cancel_and_free+0x5e/0x80<br />
bpf_obj_free_fields+0x262/0x4a0<br />
check_and_free_fields+0x1d0/0x280<br />
htab_map_update_elem+0x7fc/0x1500<br />
bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43<br />
bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e<br />
bpf_prog_test_run_syscall+0x322/0x830<br />
__sys_bpf+0x135d/0x3ca0<br />
__x64_sys_bpf+0x75/0xb0<br />
x64_sys_call+0x1b5/0xa10<br />
do_syscall_64+0x3b/0xc0<br />
entry_SYSCALL_64_after_hwframe+0x4b/0x53<br />
...<br />
<br />
<br />
It seems feasible to break the reuse and refill of per-cpu extra_elems<br />
into two independent parts: reuse the per-cpu extra_elems with the bucket<br />
lock held, and refill the old_element as per-cpu extra_elems after<br />
the bucket lock is released. However, that would make concurrent<br />
overwrite procedures on the same CPU return an unexpected -E2BIG error when<br />
the map is full.<br />
<br />
Therefore, the patch fixes the lock problem by breaking the cancelling<br />
of bpf_timer into two steps for PREEMPT_RT:<br />
1) use hrtimer_try_to_cancel() and check its return value<br />
2) if the timer is running, use hrtimer_cancel() through a kworker to<br />
cancel it again<br />
Considering that the current implementation of hrtimer_cancel() tries<br />
to acquire an already-held softirq_expiry_lock when the timer is<br />
running, these steps are reasonable. However, the approach also has a<br />
downside: when the timer is running, its cancellation is<br />
delayed when releasing the last map uref. The delay is also fixable<br />
(e.g., by breaking the cancelling of bpf_timer into two parts: one in the<br />
locked scope, the other in the unlocked scope), and it can be revised later if<br />
necessary.<br />
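<br />
The two-step cancellation described above can be illustrated with a small<br />
userspace model. This is only a sketch, not the kernel patch: the names<br />
model_timer, model_try_to_cancel(), model_cancel_and_free() and<br />
model_worker() are hypothetical, and the kworker is reduced to a flag that<br />
a later call drains. The only detail taken from the kernel is the return<br />
convention of hrtimer_try_to_cancel() (0: not active, 1: cancelled,<br />
-1: callback currently running).<br />

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are kernel APIs. */
enum timer_state { TIMER_IDLE, TIMER_QUEUED, TIMER_RUNNING };

struct model_timer {
    enum timer_state state;
    bool cancel_deferred;   /* true once cancellation was handed to the worker */
};

/*
 * Mirrors the return convention of hrtimer_try_to_cancel():
 *    0  timer was not active
 *    1  timer was active and has been removed
 *   -1  the callback is currently running and cannot be stopped here
 */
static int model_try_to_cancel(struct model_timer *t)
{
    if (t->state == TIMER_RUNNING)
        return -1;
    if (t->state == TIMER_QUEUED) {
        t->state = TIMER_IDLE;
        return 1;
    }
    return 0;
}

/*
 * Called with the (modeled) bucket lock held, so it must never block.
 * Step 1: try a non-blocking cancel and check the return value.
 * Step 2: if the callback is running, defer the blocking cancel.
 * Returns true when the timer was cancelled synchronously.
 */
static bool model_cancel_and_free(struct model_timer *t)
{
    if (model_try_to_cancel(t) >= 0)
        return true;
    t->cancel_deferred = true;  /* stand-in for queueing work to a kworker */
    return false;
}

/*
 * Stand-in for the kworker: runs later, outside the bucket lock, where a
 * blocking cancel is allowed. The wait for the running callback is modeled
 * as already complete.
 */
static void model_worker(struct model_timer *t)
{
    assert(t != NULL);
    if (!t->cancel_deferred)
        return;
    t->state = TIMER_IDLE;
    t->cancel_deferred = false;
}
```

In this model, a queued timer is cancelled directly under the lock, while a<br />
running timer is only marked for deferred cancellation and stays active until<br />
model_worker() runs. That window corresponds to the delay the text mentions.<br />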
<br />
It is a bit hard to decide the right Fixes tag. One reason is that the<br />
problem depends on PREEMPT_RT, which is enabled in v6.12. Considering that<br />
softirq_expiry_lock has existed since v5.4 and bpf_timer was introduced<br />
in v5.15, the bpf_timer commit is used in the Fixes tag and an extra<br />
Depends-on tag is added to state the dependency on PREEMPT_RT.<br />
<br />
Depends-on: v6.12+ with PREEMPT_RT enabled
Severity CVSS v4.0: Pending analysis
Last modification:
06/03/2025