CVE-2025-39886
Severity CVSS v4.0: Pending analysis
Type: Unavailable / Other
Publication date: 23/09/2025
Last modified: 23/09/2025
Description
In the Linux kernel, the following vulnerability has been resolved:

bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()

Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can
cause various locking issues; see the following stack trace (edited for
style) as one example:

...
[10.011566] do_raw_spin_lock.cold
[10.011570] try_to_wake_up              (5) double-acquiring the same
[10.011575] kick_pool                       rq_lock, causing a hardlockup
[10.011579] __queue_work
[10.011582] queue_work_on
[10.011585] kernfs_notify
[10.011589] cgroup_file_notify
[10.011593] try_charge_memcg            (4) memcg accounting raises an
[10.011597] obj_cgroup_charge_pages         MEMCG_MAX event
[10.011599] obj_cgroup_charge_account
[10.011600] __memcg_slab_post_alloc_hook
[10.011603] __kmalloc_node_noprof
...
[10.011611] bpf_map_kmalloc_node
[10.011612] __bpf_async_init
[10.011615] bpf_timer_init              (3) BPF calls bpf_timer_init()
[10.011617] bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
[10.011619] bpf__sched_ext_ops_runnable
[10.011620] enqueue_task_scx            (2) BPF runs with rq_lock held
[10.011622] enqueue_task
[10.011626] ttwu_do_activate
[10.011629] sched_ttwu_pending          (1) grabs rq_lock
...

The above was reproduced on bpf-next (b338cf849ec8) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and hacking the memcg accounting code a bit to make
a bpf_timer_init() call more likely to raise an MEMCG_MAX event.

We have also run into other similar variants (both internally and on
bpf-next), including double-acquiring cgroup_file_kn_lock, the same
worker_pool::lock, etc.

As suggested by Shakeel, fix this by using __GFP_HIGH instead of
GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg()
raises an MEMCG_MAX event, we call __memcg_memory_event() with
@allow_spinning=false and avoid calling cgroup_file_notify() there.

Depends on mm patch
"memcg: skip cgroup_file_notify if spinning is not allowed":
https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/

v0 approach (s/bpf_map_kmalloc_node/bpf_mem_alloc/):
https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
v1 approach:
https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/