CVE-2021-47209

Severity CVSS v4.0:
Pending analysis
Type:
CWE-416 Use After Free
Publication date:
10/04/2024
Last modified:
27/03/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

sched/fair: Prevent dead task groups from regaining cfs_rq's

Kevin is reporting crashes which point to a use-after-free of a cfs_rq in update_blocked_averages(). Initial debugging revealed that we have live cfs_rq's (on_list=1) in an about-to-be-kfree()'d task group in free_fair_sched_group(). However, it was unclear how that can happen.

His kernel config happened to lead to a layout of struct sched_entity that put the 'my_q' member directly into the middle of the object, which makes it incidentally overlap with SLUB's freelist pointer. That, in combination with SLAB_FREELIST_HARDENED's freelist pointer mangling, leads to a reliable access violation in the form of a #GP, which made the UAF fail fast.

Michal seems to have run into the same issue[1]. He already correctly diagnosed that commit a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") is causing the preconditions for the UAF to happen by re-adding cfs_rq's also to task groups that have no more running tasks, i.e. also to dead ones. His analysis, however, misses the real root cause, and it cannot be seen from the crash backtrace alone, as the real offender is tg_unthrottle_up() getting called via sched_cfs_period_timer() via the timer interrupt at an inconvenient time.

When unregister_fair_sched_group() unlinks all cfs_rq's from the dying task group, it doesn't protect itself from getting interrupted. If the timer interrupt triggers while we iterate over all CPUs, or after unregister_fair_sched_group() has finished but prior to unlinking the task group, sched_cfs_period_timer() will execute and walk the list of task groups, trying to unthrottle cfs_rq's, i.e. re-add them to the dying task group. These will later -- in free_fair_sched_group() -- be kfree()'d while still being linked, leading to the fireworks Kevin and Michal are seeing.

To fix this race, ensure the dying task group gets unlinked first. However, simply switching the order of unregistering and unlinking the task group isn't sufficient, as concurrent RCU walkers might still see it, as can be seen below:

      CPU1:                                          CPU2:
        :                                            timer IRQ:
        :                                            do_sched_cfs_period_timer():
        :                                              :
        :                                              distribute_cfs_runtime():
        :                                                rcu_read_lock();
        :                                                :
        :                                                unthrottle_cfs_rq():
      sched_offline_group():                               :
        :                                                  walk_tg_tree_from(…,tg_unthrottle_up,…):
        list_del_rcu(&tg->list);                             :
 (1)    :                                                    list_for_each_entry_rcu(child, &parent->children, siblings)
        :                                                      :
 (2)    list_del_rcu(&tg->siblings);                           :
        :                                                      tg_unthrottle_up():
        unregister_fair_sched_group():                           struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
          :                                                      :
          list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);                 :
            :                                                    :
            :                                                    if (!cfs_rq_is_decayed(cfs_rq) || cfs_rq->nr_running)
 (3)        :                                                      list_add_leaf_cfs_rq(cfs_rq);
            :                                                    :
            :                                                  :
            :                                                :
            :                                              :
            :

---truncated---
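The record is truncated above, but the race and the teardown ordering it argues for can be sketched in miniature. What follows is a hypothetical, self-contained userspace C model, not the kernel fix itself: every name in it is invented for illustration, and a pthread rwlock stands in for RCU (taking the read lock models rcu_read_lock(), and an acquire/release cycle of the write lock models a synchronize_rcu()-style grace period). The "timer" thread plays the role of sched_cfs_period_timer()/tg_unthrottle_up(), re-marking any group it can still reach as live; destroy_group() shows the safe order: unlink first, wait out all in-flight walkers, and only then free.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Models a task group with its list linkage; 'on_list' stands in for
 * cfs_rq->on_list, the flag the timer path keeps flipping back on. */
struct group {
    _Atomic(struct group *) next;   /* ~ rcu_assign_pointer()/rcu_dereference() */
    atomic_bool on_list;
};

static _Atomic(struct group *) groups;   /* global list the "timer" walks */
static pthread_rwlock_t walk_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool stop;

/* Plays the role of sched_cfs_period_timer() -> tg_unthrottle_up():
 * any group still reachable from the list gets re-marked as live. */
static void *timer_walker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop)) {
        pthread_rwlock_rdlock(&walk_lock);              /* ~ rcu_read_lock() */
        for (struct group *g = atomic_load(&groups); g;
             g = atomic_load(&g->next))
            atomic_store(&g->on_list, true);            /* ~ list_add_leaf_cfs_rq() */
        pthread_rwlock_unlock(&walk_lock);              /* ~ rcu_read_unlock() */
    }
    return NULL;
}

/* Safe teardown order: unlink the dying group FIRST, wait out every
 * in-flight walker, and only then tear it down and free it. */
static void destroy_group(struct group *tg)
{
    pthread_mutex_lock(&writer_lock);
    _Atomic(struct group *) *pp = &groups;
    for (struct group *cur = atomic_load(pp); cur;
         pp = &cur->next, cur = atomic_load(pp)) {
        if (cur == tg) {
            atomic_store(pp, atomic_load(&tg->next));   /* ~ list_del_rcu() */
            break;
        }
    }
    pthread_mutex_unlock(&writer_lock);

    /* Grace period: once the write lock is granted, every walker that
     * could still have seen tg has left its read-side section, and new
     * walkers can no longer find it, so on_list cannot flip back on. */
    pthread_rwlock_wrlock(&walk_lock);                  /* ~ synchronize_rcu() */
    pthread_rwlock_unlock(&walk_lock);

    atomic_store(&tg->on_list, false);  /* ~ unregister_fair_sched_group() */
    free(tg);                           /* safe: no walker can reach tg */
}

int main(void)
{
    struct group *tg = calloc(1, sizeof(*tg));
    atomic_store(&tg->next, atomic_load(&groups));
    atomic_store(&groups, tg);

    pthread_t walker;
    pthread_create(&walker, NULL, timer_walker, NULL);

    usleep(10000);          /* let the "timer" re-mark the group a few times */
    destroy_group(tg);      /* unlink -> grace period -> free */

    atomic_store(&stop, true);
    pthread_join(walker, NULL);
    puts("teardown finished without touching freed memory");
    return 0;
}

Swapping the steps in destroy_group() -- freeing before the grace period, or unlinking only after unregistering -- reopens exactly the window shown in the diagram: a walker that still holds a reference would write to freed memory.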

Vulnerable products and versions

CPE                                               From               Up to
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*      5.13 (including)   5.15.5 (excluding)