CVE-2022-49647
Publication date:
26/02/2025
In the Linux kernel, the following vulnerability has been resolved:

cgroup: Use separate src/dst nodes when preloading css_sets for migration

Each cset (css_set) is pinned by its tasks. When we're moving tasks around
across csets for a migration, we need to hold the source and destination
csets to ensure that they don't go away while we're moving tasks about. This
is done by linking cset->mg_preload_node on either the
mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
same cset->mg_preload_node for both the src and dst lists was deemed okay as
a cset can't be both the source and destination at the same time.

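The single-node constraint described above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: `Cset`, `link`, and `preload_list` are hypothetical names chosen for illustration, and a Python list stands in for the kernel's intrusive linked list.

```python
class Cset:
    """Toy stand-in for a css_set that has a single preload link."""

    def __init__(self, name):
        self.name = name
        # Models cset->mg_preload_node: it can sit on at most one list.
        self.preload_list = None


def link(cset, lst):
    """Link cset onto lst; refuse if its only node is already in use."""
    if cset.preload_list is not None:
        return False  # node busy, comparable to a !list_empty() check
    lst.append(cset)
    cset.preload_list = lst
    return True


src, dst = [], []
a = Cset("A")
linked_src = link(a, src)  # succeeds: the node was free
linked_dst = link(a, dst)  # fails: the same node is already on src
```

With one shared node, a cset that is already on the src list cannot also be placed on the dst list, which is harmless only as long as no cset ever needs to be on both.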
Unfortunately, this overloading becomes problematic when multiple tasks are
involved in a migration and some of them are identity noop migrations while
others are actually moving across cgroups. For example, this can happen with
the following sequence on cgroup1:

#1> mkdir -p /sys/fs/cgroup/misc/a/b
#2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs
#3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS &
#4> PID=$!
#5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks
#6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs

Step #5 moves only the group leader into a/b, and step #6 then moves the
whole process, including the group leader, back into a. In this final
migration, non-leader threads would be doing identity migration while the
group leader is doing an actual one.

After #3, let's say the whole process is in cset A, and that after #5, the
leader has moved to cset B. Then, during #6, the following happens:

1. cgroup_migrate_add_src() is called on B for the leader.

2. cgroup_migrate_add_src() is called on A for the other threads.

3. cgroup_migrate_prepare_dst() is called. It scans the src list.

4. It notices that B wants to migrate to A, so it tries to add A to the dst
list but realizes that its ->mg_preload_node is already busy.

5. It then notices that A wants to migrate to A. As this is an identity
migration, it culls A by list_del_init()'ing its ->mg_preload_node and
putting references accordingly.

6. The rest of the migration takes place with B on the src list but nothing on
the dst list.

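The six steps above can be replayed in a small simulation. This is a simplified Python sketch, not the kernel implementation: `add_src` and `prepare_dst` only loosely mirror cgroup_migrate_add_src() and cgroup_migrate_prepare_dst(), the `node_on` and `target` fields are hypothetical, and reference counting is left out.

```python
class Cset:
    """Toy css_set with one shared preload node."""

    def __init__(self, name):
        self.name = name
        self.node_on = None  # list currently holding the shared node, if any
        self.target = None   # cset that tasks in this cset will end up in


def add_src(cset, target, src_list):
    """Put cset on the src list unless its node is already linked."""
    if cset.node_on is not None:
        return
    cset.target = target
    src_list.append(cset)
    cset.node_on = src_list


def prepare_dst(src_list, dst_list):
    """Scan the src list, culling identity migrations and collecting targets."""
    for cset in list(src_list):  # iterate over a copy, since we remove entries
        target = cset.target
        if target is cset:
            # Identity migration: unlink the shared node (step 5).
            cset.node_on.remove(cset)
            cset.node_on = None
            continue
        if target.node_on is None:
            dst_list.append(target)
            target.node_on = dst_list
        # Otherwise the node is busy and the target is not added (step 4).


a, b = Cset("A"), Cset("B")
src, dst = [], []
add_src(b, a, src)     # step 1: leader is in B and moves to A
add_src(a, a, src)     # step 2: other threads are in A and stay in A
prepare_dst(src, dst)  # steps 3 to 5

src_names = [c.name for c in src]
dst_names = [c.name for c in dst]
```

In this model the run ends with B on the src list and an empty dst list, matching step 6: A was skipped as a destination because its node was busy, and then removed as a source because its migration was an identity one.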
This means that A isn't held while migration is in progress. If all tasks
leave A before the migration finishes and before the incoming task pins it,
the cset will be destroyed, leading to use-after-free.

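Why the missing hold matters can be shown with a toy reference count. The Python sketch below is an illustration under simplifying assumptions: `Cset`, `get`, `put`, and `last_task_leaves` are hypothetical names, and a `freed` flag stands in for the actual destruction of the cset.

```python
class Cset:
    """Toy refcounted object; 'freed' marks the point of destruction."""

    def __init__(self, name):
        self.name = name
        self.refs = 0
        self.freed = False

    def get(self):
        self.refs += 1

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # real code would release the memory here


def last_task_leaves(held_by_dst_list):
    """Return True if A is destroyed before the incoming task can pin it."""
    a = Cset("A")
    a.get()      # one remaining task pins A
    if held_by_dst_list:
        a.get()  # extra reference held while A is on the dst preload list
    a.put()      # the last task leaves A while migration is in progress
    return a.freed


freed_without_hold = last_task_leaves(False)
freed_with_hold = last_task_leaves(True)
```

Without the extra reference, the count reaches zero as soon as the last task leaves, so the incoming task would later touch a destroyed object. With the dst list holding a reference, A survives until the migration completes.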
This is caused by overloading cset->mg_preload_node for both src and dst
preload lists. We wanted to exclude the cset from the src list but ended up
inadvertently excluding it from the dst list too.

This patch fixes the issue by separating out cset->mg_preload_node into
->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
preloadings don't interfere with each other.
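
The effect of splitting the node can be sketched with the same kind of toy model. This is a simplified Python illustration rather than the actual patch: the `on_src` and `on_dst` flags are hypothetical stand-ins for ->mg_src_preload_node and ->mg_dst_preload_node, and reference counting is omitted.

```python
class Cset:
    """Toy css_set with independent src and dst preload membership."""

    def __init__(self, name):
        self.name = name
        self.on_src = False  # stands in for ->mg_src_preload_node
        self.on_dst = False  # stands in for ->mg_dst_preload_node
        self.target = None   # cset that tasks in this cset will end up in


def add_src(cset, target, src_list):
    """Put cset on the src list unless it is already there."""
    if cset.on_src:
        return
    cset.target = target
    src_list.append(cset)
    cset.on_src = True


def prepare_dst(src_list, dst_list):
    """Cull identity migrations from src; collect targets on dst."""
    for cset in list(src_list):  # iterate over a copy, since we remove entries
        target = cset.target
        if target is cset:
            # Identity migration: only the src membership is dropped.
            src_list.remove(cset)
            cset.on_src = False
            continue
        if not target.on_dst:
            dst_list.append(target)
            target.on_dst = True


a, b = Cset("A"), Cset("B")
src, dst = [], []
add_src(b, a, src)     # leader is in B and moves to A
add_src(a, a, src)     # other threads are in A and stay in A
prepare_dst(src, dst)

src_names = [c.name for c in src]
dst_names = [c.name for c in dst]
```

With separate membership, the same sequence leaves B on the src list and A on the dst list, so A stays held for the duration of the migration.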
Severity CVSS v4.0: Pending analysis
Last modification:
24/03/2025