CVE-2024-53079
Publication date:
19/11/2024
In the Linux kernel, the following vulnerability has been resolved:

mm/thp: fix deferred split unqueue naming and locking

Recent changes are putting more pressure on THP deferred split queues:
under load revealing long-standing races, causing list_del corruptions,
"Bad page state"s and worse (I keep BUGs in both of those, so usually
don't get to see how badly they end up without). The relevant recent
changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin,
improved swap allocation, and underused THP splitting.

Before fixing locking: rename misleading folio_undo_large_rmappable(),
which does not undo large_rmappable, to folio_unqueue_deferred_split(),
which is what it does. But that and its out-of-line __callee are mm
internals of very limited usability: add comment and WARN_ON_ONCEs to
check usage; and return a bool to say if a deferred split was unqueued,
which can then be used in WARN_ON_ONCEs around safety checks (sparing
callers the arcane conditionals in __folio_unqueue_deferred_split()).
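
The bool-returning unqueue described above can be pictured with a small userspace model. This is only a sketch, not the kernel code: `struct model_folio`, `model_unqueue_deferred_split()`, `model_warn_on()` and `model_free_folio()` are hypothetical names, the list helpers merely imitate the kernel's `list_head`, and the "refcount must be 0" usage check is an assumption drawn from the swapout and memcg-move discussion in this description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, imitating the kernel's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_init_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical stand-in for struct folio: only what the model needs. */
struct model_folio {
    unsigned int order;
    bool large_rmappable;
    int refcount;
    struct list_node deferred_list;
};

static int warn_count; /* number of model warnings raised so far */

/* Model of a WARN_ON-style check: records the hit and returns the condition. */
static bool model_warn_on(bool cond)
{
    if (cond)
        warn_count++;
    return cond;
}

/* Returns true only if the folio was actually taken off a deferred list. */
static bool model_unqueue_deferred_split(struct model_folio *f)
{
    /* Folios that can never be queued: nothing to do. */
    if (f->order <= 1 || !f->large_rmappable)
        return false;
    if (list_is_empty(&f->deferred_list))
        return false;
    /* Usage check (assumed rule): the caller should have frozen refcount to 0. */
    model_warn_on(f->refcount != 0);
    list_del_init_node(&f->deferred_list);
    return true;
}

/* A path that expects the folio to be unqueued already can assert that. */
static void model_free_folio(struct model_folio *f)
{
    model_warn_on(model_unqueue_deferred_split(f));
}
```

In this model a caller such as `model_free_folio()` needs no knowledge of order or rmappable checks: it wraps the call in a warning and relies on the return value alone.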

Just omit the folio_unqueue_deferred_split() from free_unref_folios(), all
of whose callers now call it beforehand (and if any forget then bad_page()
will tell) - except for its caller put_pages_list(), which itself no
longer has any callers (and will be deleted separately).

Swapout: mem_cgroup_swapout() has been resetting folio->memcg_data to 0
without checking and unqueueing a THP folio from deferred split list;
which is unfortunate, since the split_queue_lock depends on the memcg
(when memcg is enabled); so swapout has been unqueueing such THPs later,
when freeing the folio, using the pgdat's lock instead: potentially
corrupting the memcg's list. __remove_mapping() has frozen refcount to 0
here, so no problem with calling folio_unqueue_deferred_split() before
resetting memcg_data.
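
The ordering problem in swapout can be illustrated with a userspace model in which the queue (and therefore the lock) is chosen from the folio's memcg pointer. This is a sketch under stated assumptions, not kernel code: every `model_*` name, `swapout_buggy()` and `swapout_fixed()` are hypothetical, and the `len` counter stands in for whatever state the real split_queue_lock protects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, imitating the kernel's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

/* One deferred split queue; in the kernel a lock would live alongside it. */
struct model_queue { struct list_node head; int len; };

struct model_memcg { struct model_queue queue; };
struct model_pgdat { struct model_queue queue; };

struct model_folio {
    struct model_memcg *memcg; /* stands in for folio->memcg_data */
    struct model_pgdat *pgdat;
    struct list_node deferred_list;
};

/* Queue selection: the memcg's queue if the folio is charged, else the node's. */
static struct model_queue *model_get_queue(struct model_folio *f)
{
    return f->memcg ? &f->memcg->queue : &f->pgdat->queue;
}

static void model_enqueue(struct model_folio *f)
{
    struct model_queue *q = model_get_queue(f);

    f->deferred_list.prev = q->head.prev;
    f->deferred_list.next = &q->head;
    q->head.prev->next = &f->deferred_list;
    q->head.prev = &f->deferred_list;
    q->len++;
}

/* Unqueue; returns the queue whose lock would have been taken, or NULL. */
static struct model_queue *model_unqueue(struct model_folio *f)
{
    struct model_queue *q = model_get_queue(f);

    if (list_is_empty(&f->deferred_list))
        return NULL;
    f->deferred_list.prev->next = f->deferred_list.next;
    f->deferred_list.next->prev = f->deferred_list.prev;
    list_init(&f->deferred_list);
    q->len--;
    return q;
}

/* Old order: memcg pointer reset first, unqueue only later when freeing. */
static struct model_queue *swapout_buggy(struct model_folio *f)
{
    f->memcg = NULL;
    return model_unqueue(f); /* selects the pgdat queue: the wrong one */
}

/* Fixed order: unqueue while the memcg pointer still names the right queue. */
static struct model_queue *swapout_fixed(struct model_folio *f)
{
    struct model_queue *q = model_unqueue(f);

    f->memcg = NULL;
    return q;
}
```

With the old order the folio is unlinked from the memcg's list while the bookkeeping of the node's queue is the one being updated, which in the real kernel means list manipulation under the wrong lock; with the fixed order the same queue is used for both enqueue and unqueue.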

That goes back to 5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split
shrinker memcg aware"): which included a check on swapcache before adding
to deferred queue, but no check on deferred queue before adding THP to
swapcache. That worked fine with the usual sequence of events in reclaim
(though there were a couple of rare ways in which a THP on deferred queue
could have been swapped out), but 6.12 commit dafff3f4c850 ("mm: split
underused THPs") avoids splitting underused THPs in reclaim, which makes
swapcache THPs on deferred queue commonplace.

Keep the check on swapcache before adding to deferred queue? Yes: it is
no longer essential, but preserves the existing behaviour, and is likely
to be a worthwhile optimization (vmstat showed much more traffic on the
queue under swapping load if the check was removed); update its comment.

Memcg-v1 move (deprecated): mem_cgroup_move_account() has been changing
folio->memcg_data without checking and unqueueing a THP folio from the
deferred list, sometimes corrupting "from" memcg's list, like swapout.
Refcount is non-zero here, so folio_unqueue_deferred_split() can only be
used in a WARN_ON_ONCE to validate the fix, which must be done earlier:
mem_cgroup_move_charge_pte_range() first tries to split the THP (splitting
of course unqueues), or skips it if that fails. Not ideal, but moving
charge has been requested, and khugepaged should repair the THP later:
nobody wants new custom unqueueing code just for this deprecated case.
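
The "split first, or skip" policy for the deprecated charge-move path can be sketched as a tiny decision model. All names here (`model_folio`, `model_split()`, `model_move_account()`, `model_move_charge()`) are hypothetical, and the assumption that a split is attempted only when the THP is on a deferred list is an illustrative simplification rather than something this description states.

```c
#include <assert.h>
#include <stdbool.h>

enum move_result { MOVE_DONE, MOVE_SKIPPED };

/* Hypothetical stand-in for a folio during memcg-v1 charge moving. */
struct model_folio {
    int memcg_id;          /* which memcg the folio is charged to */
    bool is_thp;
    bool on_deferred_list;
    bool split_ok;         /* whether a split attempt would succeed */
};

static int warn_count; /* validation warnings raised by the model */

/* Splitting a THP takes it off the deferred list as a side effect. */
static bool model_split(struct model_folio *f)
{
    if (!f->split_ok)
        return false;
    f->is_thp = false;
    f->on_deferred_list = false;
    return true;
}

/*
 * Refcount is non-zero at this point, so the model never unqueues here:
 * it only warns if an earlier step failed to deal with the deferred list.
 */
static void model_move_account(struct model_folio *f, int to)
{
    if (f->on_deferred_list)
        warn_count++;
    f->memcg_id = to;
}

/* Split a queued THP before moving its charge, or leave it where it is. */
static enum move_result model_move_charge(struct model_folio *f, int to)
{
    if (f->is_thp && f->on_deferred_list && !model_split(f))
        return MOVE_SKIPPED;
    model_move_account(f, to);
    return MOVE_DONE;
}
```

The design point the model tries to capture is that the unsafe state is resolved before the charge changes hands, so the later check is purely a validation of the fix and never performs an unqueue itself.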

The 87eaceb3faa5 commit did have the code to move from one deferred list
to another (but was not conscious of its unsafety while refcount non-0);
but that was removed by 5.6 commit fac0516b5534 ("mm: thp: don't need care
deferred split queue in memcg charge move path"), which argued that the
existence of a PMD mapping guarantees that the THP cannot be on a deferred
list. As above, false in rare cases, and now commonly false.

Backport to 6.11 should be straightforward. Earlier backports must take
care that other _deferred_list fixes and dependencies are included. There
is not a strong case for backports, but they can fix corner cases.
Severity CVSS v4.0: Pending analysis
Last modification:
01/10/2025