CVE-2025-38242

Severity CVSS v4.0: Pending analysis
Type: Unavailable / Other
Publication date: 09/07/2025
Last modified: 09/07/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

mm: userfaultfd: fix race of userfaultfd_move and swap cache

This commit fixes two kinds of races, which may have different results:

Barry reported a BUG_ON in commit c50f8e6053b0. We may see the same BUG_ON if the filemap lookup returned NULL and a folio is added to the swap cache after that.

If the other kind of race is triggered (folio changed after lookup), we may see a corrupted RSS counter:

[ 406.893936] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0 type:MM_ANONPAGES val:-1
[ 406.894071] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0 type:MM_SHMEMPAGES val:1

This happens because the folio is being accounted to the wrong VMA.

I'm not sure whether there will be any data corruption, though it seems not. The issues above are critical already.

On seeing a swap entry PTE, userfaultfd_move does a lockless swap cache lookup and tries to move the found folio to the faulting VMA. Currently, it relies on checking the PTE value to ensure that the moved folio still belongs to the src swap entry and that no new folio has been added to the swap cache, which turns out to be unreliable.

While working on and reviewing the swap table series with Barry, the following existing races were observed and reproduced [1]:

In the example below, move_pages_pte is moving src_pte to dst_pte, where src_pte is a swap entry PTE holding swap entry S1, and S1 is not in the swap cache:

CPU1                                            CPU2
userfaultfd_move
  move_pages_pte()
    entry = pte_to_swp_entry(orig_src_pte);
    // Here it got entry = S1
    ... ...

                                                // folio A is a newly allocated folio
                                                // and gets installed into src_pte

                                                // src_pte now points to folio A, S1
                                                // has swap count == 0, it can be freed
                                                // by folio_free_swap or swap
                                                // allocator's reclaim.

                                                // folio B is a folio in another VMA.

                                                // S1 is freed, folio B can use it
                                                // for swap out with no problem.
                                                ...
    folio = filemap_get_folio(S1)
    // Got folio B here !!!
    ... ...

                                                // Now S1 is free to be used again.

                                                // Now src_pte is a swap entry PTE
                                                // holding S1 again.
    folio_trylock(folio)
    move_swap_pte
      double_pt_lock
      is_pte_pages_stable
      // Check passed because src_pte == S1
      folio_move_anon_rmap(...)
      // Moved invalid folio B here !!!

The race window is very short and requires multiple collisions of multiple rare events, so it's very unlikely to happen, but with a deliberately constructed reproducer and an increased time window, it can be reproduced easily.

This can be fixed by checking, after acquiring the folio lock, whether the folio returned by filemap is still the valid swap cache folio.

Another similar race is possible: filemap_get_folio may return NULL, but folio (A) could be swapped in and then swapped out again using the same swap entry after the lookup. In such a case, folio (A) may remain in the swap cache, so it must be moved too:

CPU1                                            CPU2
userfaultfd_move
  move_pages_pte()
    entry = pte_to_swp_entry(orig_src_pte);
    // Here it got entry = S1, and S1 is not in swap cache
    folio = filemap_get
---truncated---
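
The first race above, and the described fix of re-checking the folio after taking the folio lock, can be illustrated with a small toy model. This is only a sketch in Python, not kernel code: it replays the CPU1/CPU2 interleaving in a single thread, and every name in it (`Folio`, `SwapCache`, `folio_matches_entry`, `run_race`) is hypothetical. The re-check is modeled as "the folio still records the looked-up entry as its swap cache slot", which is an assumption about the shape of the check rather than the actual patch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Folio:
    name: str
    owner_vma: str                    # VMA the folio is accounted to
    swap_entry: Optional[str] = None  # swap cache slot it occupies; None if not in the cache


class SwapCache:
    """Toy swap cache: maps a swap entry to the folio currently using it."""

    def __init__(self) -> None:
        self.slots: Dict[str, Folio] = {}

    def add(self, entry: str, folio: Folio) -> None:
        self.slots[entry] = folio
        folio.swap_entry = entry

    def remove(self, entry: str) -> None:
        folio = self.slots.pop(entry)
        folio.swap_entry = None

    def lookup(self, entry: str) -> Optional[Folio]:
        # Lockless lookup: the result may be stale as soon as it is returned.
        return self.slots.get(entry)


def folio_matches_entry(folio: Folio, entry: str) -> bool:
    # Re-check performed once the folio is locked: is this folio still the
    # swap cache folio for the entry that was looked up?
    return folio.swap_entry == entry


def run_race(revalidate: bool) -> str:
    """Replay the interleaving from the description; return what CPU1 ends up doing."""
    cache = SwapCache()
    folio_a = Folio("A", owner_vma="src")
    folio_b = Folio("B", owner_vma="other")

    # CPU1: reads the swap entry from src_pte.
    src_pte = "S1"
    entry = src_pte

    # CPU2: folio A is swapped in, S1 is freed, folio B is swapped out using S1.
    src_pte = "folio A"
    cache.add("S1", folio_b)

    # CPU1: lockless lookup returns folio B.
    folio = cache.lookup(entry)
    assert folio is not None

    # CPU2: folio B is swapped in (S1 freed), then folio A is swapped out using S1.
    cache.remove("S1")
    cache.add("S1", folio_a)
    src_pte = "S1"

    # CPU1: folio locked, page table locks taken, PTE compared with the entry.
    if src_pte != entry:
        return "retry"
    if revalidate and not folio_matches_entry(folio, entry):
        return "retry"  # stale folio detected, nothing is moved
    folio.owner_vma = "dst"
    return f"moved {folio.name}"
```

Calling `run_race(False)` models the PTE-only check, which passes even though the folio found earlier belongs to another VMA; `run_race(True)` adds the re-check under the folio lock and backs off instead of moving the stale folio.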

Impact