CVE-2023-52934

Severity CVSS v4.0:
Pending analysis
Type:
CWE-362 Concurrent Execution using Shared Resource with Improper Synchronization ('Race Condition')
Publication date:
27/03/2025
Last modified:
28/10/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups

In commit 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none():

    -     if (!pmd_present(pmde))
    -             return SCAN_PMD_NULL;
    +     if (pmd_none(pmde))
    +             return SCAN_PMD_NONE;

This was for use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE might identify a pte-mapped hugepage, only to have khugepaged race in, free the pte table, and clear the pmd. Such codepaths include:

A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER already in the pagecache.
B) In retract_page_tables(), if we fail to grab mmap_lock for the target mm/address.

In these cases, collapse_pte_mapped_thp() really does expect a none (not just !present) pmd, and we want to suitably identify that case separate from the case where no pmd is found, or it's a bad-pmd (of course, many things could happen once we drop mmap_lock, and the pmd could plausibly undergo multiple transitions due to intervening fault, split, etc). Regardless, the code is prepared to install a huge-pmd only when the existing pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd.

However, the commit introduces a logical hole; namely, that we've allowed !none- && !huge- && !bad-pmds to be classified as genuine pte-table-mapping-pmds. One such example that could leak through are swap entries. The pmd values aren't checked again before use in pte_offset_map_lock(), which is expecting nothing less than a genuine pte-table-mapping-pmd.

We want to put back the !pmd_present() check (below the pmd_none() check), but need to be careful to deal with subtleties in pmd transitions and treatments by various arch.

The issue is that __split_huge_pmd_locked() temporarily clears the present bit (or otherwise marks the entry as invalid), but pmd_present() and pmd_trans_huge() still need to return true while the pmd is in this transitory state. For example, x86's pmd_present() also checks the _PAGE_PSE bit, riscv's version also checks the _PAGE_LEAF bit, and arm64 also checks a PMD_PRESENT_INVALID bit.

Covering all 4 cases for x86 (all checks done on the same pmd value):

1) pmd_present() && pmd_trans_huge()
   All we actually know here is that the PSE bit is set. Either:
   a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE is set.
      => huge-pmd
   b) We are currently racing with __split_huge_page(). The danger here is that we proceed as if we have a huge-pmd, but really we are looking at a pte-mapping-pmd. So, what is the risk of this danger?

      The only relevant path is:

          madvise_collapse() -> collapse_pte_mapped_thp()

      Where we might just incorrectly report back "success", when really the memory isn't pmd-backed. This is fine, since split could happen immediately after (actually) successful madvise_collapse(). So, it should be safe to just assume huge-pmd here.

2) pmd_present() && !pmd_trans_huge()
   Either:
   a) PSE not set and either PRESENT or PROTNONE is.
      => pte-table-mapping pmd (or PROT_NONE)
   b) devmap. This routine can be called immediately after unlocking/locking mmap_lock -- or called with no locks held (see khugepaged_scan_mm_slot()), so previous VMA checks have since been invalidated.

3) !pmd_present() && pmd_trans_huge()
   Not possible.

4) !pmd_present() && !pmd_trans_huge()
   Neither PRESENT nor PROTNONE set
   => not present

I've checked all archs that implement pmd_trans_huge() (arm64, riscv, powerpc, loongarch, x86, mips, s390) and this logic roughly translates (though devmap treatment is unique to x86 and powerpc, and (3) doesn't necessarily hold in general -- but that doesn't matter since !pmd_present() always takes failure path).

Also, add a comment above find_pmd_or_thp_or_none()
---truncated---
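
The check ordering and the four x86 cases described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: the bit values, the M_* and MODEL_* names, and the model_* helpers are all hypothetical, pmd_bad() is reduced to a single stand-in flag, and rejecting devmap entries is an assumption because the description is truncated before it says how they are handled. The model treats a pmd as "present" if any of PRESENT, PROTNONE or PSE is set, which follows the x86 behaviour the description mentions.

```c
#include <stdint.h>

/* Hypothetical bit layout for a userspace model; NOT the real x86 values. */
enum {
    M_PRESENT  = 1u << 0,
    M_PROTNONE = 1u << 1,
    M_PSE      = 1u << 2, /* huge-page bit */
    M_DEVMAP   = 1u << 3,
    M_BAD      = 1u << 4, /* stand-in for anything pmd_bad() would reject */
    M_SWAP     = 1u << 5  /* stand-in for swap-entry payload bits */
};

/* Model outcomes (hypothetical names). MODEL_NONE plays the role of
 * SCAN_PMD_NONE and MODEL_REJECT the role of SCAN_PMD_NULL. */
enum model_result { MODEL_NONE, MODEL_REJECT, MODEL_HUGE, MODEL_PTE_TABLE };

static int model_pmd_none(uint32_t v) { return v == 0; }

/* Stays true while a split has temporarily cleared PRESENT but PSE remains. */
static int model_pmd_present(uint32_t v)
{
    return (v & (M_PRESENT | M_PROTNONE | M_PSE)) != 0;
}

static int model_pmd_trans_huge(uint32_t v)
{
    return (v & (M_PSE | M_DEVMAP)) == M_PSE;
}

/* Ordering after commit 34488399fa08: no present check, so a non-present,
 * non-huge, non-bad value falls through and is treated as a pte table. */
static enum model_result classify_before_fix(uint32_t v)
{
    if (model_pmd_none(v))
        return MODEL_NONE;
    if (model_pmd_trans_huge(v))
        return MODEL_HUGE;
    if (v & M_BAD)
        return MODEL_REJECT;
    return MODEL_PTE_TABLE;
}

/* Ordering with the !present check restored below the none check. */
static enum model_result classify_after_fix(uint32_t v)
{
    if (model_pmd_none(v))
        return MODEL_NONE;
    if (!model_pmd_present(v))
        return MODEL_REJECT;      /* case 4: e.g. a swap entry */
    if (model_pmd_trans_huge(v))
        return MODEL_HUGE;        /* case 1, including mid-split (1b) */
    if (v & M_DEVMAP)
        return MODEL_REJECT;      /* case 2b: assumed to be rejected */
    if (v & M_BAD)
        return MODEL_REJECT;
    return MODEL_PTE_TABLE;       /* case 2a */
}
```

In this model, a value with only M_SWAP set is classified as a pte table by classify_before_fix() but rejected by classify_after_fix(), while a value with only M_PSE set (present bit cleared mid-split) is still reported as huge, matching case 1b.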

Vulnerable products and versions

CPE                                             From              Up to
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*    6.1 (including)   6.1.11 (excluding)
cpe:2.3:o:linux:linux_kernel:6.2:rc1:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.2:rc2:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.2:rc3:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.2:rc4:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.2:rc5:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.2:rc6:*:*:*:*:*:*