CVE-2024-50066
Publication date: 23/10/2024
In the Linux kernel, the following vulnerability has been resolved:

mm/mremap: fix move_normal_pmd/retract_page_tables race

In mremap(), move_page_tables() looks at the type of the PMD entry and the
specified address range to figure out by which method the next chunk of
page table entries should be moved.
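
The dispatch looks roughly like this (a simplified C sketch of the
per-PMD loop body in mm/mremap.c's move_page_tables(), not verbatim
kernel source; signatures are abbreviated):

    /*
     * Sketch: choose how to move the next chunk based on the PMD type
     * and how much of the range ("extent") it covers.
     */
    if (pmd_trans_huge(*old_pmd)) {
        /* Huge PMD: move the PMD entry itself. */
        moved = move_pgt_entry(HPAGE_PMD, vma, old_addr, new_addr,
                               old_pmd, new_pmd, need_rmap_locks);
    } else if (extent == PMD_SIZE) {
        /* PMD points to a page table fully covered by the range:
         * move the whole page table. */
        moved = move_pgt_entry(NORMAL_PMD, vma, old_addr, new_addr,
                               old_pmd, new_pmd, true);
    } else {
        /* Partial coverage: fall back to copying individual PTEs. */
        move_ptes(vma, old_pmd, old_addr, old_addr + extent,
                  new_vma, new_pmd, new_addr, need_rmap_locks);
    }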

At that point, the mmap_lock is held in write mode, but no rmap locks are
held yet. For PMD entries that point to page tables and are fully covered
by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called,
which first takes rmap locks, then does move_normal_pmd().
move_normal_pmd() takes the necessary page table locks at source and
destination, then moves an entire page table from the source to the
destination.
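
Schematically (a condensed sketch of move_pgt_entry() in mm/mremap.c,
with the other entry types and error handling omitted):

    static bool move_pgt_entry(enum pgt_entry entry,
                               struct vm_area_struct *vma,
                               unsigned long old_addr, unsigned long new_addr,
                               void *old_entry, void *new_entry,
                               bool need_rmap_locks)
    {
        bool moved = false;

        /* Only now are the rmap locks taken, after the caller has
         * already read the PMD entry and chosen this path. */
        if (need_rmap_locks)
            take_rmap_locks(vma);

        if (entry == NORMAL_PMD)
            moved = move_normal_pmd(vma, old_addr, new_addr,
                                    old_entry, new_entry);

        if (need_rmap_locks)
            drop_rmap_locks(vma);

        return moved;
    }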

The problem is: the rmap locks, which protect against concurrent page
table removal by retract_page_tables() in the THP code, are only taken
after the PMD entry has been read and it has been decided how to move it.
So we can race as follows (with two processes that have mappings of the
same tmpfs file that is stored on a tmpfs mount with huge=advise); note
that process A accesses page tables through the MM while process B does it
through the file rmap:

process A                      process B
=========                      =========
mremap
  mremap_to
    move_vma
      move_page_tables
        get_old_pmd
        alloc_new_pmd
                      *** PREEMPT ***
                               madvise(MADV_COLLAPSE)
                                 do_madvise
                                   madvise_walk_vmas
                                     madvise_vma_behavior
                                       madvise_collapse
                                         hpage_collapse_scan_file
                                           collapse_file
                                             retract_page_tables
                                               i_mmap_lock_read(mapping)
                                               pmdp_collapse_flush
                                               i_mmap_unlock_read(mapping)
        move_pgt_entry(NORMAL_PMD, ...)
          take_rmap_locks
            move_normal_pmd
              drop_rmap_locks

When this happens, move_normal_pmd() can end up creating bogus PMD entries
in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`. The effect
depends on arch-specific and machine-specific details; on x86, you can end
up with physical page 0 mapped as a page table, which is likely
exploitable for user->kernel privilege escalation.
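
A simplified sketch of the pre-fix move_normal_pmd() body (setup checks
and TLB flushing omitted; not the verbatim source) shows why:

    old_ptl = pmd_lock(mm, old_pmd);
    new_ptl = pmd_lockptr(mm, new_pmd);
    if (new_ptl != old_ptl)
        spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);

    /*
     * By this point, process B's pmdp_collapse_flush() may already
     * have cleared *old_pmd, and the old code never rechecked it.
     */
    pmd = *old_pmd;
    pmd_clear(old_pmd);

    /*
     * With a cleared pmd, pmd_pgtable(pmd) is garbage: on x86 a zeroed
     * entry decodes to physical page 0, which then gets installed as a
     * page table at the destination.
     */
    pmd_populate(mm, new_pmd, pmd_pgtable(pmd));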

Fix the race by letting process A (in move_normal_pmd()) recheck that the
PMD still points to a page table after the rmap locks have been taken.
Otherwise, we bail and let the caller fall back to the PTE-level copying
path, which will then bail immediately at the pmd_none() check.
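
The recheck sits under the page table locks, roughly as follows (a
sketch modeled on the upstream patch, not the verbatim diff):

    pmd = *old_pmd;

    /*
     * Racing with collapse? If the entry no longer points to a page
     * table, bail out; the caller then falls back to the PTE-level
     * path, which bails at its own pmd_none() check.
     */
    if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
        goto out_unlock;

    /* Otherwise, move the page table as before. */
    pmd_clear(old_pmd);
    pmd_populate(mm, new_pmd, pmd_pgtable(pmd));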

Bug reachability: Reaching this bug requires that you can create
shmem/file THP mappings - anonymous THP uses different code that doesn't
zap stuff under rmap locks. File THP is gated on an experimental config
flag (CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need
shmem THP to hit this bug. As far as I know, getting shmem THP normally
requires that you can mount your own tmpfs with the right mount flags,
which would require creating your own user+mount namespace; though I don't
know whether some distros enable shmem THP by default.
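
For illustration, the process-B preconditions look roughly like this
(a hypothetical, self-contained userspace sketch; it assumes a tmpfs
mounted at /mnt/hugetmpfs with huge=advise and a kernel with
MADV_COLLAPSE support, i.e. Linux 6.1 or later):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #ifndef MADV_COLLAPSE
    #define MADV_COLLAPSE 25
    #endif

    #define LEN (2UL << 20)   /* one PMD-sized (2 MiB) region on x86-64 */

    int main(void)
    {
        /* Hypothetical path; stands in for any huge=advise tmpfs. */
        int fd = open("/mnt/hugetmpfs/file", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, LEN)) {
            perror("setup");
            return 1;
        }

        char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        memset(p, 'A', LEN);    /* populate the page table */

        /* This call is the retract_page_tables() trigger from the
         * race diagram above. */
        if (madvise(p, LEN, MADV_COLLAPSE))
            perror("madvise(MADV_COLLAPSE)");

        return 0;
    }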

Bug impact: This issue can likely be used for user->kernel privilege
escalation when it is reachable.
Severity CVSS v4.0: Pending analysis
Last modification: 07/03/2025