CVE-2025-39844
Publication date:
19/09/2025
In the Linux kernel, the following vulnerability has been resolved:

mm: move page table sync declarations to linux/pgtable.h

During our internal testing, we started observing intermittent boot
failures when the machine uses 4-level paging and has a large amount of
persistent memory:

  BUG: unable to handle page fault for address: ffffe70000000034
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] SMP NOPTI
  RIP: 0010:__init_single_page+0x9/0x6d
  Call Trace:

  __init_zone_device_page+0x17/0x5d
  memmap_init_zone_device+0x154/0x1bb
  pagemap_range+0x2e0/0x40f
  memremap_pages+0x10b/0x2f0
  devm_memremap_pages+0x1e/0x60
  dev_dax_probe+0xce/0x2ec [device_dax]
  dax_bus_probe+0x6d/0xc9
  [... snip ...]

It turns out that the kernel panics while initializing vmemmap (struct
page array) when the vmemmap region spans two PGD entries, because the new
PGD entry is only installed in init_mm.pgd, but not in the page tables of
other tasks.

And looking at __populate_section_memmap():

    if (vmemmap_can_optimize(altmap, pgmap))
        // does not sync top level page tables
        r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
    else
        // sync top level page tables in x86
        r = vmemmap_populate(start, end, nid, altmap);

In the normal path, vmemmap_populate() in arch/x86/mm/init_64.c
synchronizes the top level page table (See commit 9b861528a801 ("x86-64,
mem: Update all PGDs for direct mapping and vmemmap mapping changes")) so
that all tasks in the system can see the new vmemmap area.

However, when vmemmap_can_optimize() returns true, the optimized path
skips synchronization of top-level page tables. This is because
vmemmap_populate_compound_pages() is implemented in core MM code, which
does not handle synchronization of the top-level page tables. Instead,
the core MM has historically relied on each architecture to perform this
synchronization manually.

We're not the first party to encounter a crash caused by not-sync'd top
level page tables: earlier this year, Gwan-gyeong Mun attempted to address
the issue [1] [2] after hitting a kernel panic when x86 code accessed the
vmemmap area before the corresponding top-level entries were synced. At
that time, the issue was believed to be triggered only when struct page
was enlarged for debugging purposes, and the patch did not get further
updates.

It turns out that the current approach of relying on each arch to handle
the page table sync manually is fragile, because 1) it's easy to forget
to sync the top level page table, and 2) it's also easy to overlook that
the kernel should not access the vmemmap and direct mapping areas before
the sync.
# The solution: Make page table sync code more robust and harder to miss

To address this, Dave Hansen suggested [3] [4] introducing
{pgd,p4d}_populate_kernel() for updating the kernel portion of the page
tables, allowing each architecture to explicitly perform synchronization
when installing top-level entries. With this approach, we no longer need
to worry about missing the sync step, reducing the risk of future
regressions.

The new interface reuses the existing ARCH_PAGE_TABLE_SYNC_MASK,
PGTBL_P*D_MODIFIED and arch_sync_kernel_mappings() facility used by
vmalloc and ioremap to synchronize page tables.

pgd_populate_kernel() looks like this:

    static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
                                           p4d_t *p4d)
    {
        pgd_populate(&init_mm, pgd, p4d);
        if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
            arch_sync_kernel_mappings(addr, addr);
    }

It is worth noting that vmalloc() and apply_to_range() carefully
synchronize page tables by calling p*d_alloc_track() and
arch_sync_kernel_mappings(), and thus they are not affected by
---truncated---
Severity: Pending analysis
Last modified:
19/09/2025