Instituto Nacional de Ciberseguridad (INCIBE). INCIBE-CERT section

CVE-2025-39844

Severity:
Pending analysis
Type:
Not available / Other
Publication date:
19/09/2025
Last modified:
19/09/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

mm: move page table sync declarations to linux/pgtable.h

During our internal testing, we started observing intermittent boot
failures when the machine uses 4-level paging and has a large amount of
persistent memory:

  BUG: unable to handle page fault for address: ffffe70000000034
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] SMP NOPTI
  RIP: 0010:__init_single_page+0x9/0x6d
  Call Trace:
   __init_zone_device_page+0x17/0x5d
   memmap_init_zone_device+0x154/0x1bb
   pagemap_range+0x2e0/0x40f
   memremap_pages+0x10b/0x2f0
   devm_memremap_pages+0x1e/0x60
   dev_dax_probe+0xce/0x2ec [device_dax]
   dax_bus_probe+0x6d/0xc9
   [... snip ...]

It turns out that the kernel panics while initializing vmemmap (struct
page array) when the vmemmap region spans two PGD entries, because the new
PGD entry is only installed in init_mm.pgd, but not in the page tables of
other tasks.

And looking at __populate_section_memmap():

  if (vmemmap_can_optimize(altmap, pgmap))
          // does not sync top level page tables
          r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
  else
          // sync top level page tables in x86
          r = vmemmap_populate(start, end, nid, altmap);

In the normal path, vmemmap_populate() in arch/x86/mm/init_64.c
synchronizes the top-level page table (see commit 9b861528a801 ("x86-64,
mem: Update all PGDs for direct mapping and vmemmap mapping changes")) so
that all tasks in the system can see the new vmemmap area.

However, when vmemmap_can_optimize() returns true, the optimized path
skips synchronization of top-level page tables. This is because
vmemmap_populate_compound_pages() is implemented in core MM code, which
does not handle synchronization of the top-level page tables. Instead,
the core MM has historically relied on each architecture to perform this
synchronization manually.

We're not the first party to encounter a crash caused by not-sync'd top-level
page tables: earlier this year, Gwan-gyeong Mun attempted to address
the issue [1] [2] after hitting a kernel panic when x86 code accessed the
vmemmap area before the corresponding top-level entries were synced. At
that time, the issue was believed to be triggered only when struct page
was enlarged for debugging purposes, and the patch did not get further
updates.

It turns out that the current approach of relying on each arch to handle
the page table sync manually is fragile because 1) it's easy to forget to
sync the top-level page table, and 2) it's also easy to overlook that the
kernel should not access the vmemmap and direct mapping areas before the
sync.

# The solution: Make page table sync code more robust and harder to miss

To address this, Dave Hansen suggested [3] [4] introducing
{pgd,p4d}_populate_kernel() for updating the kernel portion of the page
tables and allowing each architecture to explicitly perform synchronization
when installing top-level entries. With this approach, we no longer need to
worry about missing the sync step, reducing the risk of future regressions.

The new interface reuses the existing ARCH_PAGE_TABLE_SYNC_MASK,
PGTBL_P*D_MODIFIED and arch_sync_kernel_mappings() facility used by
vmalloc and ioremap to synchronize page tables.

pgd_populate_kernel() looks like this:

  static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
                                         p4d_t *p4d)
  {
          pgd_populate(&init_mm, pgd, p4d);
          if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
                  arch_sync_kernel_mappings(addr, addr);
  }

It is worth noting that vmalloc() and apply_to_range() carefully
synchronize page tables by calling p*d_alloc_track() and
arch_sync_kernel_mappings(), and thus they are not affected by
---truncated---

Impact