CVE-2025-37964
Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
20/05/2025
Last modified:
21/05/2025
Description
In the Linux kernel, the following vulnerability has been resolved:

x86/mm: Eliminate window where TLB flushes may be inadvertently skipped

tl;dr: There is a window in the mm switching code where the new CR3 is
set and the CPU should be getting TLB flushes for the new mm. But
should_flush_tlb() has a bug and suppresses the flush. Fix it by
widening the window where should_flush_tlb() sends an IPI.

Long Version:

=== History ===

There were a few things leading up to this.

First, updating mm_cpumask() was observed to be too expensive, so it was
made lazier. But being lazy caused too many unnecessary IPIs to CPUs
due to the now-lazy mm_cpumask(). So code was added to cull
mm_cpumask() periodically[2]. But that culling was a bit too aggressive
and skipped sending TLB flushes to CPUs that need them. So here we are
again.

=== Problem ===

The too-aggressive code in should_flush_tlb() strikes in this window:

    // Turn on IPIs for this CPU/mm combination, but only
    // if should_flush_tlb() agrees:
    cpumask_set_cpu(cpu, mm_cpumask(next));

    next_tlb_gen = atomic64_read(&next->context.tlb_gen);
    choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
    load_new_mm_cr3(need_flush);
    // ^ After 'need_flush' is set to false, IPIs *MUST*
    // be sent to this CPU and not be ignored.

    this_cpu_write(cpu_tlbstate.loaded_mm, next);
    // ^ Not until this point does should_flush_tlb()
    // become true!

should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()
and writing to 'loaded_mm', which is a window where they should not be
suppressed. Whoops.

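The window can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct mm`, `struct cpu_state`, the `cr3_mm` field, the `in_transition` placeholder, the `periodic_trim` flag and all function names are hypothetical stand-ins invented for the illustration. The model assumes the pre-fix decision targets a CPU only when it is not lazy and its `loaded_mm` already equals the mm being flushed (or a periodic cull is due).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mm structure. */
struct mm { int id; };

/* Hypothetical stand-in for the per-CPU TLB state. */
struct cpu_state {
    const struct mm *loaded_mm; /* models cpu_tlbstate.loaded_mm */
    const struct mm *cr3_mm;    /* mm whose page tables CR3 points at */
    bool is_lazy;
};

/* Placeholder value: loaded_mm is neither prev nor next mid-switch. */
static const struct mm in_transition = { -1 };

/* Assumed shape of the decision before the fix (simplified). */
static bool should_flush_before_fix(const struct cpu_state *cpu,
                                    const struct mm *target,
                                    bool periodic_trim)
{
    if (cpu->is_lazy)
        return false;           /* lazy CPUs flush at the next switch */
    if (target == NULL)
        return true;            /* no mm: kernel memory flush */
    if (cpu->loaded_mm == target)
        return true;            /* target mm is loaded */
    return periodic_trim;       /* otherwise only when culling */
}

/* True when CR3 already uses 'target' but the flush would be skipped. */
static bool flush_wrongly_skipped(const struct cpu_state *cpu,
                                  const struct mm *target)
{
    return cpu->cr3_mm == target && !cpu->is_lazy &&
           !should_flush_before_fix(cpu, target, false);
}

/* Walks the three states of a switch from 'prev' to 'next'. */
static void demo_window(void)
{
    struct mm prev = { 1 }, next = { 2 };
    struct cpu_state cpu = { .loaded_mm = &prev, .cr3_mm = &prev,
                             .is_lazy = false };

    /* Switch begins: loaded_mm stops naming prev, CR3 is still old. */
    cpu.loaded_mm = &in_transition;
    assert(!flush_wrongly_skipped(&cpu, &next));

    /* Models load_new_mm_cr3(): hardware now walks next's tables. */
    cpu.cr3_mm = &next;
    assert(flush_wrongly_skipped(&cpu, &next)); /* the problem window */

    /* Models this_cpu_write(cpu_tlbstate.loaded_mm, next). */
    cpu.loaded_mm = &next;
    assert(!flush_wrongly_skipped(&cpu, &next));
}
```

Calling `demo_window()` from a `main()` steps through the switch; the middle assertion is the state in which the model reports a flush for `next` being skipped even though CR3 already points at it.
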
=== Solution ===

Thankfully, the fuzzy "just about to write CR3" window is already marked
with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in
should_flush_tlb() is sufficient to ensure that the CPU is targeted with
an IPI.

This will cause more TLB flush IPIs. But the window is relatively small
and I do not expect this to cause any kind of measurable performance
impact.

Update the comment where LOADED_MM_SWITCHING is written since it grew
yet another user.

Peter Z also raised a concern that should_flush_tlb() might not observe
'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off()
writes them. Add a barrier to ensure that they are observed in the
order they are written.
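
The approach described above can be sketched in userspace C. This is a minimal model under stated assumptions, not the kernel patch: `struct mm`, `struct cpu_state`, `MODEL_MM_SWITCHING`, `periodic_trim` and the function names are hypothetical, and C11 release/acquire atomics are swapped in for the kernel's barrier primitives to express "is_lazy is observed at least as new as loaded_mm". The order in which the writer updates the two fields is also an assumption of the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mm structure. */
struct mm { int id; };

/* Sentinel in the spirit of LOADED_MM_SWITCHING. */
static struct mm switching_sentinel;
#define MODEL_MM_SWITCHING (&switching_sentinel)

/* Hypothetical stand-in for the per-CPU TLB state. */
struct cpu_state {
    _Atomic(struct mm *) loaded_mm;
    atomic_bool is_lazy;
};

/* Writer: clear is_lazy, then publish the switching marker. */
static void begin_switch(struct cpu_state *cpu)
{
    atomic_store_explicit(&cpu->is_lazy, false, memory_order_relaxed);
    atomic_store_explicit(&cpu->loaded_mm, MODEL_MM_SWITCHING,
                          memory_order_release);
}

/* Writer: the switch is complete, publish the new mm. */
static void finish_switch(struct cpu_state *cpu, struct mm *next)
{
    atomic_store_explicit(&cpu->loaded_mm, next, memory_order_release);
}

/* Reader: decide whether a remote CPU must receive the flush IPI. */
static bool should_flush_tlb_model(struct cpu_state *cpu,
                                   const struct mm *target,
                                   bool periodic_trim)
{
    /* Acquire pairs with the release stores above, so the is_lazy
     * value read next is at least as new as this loaded_mm value. */
    struct mm *loaded =
        atomic_load_explicit(&cpu->loaded_mm, memory_order_acquire);

    if (atomic_load_explicit(&cpu->is_lazy, memory_order_relaxed))
        return false;           /* lazy CPUs flush at the next switch */
    if (target == NULL)
        return true;            /* no mm: kernel memory flush */
    if (loaded == MODEL_MM_SWITCHING)
        return true;            /* mid-switch: assume the worst */
    if (loaded == target)
        return true;            /* target mm is loaded */
    return periodic_trim;       /* otherwise only when culling */
}
```

In this model, a CPU that has called `begin_switch()` is always targeted, whichever mm is being flushed, until `finish_switch()` publishes the new mm. That matches the trade-off stated above: a few extra IPIs in exchange for never skipping a flush during the switch.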
Impact
References to Advisories, Solutions, and Tools
- https://git.kernel.org/stable/c/02ad4ce144bd27f71f583f667fdf3b3ba0753477
- https://git.kernel.org/stable/c/12f703811af043d32b1c8a30001b2fa04d5cd0ac
- https://git.kernel.org/stable/c/399ec9ca8fc4999e676ff89a90184ec40031cf59
- https://git.kernel.org/stable/c/d41072906abec8bb8e01ed16afefbaa558908c89
- https://git.kernel.org/stable/c/d87392094f96e162fa5fa5a8640d70cc0952806f
- https://git.kernel.org/stable/c/fea4e317f9e7e1f449ce90dedc27a2d2a95bee5a