CVE-2024-40918
Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
12/07/2024
Last modified:
17/09/2025
Description
In the Linux kernel, the following vulnerability has been resolved:

parisc: Try to fix random segmentation faults in package builds

PA-RISC systems with PA8800 and PA8900 processors have had problems
with random segmentation faults for many years. Systems with earlier
processors are much more stable.

Systems with PA8800 and PA8900 processors have a large L2 cache which
needs per page flushing for decent performance when a large range is
flushed. The combined cache in these systems is also more sensitive to
non-equivalent aliases than the caches in earlier systems.

The majority of random segmentation faults that I have looked at
appear to be memory corruption in memory allocated using mmap and
malloc.

My first attempt at fixing the random faults didn't work. On
reviewing the cache code, I realized that there were two issues
which the existing code didn't handle correctly. Both relate
to cache move-in. Another issue is that the present bit in PTEs
is racy.

1) PA-RISC caches have a mind of their own and they can speculatively
load data and instructions for a page as long as there is an entry in
the TLB for the page which allows move-in. TLBs are local to each
CPU. Thus, the TLB entry for a page must be purged before flushing
the page. This is particularly important on SMP systems.

In some of the flush routines, the flush routine would be called
and then the TLB entry would be purged. This was because the flush
routine needed the TLB entry to do the flush.

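The ordering problem described in item 1 can be illustrated with a toy user-space model. This is only a sketch and not kernel code: `struct toy_cpu` and every function name below are hypothetical, and the model assumes a single cache line on a single CPU with worst-case timing for the speculative move-in.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one page on one CPU. All names are hypothetical. */
struct toy_cpu {
    bool tlb_entry;   /* a TLB entry that permits move-in exists */
    bool line_cached; /* a line of the page currently sits in the cache */
};

/* The cache may load a line on its own whenever a TLB entry allows it. */
static void speculative_move_in(struct toy_cpu *c)
{
    if (c->tlb_entry)
        c->line_cached = true;
}

static void purge_tlb(struct toy_cpu *c)  { c->tlb_entry = false; }
static void flush_page(struct toy_cpu *c) { c->line_cached = false; }

/* Flush first, purge second: a move-in can slip in between the two steps. */
bool flush_then_purge(struct toy_cpu *c)
{
    assert(c);
    flush_page(c);
    speculative_move_in(c); /* worst case: happens right after the flush */
    purge_tlb(c);
    return c->line_cached;  /* reports whether a line was left behind */
}

/* Purge first, flush second: without a TLB entry no move-in can occur. */
bool purge_then_flush(struct toy_cpu *c)
{
    assert(c);
    purge_tlb(c);
    speculative_move_in(c); /* no TLB entry, so this has no effect */
    flush_page(c);
    speculative_move_in(c); /* still no effect after the flush */
    return c->line_cached;
}
```

Starting from a state where both `tlb_entry` and `line_cached` are set, `flush_then_purge` reports that a line remains in the cache, while `purge_then_flush` reports that none does.
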
2) My initial approach to trying to fix the random faults was to
try and use flush_cache_page_if_present for all flush operations.
This actually made things worse and led to a couple of hardware
lockups. It finally dawned on me that some lines weren't being
flushed because the pte check code was racy. This resulted in
random inequivalent mappings to physical pages.

The __flush_cache_page tmpalias flush sets up its own TLB entry
and it doesn't need the existing TLB entry. As long as we can find
the pte pointer for the vm page, we can get the pfn and physical
address of the page. We can also purge the TLB entry for the page
before doing the flush. Further, __flush_cache_page uses a special
TLB entry that inhibits cache move-in.

When switching page mappings, we need to ensure that lines are
removed from the cache. It is not sufficient to just flush the
lines to memory as they may come back.

This made it clear that we needed to implement all the required
flush operations using tmpalias routines. This includes flushes
for user and kernel pages.

After modifying the code to use tmpalias flushes, it became clear
that the random segmentation faults were not fully resolved. The
frequency of faults was worse on systems with a 64 MB L2 (PA8900)
and systems with more CPUs (rp4440).

The warning that I added to flush_cache_page_if_present to detect
pages that couldn't be flushed triggered frequently on some systems.

Helge and I looked at the pages that couldn't be flushed and found
that the PTE was either cleared or for a swap page. Ignoring pages
that were swapped out seemed okay but pages with cleared PTEs seemed
problematic.

I looked at routines related to pte_clear and noticed ptep_clear_flush.
The default implementation just flushes the TLB entry. However, it was
obvious that on parisc we need to flush the cache page as well. If
we don't flush the cache page, stale lines will be left in the cache
and cause random corruption. Once a PTE is cleared, there is no way
to find the physical address associated with the PTE and flush the
associated page at a later time.

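The point about ptep_clear_flush can be sketched with a small user-space model. It is not the kernel implementation: `struct toy_pte`, `struct toy_mm`, and both `toy_clear_flush_*` functions are hypothetical names, and the cache is reduced to one flag per page frame. The sketch only shows why the frame number has to be captured while the PTE is being cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8

/* Hypothetical, simplified page table entry. */
struct toy_pte {
    bool present;
    size_t pfn;
};

/* Hypothetical address space with a single mapping. */
struct toy_mm {
    struct toy_pte pte;
    bool tlb_entry;
    bool cached[TOY_NPAGES]; /* cached[pfn]: lines of that frame are cached */
};

/* Return the old entry and clear it; afterwards the pfn is gone from the PTE. */
static struct toy_pte toy_get_and_clear(struct toy_mm *mm)
{
    struct toy_pte old = mm->pte;
    mm->pte.present = false;
    mm->pte.pfn = 0;
    return old;
}

/* TLB-only variant, as the text describes the default behaviour. */
struct toy_pte toy_clear_flush_tlb_only(struct toy_mm *mm)
{
    struct toy_pte old = toy_get_and_clear(mm);
    mm->tlb_entry = false;
    return old; /* cached lines for old.pfn are left behind */
}

/* Variant that also flushes the cache, using the pfn saved before the clear. */
struct toy_pte toy_clear_flush_with_cache(struct toy_mm *mm)
{
    struct toy_pte old = toy_get_and_clear(mm);
    mm->tlb_entry = false;
    if (old.present) {
        assert(old.pfn < TOY_NPAGES);
        mm->cached[old.pfn] = false; /* remove the lines of the old frame */
    }
    return old;
}
```

In this model, the TLB-only variant leaves `cached[pfn]` set while the PTE no longer records which frame it pointed to, so a later flush has nothing to work from. The second variant clears the flag in the same step that clears the PTE.
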
I implemented an updated change with a parisc specific version of
ptep_clear_flush. It fixed the random data corruption on Helge's rp4440
and rp3440, as well as on my c8000.

At this point, I realized that I could restore the code where we only
flush in flush_cache_page_if_present if the page has been accessed.
However, for this, we also need to flush the cache when the accessed
bit is cleared in
---truncated---
Impact
Base Score 3.x
6.30
Severity 3.x
MEDIUM
Vulnerable products and versions
| CPE | From | Up to |
|---|---|---|
| cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | | 6.6.35 (excluding) |
| cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | 6.7 (including) | 6.9.6 (excluding) |
| cpe:2.3:o:linux:linux_kernel:6.10:rc1:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:6.10:rc2:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:6.10:rc3:*:*:*:*:*:* | | |
References to Advisories, Solutions, and Tools
- https://git.kernel.org/stable/c/5bf196f1936bf93df31112fbdfb78c03537c07b0
- https://git.kernel.org/stable/c/72d95924ee35c8cd16ef52f912483ee938a34d49
- https://git.kernel.org/stable/c/d66f2607d89f760cdffed88b22f309c895a2af20



