CVE-2022-48797
Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
16/07/2024
Last modified:
16/07/2024
Description
In the Linux kernel, the following vulnerability has been resolved:

mm: don't try to NUMA-migrate COW pages that have other uses

Oded Gabbay reports that enabling NUMA balancing causes corruption with
his Gaudi accelerator test load:

"All the details are in the bug, but the bottom line is that somehow,
this patch causes corruption when the numa balancing feature is
enabled AND we don't use process affinity AND we use GUP to pin pages
so our accelerator can DMA to/from system memory.

Either disabling numa balancing, using process affinity to bind to
specific numa-node or reverting this patch causes the bug to
disappear"

and Oded bisected the issue to commit 09854ba94c6a ("mm: do_wp_page()
simplification").

Now, the NUMA balancing shouldn't actually be changing the writability
of a page, and as such shouldn't matter for COW. But it appears it
does. Suspicious.

However, regardless of that, the condition for enabling NUMA faults in
change_pte_range() is nonsensical. It uses "page_mapcount(page)" to
decide if a COW page should be NUMA-protected or not, and that makes
absolutely no sense.

The number of mappings a page has is irrelevant: not only does GUP get a
reference to a page as in Oded's case, but the other mappings might be
paged out and the only reference to them would be in the page count.

Since we should never try to NUMA-balance a page that we can't move
anyway due to other references, just fix the code to use 'page_count()'.
Oded confirms that that fixes his issue.
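
The difference between the two tests can be illustrated with a small userspace model. This is only a sketch, not the kernel code: `struct toy_page`, `numa_protect_old()`, `numa_protect_new()`, `gup_pinned_page()` and `toy_self_check()` are hypothetical names, and the counter values are simplified assumptions (the kernel's real reference accounting for pinned pages is more involved).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the two counters discussed. */
struct toy_page {
    int mapcount; /* page-table mappings, what page_mapcount() looks at */
    int refcount; /* all references (mappings, GUP, ...), what page_count() looks at */
};

/* Old test: a COW page is skipped only when it is mapped more than once. */
static bool numa_protect_old(const struct toy_page *p, bool cow_mapping)
{
    if (cow_mapping && p->mapcount != 1)
        return false; /* skip: do not NUMA-protect */
    return true;
}

/* Fixed test: a COW page is skipped whenever anything else holds a reference. */
static bool numa_protect_new(const struct toy_page *p, bool cow_mapping)
{
    if (cow_mapping && p->refcount != 1)
        return false; /* skip: the page cannot be migrated anyway */
    return true;
}

/* Assumed scenario: mapped once, plus one extra reference taken through GUP. */
static struct toy_page gup_pinned_page(void)
{
    struct toy_page p = { .mapcount = 1, .refcount = 2 };
    return p;
}

/* Self-check for the scenario above. */
static void toy_self_check(void)
{
    struct toy_page pinned = gup_pinned_page();

    assert(numa_protect_old(&pinned, true));  /* old test misses the extra reference */
    assert(!numa_protect_new(&pinned, true)); /* fixed test leaves the page alone */
}
```

In this model the old test still marks the GUP-referenced page for NUMA faults because its map count is 1, while the fixed test skips it because the reference count shows another user.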

Now, this does imply that something in NUMA balancing ends up changing
page protections (other than the obvious one of making the page
inaccessible to get the NUMA faulting information). Otherwise the COW
simplification wouldn't matter - since doing the GUP on the page would
make sure it's writable.

The cause of that permission change would be good to figure out too,
since it clearly results in spurious COW events - but fixing the
nonsensical test that just happened to work before is obviously the
CorrectThing(tm) to do regardless.