CVE-2024-50022
Publication date:
21/10/2024
In the Linux kernel, the following vulnerability has been resolved:

device-dax: correct pgoff align in dax_set_mapping()

pgoff should be aligned using ALIGN_DOWN() instead of ALIGN(). Otherwise,
a vmf->address that is not aligned to fault_size is rounded up to the next
alignment boundary, which can result in memory failure handling getting
the wrong address.
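As a rough illustration of the rounding difference described above, the following userspace C sketch mimics the behaviour of the kernel's ALIGN() and ALIGN_DOWN() macros together with a simplified page-index calculation. The helper names (align_up, align_down, page_index), the 4 KiB base page size, and the example offsets are assumptions made for illustration; this is not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Round x down to a multiple of a; a must be a power of two
 * (userspace stand-in for the kernel's ALIGN_DOWN()). */
static uint64_t align_down(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return x & ~(a - 1);
}

/* Round x up to a multiple of a; a must be a power of two
 * (userspace stand-in for the kernel's ALIGN()). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Simplified page offset of addr inside a mapping that starts at
 * vm_start and whose first page has file offset vm_pgoff
 * (hypothetical helper, loosely modelled on linear_page_index()). */
static uint64_t page_index(uint64_t vm_start, uint64_t vm_pgoff, uint64_t addr)
{
	return ((addr - vm_start) >> SKETCH_PAGE_SHIFT) + vm_pgoff;
}
```

For a fault at byte offset 0x201000 into a mapping with a 2M fault size, align_down() yields 0x200000 (page index 512), whereas align_up() yields 0x400000 (page index 1024), which belongs to the following 2M region. For an address that is already 2M-aligned, both helpers return the same value, which is consistent with the bug only showing up on unaligned faults.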

It's a subtle situation that can only be observed in
page_mapped_in_vma() after the page fault has been handled by
dev_dax_huge_fault(). Generally, there is little chance of
page_mapped_in_vma() running on a dev-dax page, except when an error is
deliberately injected into the dax device to trigger an MCE and
memory-failure handling. In that case, page_mapped_in_vma() is called to
determine which task is accessing the failing address, and that task is
killed in the end.

We used a self-developed dax device (with 2M-aligned mappings) to
perform error injection at random addresses. It turned out that an error
injected at a non-2M-aligned address caused endless MCEs until panic,
because page_mapped_in_vma() kept returning the wrong address and the task
accessing the failing address was never killed properly:

[ 3783.719419] Memory failure: 0x200c9742: recovery action for dax page: Recovered
[ 3784.049006] mce: Uncorrected hardware memory error in user-access at 200c9742380
[ 3784.049190] Memory failure: 0x200c9742: recovery action for dax page: Recovered
[ 3784.448042] mce: Uncorrected hardware memory error in user-access at 200c9742380
[ 3784.448186] Memory failure: 0x200c9742: recovery action for dax page: Recovered
[ 3784.792026] mce: Uncorrected hardware memory error in user-access at 200c9742380
[ 3784.792179] Memory failure: 0x200c9742: recovery action for dax page: Recovered
[ 3785.162502] mce: Uncorrected hardware memory error in user-access at 200c9742380
[ 3785.162633] Memory failure: 0x200c9742: recovery action for dax page: Recovered
[ 3785.461116] mce: Uncorrected hardware memory error in user-access at 200c9742380
[ 3785.461247] Memory failure: 0x200c9742: recovery action for dax page: Recovered
[ 3785.764730] mce: Uncorrected hardware memory error in user-access at 200c9742380
[ 3785.764859] Memory failure: 0x200c9742: recovery action for dax page: Recovered
[ 3786.042128] mce: Uncorrected hardware memory error in user-access at 200c9742380
[ 3786.042259] Memory failure: 0x200c9742: recovery action for dax page: Recovered
[ 3786.464293] mce: Uncorrected hardware memory error in user-access at 200c9742380
[ 3786.464423] Memory failure: 0x200c9742: recovery action for dax page: Recovered
[ 3786.818090] mce: Uncorrected hardware memory error in user-access at 200c9742380
[ 3786.818217] Memory failure: 0x200c9742: recovery action for dax page: Recovered
[ 3787.085297] mce: Uncorrected hardware memory error in user-access at 200c9742380
[ 3787.085424] Memory failure: 0x200c9742: recovery action for dax page: Recovered

It took us several weeks to pinpoint this problem, but we eventually
used bpftrace to trace the page fault and MCE addresses and successfully
identified the issue.

Joao added:

: Likely we never reproduce in production because we always pin
: device-dax regions in the region align they provide (Qemu does
: similarly with prealloc in hugetlb/file backed memory). I think this
: bug requires that we touch *unpinned* device-dax regions unaligned to
: the device-dax selected alignment (page size i.e. 4K/2M/1G)
Severity CVSS v4.0: Pending analysis
Last modification:
03/11/2025