CVE-2025-21880
Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
27/03/2025
Last modified:
27/03/2025
Description
In the Linux kernel, the following vulnerability has been resolved:

drm/xe/userptr: fix EFAULT handling

Currently we treat EFAULT from hmm_range_fault() as a non-fatal error
when called from xe_vm_userptr_pin(). The idea is to avoid killing the
entire vm and returning an error, under the assumption that the user
just did an unmap or something similar and has no intention of actually
touching that memory from the GPU. At this point we have already zapped
the PTEs, so any access should generate a page fault, and if the pin
fails there as well, it then becomes fatal.
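
The non-fatal EFAULT handling described above can be sketched as a small
user-space model. This is only an illustration under stated assumptions,
not the xe driver code: struct model_vma, model_range_fault() and
model_userptr_pin() are hypothetical names standing in for the userptr
vma, hmm_range_fault() and xe_vm_userptr_pin().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a userptr vma; not the real xe structure. */
struct model_vma {
    bool range_mapped;  /* false models the user having unmapped the range */
    bool pages_valid;   /* true once backing pages have been collected */
};

/* Models hmm_range_fault(): fails with -EFAULT when the range is gone. */
int model_range_fault(struct model_vma *vma)
{
    assert(vma != NULL);
    if (!vma->range_mapped)
        return -EFAULT;
    vma->pages_valid = true;
    return 0;
}

/* Models the pin step: -EFAULT is swallowed, any other error propagates. */
int model_userptr_pin(struct model_vma *vma)
{
    int err = model_range_fault(vma);

    if (err == -EFAULT) {
        /* Assume the user unmapped on purpose; keep the vm alive. */
        vma->pages_valid = false;
        return 0;
    }
    return err;
}
```

In this model a vanished range leaves the vma without valid pages while
the caller still sees success, which is the state the rest of this
message is concerned with.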

However, it looks like it's possible for the userptr vma to still be on
the rebind list in preempt_rebind_work_func(). This can happen if we had
to retry the pin due to something happening in the caller before we did
the rebind step, and in the meantime the userptr needed to be
re-validated, this time hitting the EFAULT.

This explains an internal user report of hitting:

[ 191.738349] WARNING: CPU: 1 PID: 157 at drivers/gpu/drm/xe/xe_res_cursor.h:158 xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738551] Workqueue: xe-ordered-wq preempt_rebind_work_func [xe]
[ 191.738616] RIP: 0010:xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738690] Call Trace:
[ 191.738692] <TASK>
[ 191.738694] ? show_regs+0x69/0x80
[ 191.738698] ? __warn+0x93/0x1a0
[ 191.738703] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738759] ? report_bug+0x18f/0x1a0
[ 191.738764] ? handle_bug+0x63/0xa0
[ 191.738767] ? exc_invalid_op+0x19/0x70
[ 191.738770] ? asm_exc_invalid_op+0x1b/0x20
[ 191.738777] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738834] ? ret_from_fork_asm+0x1a/0x30
[ 191.738849] bind_op_prepare+0x105/0x7b0 [xe]
[ 191.738906] ? dma_resv_reserve_fences+0x301/0x380
[ 191.738912] xe_pt_update_ops_prepare+0x28c/0x4b0 [xe]
[ 191.738966] ? kmemleak_alloc+0x4b/0x80
[ 191.738973] ops_execute+0x188/0x9d0 [xe]
[ 191.739036] xe_vm_rebind+0x4ce/0x5a0 [xe]
[ 191.739098] ? trace_hardirqs_on+0x4d/0x60
[ 191.739112] preempt_rebind_work_func+0x76f/0xd00 [xe]

This is followed by a NULL pointer dereference (NPD) when running some
workload, since the sg was never actually populated but the vma is still
marked for rebind, when it should be skipped for this special EFAULT
case. This change is confirmed to fix the user report.
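
The failure sequence and the effect of skipping the vma can be modelled
roughly as follows. This is a sketch under assumptions rather than the
actual patch: struct model_uvma, model_pin() and
model_rebind_bad_binds() are hypothetical, and the fix_enabled flag
exists only to compare the behaviour before and after the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userptr vma model; field names are illustrative only. */
struct model_uvma {
    bool range_mapped;    /* false: user unmapped, so the pin sees EFAULT */
    bool sg_populated;    /* true once a pin produced a usable sg table */
    bool on_rebind_list;  /* true while the vma is queued for rebind */
};

/*
 * Pin one vma. EFAULT is treated as non-fatal in both modes; with
 * fix_enabled set, the vma is additionally dropped from the rebind list.
 */
int model_pin(struct model_uvma *v, bool fix_enabled)
{
    assert(v != NULL);
    if (!v->range_mapped) {
        v->sg_populated = false;
        if (fix_enabled)
            v->on_rebind_list = false;
        return 0;
    }
    v->sg_populated = true;
    v->on_rebind_list = true;
    return 0;
}

/* Rebind pass: count vmas that would be bound with an unpopulated sg. */
int model_rebind_bad_binds(const struct model_uvma *vmas, size_t n)
{
    int bad = 0;

    for (size_t i = 0; i < n; i++)
        if (vmas[i].on_rebind_list && !vmas[i].sg_populated)
            bad++;
    return bad;
}
```

Pinning once while the range is mapped, unmapping it, and pinning again
leaves one bad bind in the model when fix_enabled is false and none when
it is true.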

v2 (MattB):
- Move earlier.
v3 (MattB):
- Update the commit message to make it clear that this indeed fixes the
issue.

(cherry picked from commit 6b93cb98910c826c2e2004942f8b060311e43618)