CVE-2025-21892
Publication date: 27/03/2025
In the Linux kernel, the following vulnerability has been resolved:

RDMA/mlx5: Fix the recovery flow of the UMR QP

This patch addresses an issue in the recovery flow of the UMR QP (queue pair),
ensuring tasks do not get stuck, as highlighted by the call trace [1].

During recovery, before transitioning the QP to the RESET state, the
software must wait for all outstanding WRs (work requests) to complete.

Failing to do so can cause the firmware to skip sending some of the flushed
CQEs (completion queue entries) with errors and simply discard them on the
transition to RESET, as per the IB specification.

This race condition can result in lost CQEs, which leaves the tasks waiting
on those CQEs stuck.

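The effect of a lost CQE on a waiting task can be illustrated with a small user-space toy model. This is only a sketch, not driver code: `UmrWaiter` and `deliver` are hypothetical names, and a timeout is added so the example terminates, whereas the kernel path in the call trace [1] blocks in `wait_for_completion` with no timeout.

```python
import threading


class UmrWaiter:
    """Hypothetical stand-in for a task blocked until its WR's CQE arrives."""

    def __init__(self):
        self.done = threading.Event()
        self.status = None

    def on_cqe(self, status):
        # Completion handler: record the status and wake the waiter.
        self.status = status
        self.done.set()

    def wait(self, timeout):
        # A timeout is used here only so that the example terminates.
        return self.done.wait(timeout)


def deliver(waiters, lost_ids):
    """Deliver a flush-error CQE to every waiter whose CQE was not discarded."""
    for wr_id, waiter in waiters.items():
        if wr_id not in lost_ids:
            waiter.on_cqe("IB_WC_WR_FLUSH_ERR")


waiters = {wr_id: UmrWaiter() for wr_id in range(3)}
deliver(waiters, lost_ids={2})  # the CQE for WR 2 is dropped by an early RESET
woken = {wr_id: w.wait(timeout=0.05) for wr_id, w in waiters.items()}
print(woken)  # {0: True, 1: True, 2: False}
```

The waiter for WR 2 is never woken, which corresponds to the task reported as blocked for more than 120 seconds.
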
To resolve this, the patch sends a final WR, which serves only as a
barrier, before moving the QP state to RESET.

Once a CQE is received for that final WR, no outstanding WRs can remain,
so it is safe to transition the QP to RESET and subsequently back to RTS,
restoring proper functionality.

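The barrier idea can be sketched with a toy queue model. It assumes that a QP in the error state flushes its outstanding WRs in posting order, so the barrier's CQE can only arrive after every earlier CQE. `ToyQP`, `recover_without_barrier`, and `recover_with_barrier` are hypothetical names for illustration and do not correspond to functions in the mlx5 driver.

```python
from collections import deque

FLUSH_ERR = "IB_WC_WR_FLUSH_ERR"  # status name taken from the advisory text


class ToyQP:
    """Toy model of a send queue in the error state (not the mlx5 driver)."""

    def __init__(self):
        self.state = "ERR"
        self.outstanding = deque()  # posted WRs with no CQE yet, FIFO order
        self.cqes = []              # (wr_id, status) pairs delivered so far

    def post_send(self, wr_id):
        self.outstanding.append(wr_id)

    def poll_one(self):
        """Deliver one flushed CQE for the oldest outstanding WR, if any."""
        if not self.outstanding:
            return None
        cqe = (self.outstanding.popleft(), FLUSH_ERR)
        self.cqes.append(cqe)
        return cqe

    def reset(self):
        """Move to RESET; WRs still outstanding are dropped without a CQE."""
        lost = list(self.outstanding)
        self.outstanding.clear()
        self.state = "RESET"
        return lost

    def to_rts(self):
        self.state = "RTS"


def recover_without_barrier(qp):
    """Reset immediately: every WR still outstanding loses its CQE."""
    lost = qp.reset()
    qp.to_rts()
    return lost


def recover_with_barrier(qp, barrier_id="barrier"):
    """Post a barrier WR and reset only after its CQE has been seen."""
    qp.post_send(barrier_id)
    for wr_id, _status in iter(qp.poll_one, None):
        # The barrier's status is ignored; only its arrival matters.
        if wr_id == barrier_id:
            break
    lost = qp.reset()
    qp.to_rts()
    return lost
```

In this model, calling `recover_without_barrier` on a queue holding WRs 0, 1, and 2 returns all three as lost, while `recover_with_barrier` returns an empty list because each WR received its CQE before the reset.
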
Note:
For the barrier WR, we simply reuse the failed and ready WR.
Since the QP is in an error state, the barrier WR will only ever complete
with IB_WC_WR_FLUSH_ERR. However, as it serves only as a barrier, its
status does not matter.

[1]
INFO: task rdma_resource_l:1922 blocked for more than 120 seconds.
Tainted: G W 6.12.0-rc7+ #1626
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:rdma_resource_l state:D stack:0 pid:1922 tgid:1922 ppid:1369 flags:0x00004004
Call Trace:
__schedule+0x420/0xd30
schedule+0x47/0x130
schedule_timeout+0x280/0x300
? mark_held_locks+0x48/0x80
? lockdep_hardirqs_on_prepare+0xe5/0x1a0
wait_for_completion+0x75/0x130
mlx5r_umr_post_send_wait+0x3c2/0x5b0 [mlx5_ib]
? __pfx_mlx5r_umr_done+0x10/0x10 [mlx5_ib]
mlx5r_umr_revoke_mr+0x93/0xc0 [mlx5_ib]
__mlx5_ib_dereg_mr+0x299/0x520 [mlx5_ib]
? _raw_spin_unlock_irq+0x24/0x40
? wait_for_completion+0xfe/0x130
? rdma_restrack_put+0x63/0xe0 [ib_core]
ib_dereg_mr_user+0x5f/0x120 [ib_core]
? lock_release+0xc6/0x280
destroy_hw_idr_uobject+0x1d/0x60 [ib_uverbs]
uverbs_destroy_uobject+0x58/0x1d0 [ib_uverbs]
uobj_destroy+0x3f/0x70 [ib_uverbs]
ib_uverbs_cmd_verbs+0x3e4/0xbb0 [ib_uverbs]
? __pfx_uverbs_destroy_def_handler+0x10/0x10 [ib_uverbs]
? __lock_acquire+0x64e/0x2080
? mark_held_locks+0x48/0x80
? find_held_lock+0x2d/0xa0
? lock_acquire+0xc1/0x2f0
? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs]
? __fget_files+0xc3/0x1b0
ib_uverbs_ioctl+0xe7/0x170 [ib_uverbs]
? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs]
__x64_sys_ioctl+0x1b0/0xa70
do_syscall_64+0x6b/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f99c918b17b
RSP: 002b:00007ffc766d0468 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffc766d0578 RCX: 00007f99c918b17b
RDX: 00007ffc766d0560 RSI: 00000000c0181b01 RDI: 0000000000000003
RBP: 00007ffc766d0540 R08: 00007f99c8f99010 R09: 000000000000bd7e
R10: 00007f99c94c1c70 R11: 0000000000000246 R12: 00007ffc766d0530
R13: 000000000000001c R14: 0000000040246a80 R15: 0000000000000000
Severity CVSS v4.0: Pending analysis
Last modification: 29/10/2025