CVE-2024-53169
Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
27/12/2024
Last modified:
01/10/2025
Description
In the Linux kernel, the following vulnerability has been resolved:

nvme-fabrics: fix kernel crash while shutting down controller

The nvme keep-alive operation, which executes at a periodic interval,
could potentially sneak in while shutting down a fabric controller.
This may lead to a race between the fabric controller admin queue
destroy code path (invoked while shutting down the controller) and the
hw/hctx queue dispatcher called from the nvme keep-alive async request
queuing operation. This race could lead to the kernel crash shown below:

Call Trace:
autoremove_wake_function+0x0/0xbc (unreliable)
__blk_mq_sched_dispatch_requests+0x114/0x24c
blk_mq_sched_dispatch_requests+0x44/0x84
blk_mq_run_hw_queue+0x140/0x220
nvme_keep_alive_work+0xc8/0x19c [nvme_core]
process_one_work+0x200/0x4e0
worker_thread+0x340/0x504
kthread+0x138/0x140
start_kernel_thread+0x14/0x18

While shutting down a fabric controller, if an nvme keep-alive request
sneaks in, it is flushed off. The nvme_keep_alive_end_io function is
then invoked to handle the end of the keep-alive operation, which
decrements admin->q_usage_counter. Assuming this is the last/only
request in the admin queue, admin->q_usage_counter becomes zero.
If that happens, the blk-mq destroy queue operation
(blk_mq_destroy_queue()), which could be running simultaneously on
another CPU (as this is the controller shutdown code path), makes forward
progress and deletes the admin queue. From this point onward we are not
supposed to access the admin queue resources. However, the issue here is
that the nvme keep-alive thread running the hw/hctx queue dispatch
operation hasn't yet finished its work, so it could still access the
admin queue resources after the admin queue has already been deleted,
and that causes the above crash.

The above kernel crash is a regression caused by changes implemented
in commit a54a93d0e359 ("nvme: move stopping keep-alive into
nvme_uninit_ctrl()"). Ideally we should stop keep-alive before destroying
the admin queue and freeing the admin tagset so that it can't sneak
in during the shutdown operation. However, in that commit we removed the
keep-alive stop operation from the beginning of the controller shutdown
code path and added it under nvme_uninit_ctrl(), which executes very late
in the shutdown code path, after the admin queue is destroyed and its
tagset is removed. This change created the possibility of keep-alive
sneaking in, interfering with the shutdown operation, and causing the
observed kernel crash.

To fix the observed crash, we decided to move nvme_stop_keep_alive() from
nvme_uninit_ctrl() to nvme_remove_admin_tag_set(). This change ensures
that we don't make forward progress and delete the admin queue until the
keep-alive operation is finished (if it's in-flight) or cancelled, which
helps contain the race condition explained above and hence avoids the
crash.

Moving nvme_stop_keep_alive() to nvme_remove_admin_tag_set(), instead of
adding nvme_stop_keep_alive() to the beginning of the controller shutdown
code path in nvme_stop_ctrl() (as was the case before commit
a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()")),
also saves one call site of nvme_stop_keep_alive().
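The ordering problem described above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: it is not kernel code, every name in it (`model_ctrl`, `model_keep_alive_work`, `model_shutdown`, and so on) is hypothetical, and the "race" is reduced to one fixed interleaving in which the keep-alive work runs after the admin queue teardown. It shows why stopping keep-alive before destroying the queue removes the late access.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; these are NOT kernel structures. */
struct model_ctrl {
    bool admin_q_alive;   /* admin queue has not been destroyed yet */
    bool ka_pending;      /* keep-alive work is queued and has not run */
    int  late_accesses;   /* dispatches that touched a destroyed queue */
};

/* Stand-in for the keep-alive work item dispatching on the admin queue. */
static void model_keep_alive_work(struct model_ctrl *c)
{
    if (!c->ka_pending)
        return;                 /* work was cancelled, nothing to do */
    c->ka_pending = false;
    if (!c->admin_q_alive)
        c->late_accesses++;     /* models the access that crashes */
}

/* Stand-in for stopping keep-alive: cancels any pending work. */
static void model_stop_keep_alive(struct model_ctrl *c)
{
    c->ka_pending = false;
}

/* Stand-in for admin queue teardown, with or without the fix's ordering. */
static void model_remove_admin_queue(struct model_ctrl *c, bool stop_ka_first)
{
    if (stop_ka_first)
        model_stop_keep_alive(c);
    c->admin_q_alive = false;
}

/* Runs one interleaving: teardown first, then the keep-alive worker.
 * Returns how many times the worker touched a destroyed queue. */
static int model_shutdown(bool stop_ka_first)
{
    struct model_ctrl c = {
        .admin_q_alive = true,
        .ka_pending = true,
        .late_accesses = 0,
    };

    model_remove_admin_queue(&c, stop_ka_first);
    model_keep_alive_work(&c);
    assert(!c.ka_pending);      /* work either ran or was cancelled */
    return c.late_accesses;
}
```

Calling `model_shutdown(false)` models the regressed ordering, and `model_shutdown(true)` models the ordering introduced by the fix. A real reproduction would need concurrent execution, which this model deliberately leaves out.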
Impact
Base Score 3.x
4.70
Severity 3.x
MEDIUM
Vulnerable products and versions
| CPE | From | Up to |
|---|---|---|
| cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | 6.10.7 (including) | 6.11 (excluding) |
| cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | 6.11.1 (including) | 6.11.11 (excluding) |
| cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | 6.12 (including) | 6.12.2 (excluding) |
| cpe:2.3:o:linux:linux_kernel:6.11:-:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:6.11:rc5:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:6.11:rc6:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:6.11:rc7:*:*:*:*:*:* | | |
To consult the complete list of CPE names with products and versions, see this page
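The version ranges in the table above can be checked mechanically. The following is a minimal sketch, assuming a release version is given as a (major, minor, patch) triple and that "6.11" and "6.12" correspond to patch level 0. The function names are hypothetical, and the release-candidate entries (6.11 rc5 to rc7) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Compare two (major, minor, patch) triples: <0, 0 or >0, like strcmp. */
static int ver_cmp(int a1, int a2, int a3, int b1, int b2, int b3)
{
    if (a1 != b1)
        return a1 < b1 ? -1 : 1;
    if (a2 != b2)
        return a2 < b2 ? -1 : 1;
    if (a3 != b3)
        return a3 < b3 ? -1 : 1;
    return 0;
}

/* True if lo <= v < hi (lower bound inclusive, upper bound exclusive). */
static bool ver_in_range(int v1, int v2, int v3,
                         int lo1, int lo2, int lo3,
                         int hi1, int hi2, int hi3)
{
    return ver_cmp(v1, v2, v3, lo1, lo2, lo3) >= 0 &&
           ver_cmp(v1, v2, v3, hi1, hi2, hi3) < 0;
}

/* Hypothetical helper: does a release version fall in a listed range?
 * Release candidates are out of scope for this sketch. */
static bool cve_2024_53169_affected(int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);

    /* 6.10.7 (including) up to 6.11 (excluding) */
    if (ver_in_range(major, minor, patch, 6, 10, 7, 6, 11, 0))
        return true;
    /* the 6.11 release itself, listed as its own CPE entry */
    if (ver_cmp(major, minor, patch, 6, 11, 0) == 0)
        return true;
    /* 6.11.1 (including) up to 6.11.11 (excluding) */
    if (ver_in_range(major, minor, patch, 6, 11, 1, 6, 11, 11))
        return true;
    /* 6.12 (including) up to 6.12.2 (excluding) */
    return ver_in_range(major, minor, patch, 6, 12, 0, 6, 12, 2);
}
```

For example, `cve_2024_53169_affected(6, 11, 5)` reports the version as affected, while `cve_2024_53169_affected(6, 12, 2)` does not. Distribution kernels often backport fixes without changing the upstream version number, so a check like this is only a first approximation.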



