CVE-2024-53169

Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
27/12/2024
Last modified:
01/10/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

nvme-fabrics: fix kernel crash while shutting down controller

The nvme keep-alive operation, which executes at a periodic interval, can sneak in while a fabric controller is being shut down. This may lead to a race between the fabric controller admin queue destroy code path (invoked while shutting down the controller) and the hw/hctx queue dispatcher called from the nvme keep-alive async request queuing operation. This race can lead to the kernel crash shown below:

Call Trace:
autoremove_wake_function+0x0/0xbc (unreliable)
__blk_mq_sched_dispatch_requests+0x114/0x24c
blk_mq_sched_dispatch_requests+0x44/0x84
blk_mq_run_hw_queue+0x140/0x220
nvme_keep_alive_work+0xc8/0x19c [nvme_core]
process_one_work+0x200/0x4e0
worker_thread+0x340/0x504
kthread+0x138/0x140
start_kernel_thread+0x14/0x18

If an nvme keep-alive request sneaks in while the fabric controller is shutting down, the request is flushed. The nvme_keep_alive_end_io function is then invoked to handle the end of the keep-alive operation, and it decrements admin->q_usage_counter. Assuming this is the last (or only) request in the admin queue, admin->q_usage_counter drops to zero. When that happens, the blk-mq destroy queue operation (blk_mq_destroy_queue()), which may be running simultaneously on another CPU (as this is the controller shutdown code path), makes forward progress and deletes the admin queue. From this point onward, the admin queue resources must not be accessed. However, the nvme keep-alive thread running the hw/hctx queue dispatch operation has not yet finished its work, so it can still access the admin queue resources after the admin queue has been deleted, and that causes the above crash.

The above kernel crash is a regression caused by the changes in commit a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()"). Ideally, keep-alive should be stopped before destroying the admin queue and freeing the admin tagset, so that it cannot sneak in during the shutdown operation. However, that commit removed the keep-alive stop operation from the beginning of the controller shutdown code path and added it under nvme_uninit_ctrl(), which executes very late in the shutdown code path, after the admin queue is destroyed and its tagset is removed. This change made it possible for keep-alive to sneak in, interfere with the shutdown operation, and cause the observed kernel crash.

To fix the observed crash, nvme_stop_keep_alive() is moved from nvme_uninit_ctrl() to nvme_remove_admin_tag_set(). This ensures that the shutdown path does not make forward progress and delete the admin queue until the keep-alive operation has finished (if it is in flight) or been cancelled, which contains the race condition explained above and hence avoids the crash.

Moving nvme_stop_keep_alive() to nvme_remove_admin_tag_set(), instead of adding it back to the beginning of the controller shutdown code path in nvme_stop_ctrl() (as was the case before commit a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()")), saves one call site of nvme_stop_keep_alive().
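
The ordering problem described above can be illustrated with a small, deterministic model. This is only a sketch in Python, not kernel code: the classes Controller and AdminQueue, the exception UseAfterDestroy, and the function shutdown() are hypothetical names invented for illustration. The model assumes the keep-alive timer fires once, after the admin queue has been destroyed and before nvme_uninit_ctrl() would run. It does not reproduce real concurrency or the waiting for in-flight work that the description attributes to nvme_stop_keep_alive().

```python
from dataclasses import dataclass, field


class UseAfterDestroy(Exception):
    """Model-only error: the admin queue was touched after it was destroyed."""


@dataclass
class AdminQueue:
    destroyed: bool = False

    def dispatch(self) -> None:
        # Stands in for the hw/hctx queue dispatch seen in the call trace.
        if self.destroyed:
            raise UseAfterDestroy("dispatch on a destroyed admin queue")


@dataclass
class Controller:
    admin_q: AdminQueue = field(default_factory=AdminQueue)
    keep_alive_armed: bool = True

    def keep_alive_work(self) -> None:
        # Periodic keep-alive: queues a request only while still armed.
        if self.keep_alive_armed:
            self.admin_q.dispatch()

    def stop_keep_alive(self) -> None:
        # Models stopping keep-alive so it can no longer fire.
        self.keep_alive_armed = False

    def remove_admin_tag_set(self, stop_keep_alive_first: bool) -> None:
        # With the fix, keep-alive is stopped before the queue goes away.
        if stop_keep_alive_first:
            self.stop_keep_alive()
        self.admin_q.destroyed = True


def shutdown(fixed: bool) -> str:
    """Run the modelled shutdown sequence and report 'ok' or 'crash'."""
    ctrl = Controller()
    ctrl.remove_admin_tag_set(stop_keep_alive_first=fixed)
    try:
        # Assumption: the keep-alive timer fires at the worst moment.
        ctrl.keep_alive_work()
    except UseAfterDestroy:
        return "crash"
    ctrl.stop_keep_alive()  # the late stop, as in nvme_uninit_ctrl()
    return "ok"
```

In this model, shutdown(fixed=False) returns "crash" because the keep-alive work reaches the queue after it has been destroyed, while shutdown(fixed=True) returns "ok" because keep-alive is disarmed first.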

Vulnerable products and versions

CPE                                                  From                  Up to
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*         6.10.7 (including)    6.11 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*         6.11.1 (including)    6.11.11 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*         6.12 (including)      6.12.2 (excluding)
cpe:2.3:o:linux:linux_kernel:6.11:-:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.11:rc5:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.11:rc6:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.11:rc7:*:*:*:*:*:*
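
As a convenience, the ranges listed above can be turned into a quick check. The following Python sketch is an assumption-laden helper, not an official tool: the function name is_listed_as_affected and the constants are hypothetical, and it only understands upstream version strings such as "6.11.4" or "6.11-rc5". Distribution kernels often backport fixes without changing the upstream version number, so a match here is a prompt to check the vendor advisory, not proof of exposure.

```python
import re

# (first affected, first fixed) pairs copied from the table above.
AFFECTED_RANGES = [
    ((6, 10, 7), (6, 11, 0)),
    ((6, 11, 1), (6, 11, 11)),
    ((6, 12, 0), (6, 12, 2)),
]
# Entries listed individually in the table: 6.11 and three release candidates.
AFFECTED_RELEASES = {(6, 11, 0)}
AFFECTED_RCS = {(6, 11, "rc5"), (6, 11, "rc6"), (6, 11, "rc7")}

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-(rc\d+))?$")


def is_listed_as_affected(version: str) -> bool:
    """Return True if an upstream kernel version appears in the table."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, rc = match.groups()
    if rc is not None:
        # Release candidates are only matched against the explicit entries.
        return (int(major), int(minor), rc) in AFFECTED_RCS
    parsed = (int(major), int(minor), int(patch) if patch else 0)
    if parsed in AFFECTED_RELEASES:
        return True
    return any(low <= parsed < high for low, high in AFFECTED_RANGES)
```

For example, is_listed_as_affected("6.11.10") is True and is_listed_as_affected("6.11.11") is False, matching the "excluding" bound in the table.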