CVE-2024-40927

Severity CVSS v4.0:
Pending analysis
Type:
CWE-416 Use After Free
Publication date:
12/07/2024
Last modified:
03/11/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

xhci: Handle TD clearing for multiple streams case

When multiple streams are in use, multiple TDs might be in flight when
an endpoint is stopped. We need to issue a Set TR Dequeue Pointer for
each, to ensure everything is reset properly and the caches cleared.
Change the logic so that any N>1 TDs found active for different streams
are deferred until after the first one is processed, calling
xhci_invalidate_cancelled_tds() again from xhci_handle_cmd_set_deq() to
queue another command until we are done with all of them. Also change
the error/"should never happen" paths to ensure we at least clear any
affected TDs, even if we can't issue a command to clear the hardware
cache, and complain loudly with an xhci_warn() if this ever happens.

This problem case dates back to commit e9df17eb1408 ("USB: xhci: Correct
assumptions about number of rings per endpoint.") early on in the XHCI
driver's life, when stream support was first added.
It was then identified but not fixed nor made into a warning in commit
674f8438c121 ("xhci: split handling halted endpoints into two steps"),
which added a FIXME comment for the problem case (without materially
changing the behavior as far as I can tell, though the new logic made
the problem more obvious).

Then later, in commit 94f339147fc3 ("xhci: Fix failure to give back some
cached cancelled URBs."), it was acknowledged again.

[Mathias: commit 94f339147fc3 ("xhci: Fix failure to give back some cached
cancelled URBs.") was a targeted regression fix to the previously mentioned
patch. Users reported issues with usb stuck after unmounting/disconnecting
UAS devices. This rolled back the TD clearing of multiple streams to its
original state.]

Apparently the commit author was aware of the problem (yet still chose
to submit it): It was still mentioned as a FIXME, an xhci_dbg() was
added to log the problem condition, and the remaining issue was mentioned
in the commit description. The choice of making the log type xhci_dbg()
for what is, at this point, a completely unhandled and known broken
condition is puzzling and unfortunate, as it guarantees that no actual
users would see the log in production, thereby making it nigh
undebuggable (indeed, even if you turn on DEBUG, the message doesn't
really hint at there being a problem at all).

It took me *months* of random xHC crashes to finally find a reliable
repro and be able to do a deep dive debug session, which could all have
been avoided had this unhandled, broken condition been actually reported
with a warning, as it should have been as a bug intentionally left in
unfixed (never mind that it shouldn't have been left in at all).

> Another fix to solve clearing the caches of all stream rings with
> cancelled TDs is needed, but not as urgent.

3 years after that statement and 14 years after the original bug was
introduced, I think it's finally time to fix it. And maybe next time
let's not leave bugs unfixed (that are actually worse than the original
bug), and let's actually get people to review kernel commits please.

Fixes xHC crashes and IOMMU faults with UAS devices when handling
errors/faults. Easiest repro is to use `hdparm` to mark an early sector
(e.g. 1024) on a disk as bad, then `cat /dev/sdX > /dev/null` in a loop.
At least in the case of JMicron controllers, the read errors end up
having to cancel two TDs (for two queued requests to different streams)
and the one that didn't get cleared properly ends up faulting the xHC
entirely when it tries to access DMA pages that have since been unmapped,
referred to by the stale TDs. This normally happens quickly (after two
or three loops). After this fix, I left the `cat` in a loop running
overnight and experienced no xHC failures, with all read errors
recovered properly. Repro'd and tested on an Apple M1 Mac Mini
(dwc3 host).

On systems without an IOMMU, this bug would instead silently corrupt
freed memory, making this a
---truncated---
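
The deferral scheme described in the commit message can be pictured with a small toy model. This is only an illustrative sketch in Python, not the kernel code: the `TD` class, the state names, and the `run` driver are hypothetical, and the model assumes that only one Set TR Dequeue Pointer command is outstanding per endpoint at a time. The two functions are named after xhci_invalidate_cancelled_tds() and xhci_handle_cmd_set_deq() purely to show which one re-invokes the other.

```python
from dataclasses import dataclass

# Hypothetical TD states for this toy model (not the kernel's enum).
DIRTY, CLEARING, DEFERRED, CLEARED = "dirty", "clearing", "deferred", "cleared"


@dataclass
class TD:
    stream_id: int
    state: str = DIRTY


def invalidate_cancelled_tds(tds, commands):
    """Queue one command for the first stream found; defer other streams."""
    active_stream = None
    for td in tds:
        if td.state not in (DIRTY, DEFERRED):
            continue
        if active_stream is None:
            # First TD needing a cache clear: queue a command for its stream.
            active_stream = td.stream_id
            commands.append(td.stream_id)
        # TDs on the same stream ride along; other streams wait their turn.
        td.state = CLEARING if td.stream_id == active_stream else DEFERRED


def handle_cmd_set_deq(tds, commands):
    """Command completion: finish cleared TDs, then pick up deferred ones."""
    for td in tds:
        if td.state == CLEARING:
            td.state = CLEARED
    invalidate_cancelled_tds(tds, commands)


def run(stream_ids):
    """Drive the model until every queued command has completed."""
    tds = [TD(s) for s in stream_ids]
    commands = []
    invalidate_cancelled_tds(tds, commands)
    completed = 0
    while completed < len(commands):
        completed += 1
        handle_cmd_set_deq(tds, commands)
    return commands, [td.state for td in tds]
```

In this model, cancelling TDs on streams 3, 5 and 3 results in one command per distinct stream, issued one after another, and no TD is left in a stale state. The pre-fix behavior corresponds to never revisiting the deferred TDs.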

Vulnerable products and versions

CPE From Up to
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* 2.6.35 (including) 5.15.162 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* 5.16 (including) 6.1.95 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* 6.2 (including) 6.6.35 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* 6.7 (including) 6.9.6 (excluding)
cpe:2.3:o:linux:linux_kernel:6.10:rc1:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.10:rc2:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.10:rc3:*:*:*:*:*:*
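
As a rough aid for reading the table above, the ranges can be encoded and checked programmatically. The following is a minimal sketch, assuming upstream version strings such as `6.6.34` or `6.10-rc2`; the names `AFFECTED_RANGES`, `AFFECTED_PRERELEASES`, and `is_affected` are hypothetical. It does not account for distribution kernels, which may carry backported fixes under a different version number.

```python
# Ranges copied from the table: lower bound inclusive, upper bound exclusive.
AFFECTED_RANGES = [
    ((2, 6, 35), (5, 15, 162)),
    ((5, 16), (6, 1, 95)),
    ((6, 2), (6, 6, 35)),
    ((6, 7), (6, 9, 6)),
]

# Release candidates listed individually in the table.
AFFECTED_PRERELEASES = {"6.10-rc1", "6.10-rc2", "6.10-rc3"}


def parse(version):
    """Turn a plain dotted version such as '6.6.34' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_affected(version):
    """Return True if the upstream version matches an entry in the table.

    Strings that are neither plain dotted numbers nor one of the listed
    release candidates raise ValueError instead of being guessed at.
    """
    if version in AFFECTED_PRERELEASES:
        return True
    v = parse(version)
    return any(low <= v < high for low, high in AFFECTED_RANGES)
```

For example, `is_affected("6.6.34")` is true while `is_affected("6.6.35")` is false, because the upper bound of each range is the first fixed release.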