CVE-2024-58057

Severity (CVSS v4.0): Pending analysis
Type: Unavailable / Other
Publication date: 06/03/2025
Last modified: 06/03/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

idpf: convert workqueues to unbound

When a workqueue is created with `WQ_UNBOUND`, its work items are served by special worker pools whose host workers are not bound to any specific CPU. In the default configuration (i.e. when `queue_delayed_work` and friends do not specify which CPU to run the work item on), `WQ_UNBOUND` allows the work item to be executed on any CPU in the same node as the CPU it was enqueued on. While this potentially sacrifices locality, it avoids contention with other processes that might dominate the CPU time of the processor the work item was scheduled on.

This is not just a theoretical problem: in one observed scenario, a misconfigured process was hogging most of the time on CPU0, leaving less than 0.5% of its CPU time to the kworker. The IDPF workqueues that were using the kworker on CPU0 suffered large completion delays as a result, causing performance degradation, timeouts, and an eventual system crash.

* I have also run a manual test to gauge the performance improvement. The test consists of an antagonist process (`./stress --cpu 2`) consuming as much of CPU0 as possible. This process is run under `taskset 01` to bind it to CPU0, and its priority is changed with `chrt -pQ 9900 10000 ${pid}` and `renice -n -20 ${pid}` after start.

  Then, the IDPF driver is forced to prefer CPU0 by editing all calls to `queue_delayed_work`, `mod_delayed_work`, etc. to use CPU0.

  Finally, `ktraces` for the workqueue events are collected.

  Without the patch, the antagonist process can force arbitrary delays between `workqueue_queue_work` and `workqueue_execute_start`, which in my tests were as high as `30ms`. With the patch applied, the workqueue can be migrated to another unloaded CPU in the same node and, keeping everything else equal, the maximum delay I could see was `6us`.
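To illustrate the mechanism the fix relies on, the following is a minimal sketch of the difference between a bound and an unbound workqueue using the standard kernel workqueue API. It is not the actual idpf patch; the names (`my_wq`, `my_work`, `my_work_handler`) are illustrative only.

```c
#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

/* Illustrative names; not taken from the idpf driver. */
static struct workqueue_struct *my_wq;
static struct delayed_work my_work;

static void my_work_handler(struct work_struct *work)
{
	/* Periodic servicing work, e.g. mailbox or statistics polling. */
}

static int my_init(void)
{
	/*
	 * Before: a bound workqueue. A work item queued from CPU0 runs in
	 * a kworker tied to CPU0 and competes with whatever else is
	 * monopolizing that CPU:
	 *
	 *   my_wq = alloc_workqueue("my-wq", 0, 0);
	 *
	 * After: WQ_UNBOUND lets the work item run on any CPU in the same
	 * NUMA node, so an overloaded CPU0 no longer delays it.
	 */
	my_wq = alloc_workqueue("my-wq", WQ_UNBOUND, 0);
	if (!my_wq)
		return -ENOMEM;

	INIT_DELAYED_WORK(&my_work, my_work_handler);

	/*
	 * No CPU is specified here, so with WQ_UNBOUND the kernel is free
	 * to pick an unloaded CPU in the node. (The explicit-CPU variant,
	 * queue_delayed_work_on(), is presumably what the test described
	 * above used to force everything onto CPU0.)
	 */
	queue_delayed_work(my_wq, &my_work, msecs_to_jiffies(100));
	return 0;
}
```

Per the commit message, the fix amounts to making the driver's workqueues unbound, i.e. passing `WQ_UNBOUND` where the workqueues are allocated, rather than changing where work items are queued.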

Impact