CVE-2022-49547
Publication date:
26/02/2025
In the Linux kernel, the following vulnerability has been resolved:<br />
<br />
btrfs: fix deadlock between concurrent dio writes when low on free data space<br />
<br />
When reserving data space for a direct IO write, we can end up deadlocking<br />
if multiple tasks attempt writes to the same file range, that range is<br />
covered by multiple extents, available data space is low, and the writes<br />
don't expand the inode's i_size.<br />
<br />
The deadlock can happen like this:<br />
<br />
1) We have a file with an i_size of 1M, at offset 0 it has an extent with<br />
a size of 128K and at offset 128K it has another extent also with a<br />
size of 128K;<br />
<br />
2) Task A does a direct IO write against file range [0, 256K), and because<br />
the write is within the i_size boundary, it takes the inode's lock (VFS<br />
level) in shared mode;<br />
<br />
3) Task A locks the file range [0, 256K) at btrfs_dio_iomap_begin(), and<br />
then gets the extent map for the extent covering the range [0, 128K).<br />
At btrfs_get_blocks_direct_write(), it creates an ordered extent for<br />
that file range ([0, 128K));<br />
<br />
4) Before returning from btrfs_dio_iomap_begin(), it unlocks the file<br />
range [0, 256K);<br />
<br />
5) Task A executes btrfs_dio_iomap_begin() again, this time for the file<br />
range [128K, 256K), and locks the file range [128K, 256K);<br />
<br />
6) Task B starts a direct IO write against file range [0, 256K) as well.<br />
It also locks the inode in shared mode, as it's within the i_size limit,<br />
and then tries to lock file range [0, 256K). It is able to lock the<br />
subrange [0, 128K) but then blocks waiting for the range [128K, 256K),<br />
as it is currently locked by task A;<br />
<br />
7) Task A enters btrfs_get_blocks_direct_write() and tries to reserve data<br />
space. Because we are low on available free space, it triggers the<br />
async data reclaim task, and waits for it to reserve data space;<br />
<br />
8) The async reclaim task decides to wait for all existing ordered extents<br />
to complete (through btrfs_wait_ordered_roots()).<br />
It finds the ordered extent previously created by task A for the file<br />
range [0, 128K) and waits for it to complete;<br />
<br />
9) The ordered extent for the file range [0, 128K) cannot complete<br />
because it blocks at btrfs_finish_ordered_io() when trying to lock the<br />
file range [0, 128K).<br />
<br />
This results in a deadlock, because:<br />
<br />
- task B is holding the file range [0, 128K) locked, waiting for the<br />
range [128K, 256K) to be unlocked by task A;<br />
<br />
- task A is holding the file range [128K, 256K) locked and it's waiting<br />
for the async data reclaim task to satisfy its space reservation<br />
request;<br />
<br />
- the async data reclaim task is waiting for ordered extent [0, 128K)<br />
to complete, but the ordered extent cannot complete because the<br />
file range [0, 128K) is currently locked by task B, which is waiting<br />
for task A to unlock file range [128K, 256K), while task A is waiting<br />
on the async data reclaim task.<br />
<br />
This results in a deadlock among four tasks: task A, task B, the async<br />
data reclaim task and the task doing ordered extent completion (a work<br />
queue task).<br />
<br />
This type of deadlock can sporadically be triggered by the test case<br />
generic/300 from fstests, and results in a stack trace like the following:<br />
<br />
[12084.033689] INFO: task kworker/u16:7:123749 blocked for more than 241 seconds.<br />
[12084.034877] Not tainted 5.18.0-rc2-btrfs-next-115 #1<br />
[12084.035562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />
[12084.036548] task:kworker/u16:7 state:D stack: 0 pid:123749 ppid: 2 flags:0x00004000<br />
[12084.036554] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]<br />
[12084.036599] Call Trace:<br />
[12084.036601]  &lt;TASK&gt;<br />
[12084.036606] __schedule+0x3cb/0xed0<br />
[12084.036616] schedule+0x4e/0xb0<br />
[12084.036620] btrfs_start_ordered_extent+0x109/0x1c0 [btrfs]<br />
[12084.036651] ? prepare_to_wait_exclusive+0xc0/0xc0<br />
[12084.036659] btrfs_run_ordered_extent_work+0x1a/0x30 [btrfs]<br />
[12084.036688] btrfs_work_helper+0xf8/0x400 [btrfs]<br />
[12084.0367<br />
---truncated---
Severity CVSS v4.0: Pending analysis
Last modification:
01/10/2025