CVE-2025-38358
Publication date:
25/07/2025
In the Linux kernel, the following vulnerability has been resolved:

btrfs: fix race between async reclaim worker and close_ctree()

Syzbot reported an assertion failure due to an attempt to add a delayed
iput after we have set BTRFS_FS_STATE_NO_DELAYED_IPUT in the fs_info
state:

WARNING: CPU: 0 PID: 65 at fs/btrfs/inode.c:3420 btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420
Modules linked in:
CPU: 0 UID: 0 PID: 65 Comm: kworker/u8:4 Not tainted 6.15.0-next-20250530-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Workqueue: btrfs-endio-write btrfs_work_helper
RIP: 0010:btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420
Code: 4e ad 5d (...)
RSP: 0018:ffffc9000213f780 EFLAGS: 00010293
RAX: ffffffff83c635b7 RBX: ffff888058920000 RCX: ffff88801c769e00
RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffff888058921b67 R09: 1ffff1100b12436c
R10: dffffc0000000000 R11: ffffed100b12436d R12: 0000000000000001
R13: dffffc0000000000 R14: ffff88807d748000 R15: 0000000000000100
FS: 0000000000000000(0000) GS:ffff888125c53000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002000000bd038 CR3: 000000006a142000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:

btrfs_put_ordered_extent+0x19f/0x470 fs/btrfs/ordered-data.c:635
btrfs_finish_one_ordered+0x11d8/0x1b10 fs/btrfs/inode.c:3312
btrfs_work_helper+0x399/0xc20 fs/btrfs/async-thread.c:312
process_one_work kernel/workqueue.c:3238 [inline]
process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3321
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3402
kthread+0x70e/0x8a0 kernel/kthread.c:464
ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

This can happen due to a race with the async reclaim worker like this:

1) The async metadata reclaim worker enters shrink_delalloc(), which calls
   btrfs_start_delalloc_roots() with an nr_pages argument that has a value
   less than LONG_MAX, and that in turn enters start_delalloc_inodes(),
   which sets the local variable 'full_flush' to false because
   wbc->nr_to_write is less than LONG_MAX;

2) There it finds inode X in a root's delalloc list, grabs a reference for
   inode X (with igrab()), and triggers writeback for it with
   filemap_fdatawrite_wbc(), which creates an ordered extent for inode X;

3) The unmount sequence starts from another task: we enter close_ctree()
   and flush the workqueue fs_info->endio_write_workers, which waits for
   the ordered extent for inode X to complete. When the last reference on
   the ordered extent is dropped with btrfs_put_ordered_extent(), the call
   to btrfs_add_delayed_iput() does not add the inode to the list of
   delayed iputs because the inode has a refcount of 2, so we decrement it
   to 1 and return;

4) Shortly after, at close_ctree(), we call btrfs_run_delayed_iputs(), which
   runs all delayed iputs, and then we set BTRFS_FS_STATE_NO_DELAYED_IPUT
   in the fs_info state;

5) The async reclaim worker, after calling filemap_fdatawrite_wbc(), now
   calls btrfs_add_delayed_iput() for inode X, and there we trigger an
   assertion failure since the fs_info state has the flag
   BTRFS_FS_STATE_NO_DELAYED_IPUT set.
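
The five steps above can be replayed deterministically in a small user-space model. This is only an illustrative sketch, not kernel code: the class and function names are hypothetical stand-ins, the model assumes the two references in step 3 belong to the reclaim worker and the ordered extent (consistent with the refcount of 2 mentioned there), and the warning is reported as a return value instead of a WARN.

```python
class FsInfo:
    """Toy stand-in for the parts of fs_info involved in the race."""

    def __init__(self):
        self.no_delayed_iput = False  # models BTRFS_FS_STATE_NO_DELAYED_IPUT
        self.delayed_iputs = []       # models the list of delayed iputs


class Inode:
    """Toy inode that only tracks its reference count."""

    def __init__(self):
        self.refcount = 0


def grab(inode):
    """Take one reference (models igrab() in this sketch)."""
    inode.refcount += 1


def add_delayed_iput(fs, inode):
    """Simplified model of btrfs_add_delayed_iput() as described above.

    Returns True if the call would have tripped the warning.
    """
    if inode.refcount > 1:
        # Not the last reference: just drop it, nothing is queued.
        inode.refcount -= 1
        return False
    # Last reference: the final iput has to be queued for later.
    would_warn = fs.no_delayed_iput
    fs.delayed_iputs.append(inode)
    return would_warn


def run_delayed_iputs(fs):
    """Drop the final reference of every queued inode."""
    while fs.delayed_iputs:
        fs.delayed_iputs.pop().refcount -= 1


def replay_buggy_interleaving():
    fs = FsInfo()
    inode_x = Inode()

    grab(inode_x)  # step 2: reclaim worker takes its reference
    grab(inode_x)  # step 2: assumed reference held by the ordered extent

    warned_step3 = add_delayed_iput(fs, inode_x)  # step 3: refcount 2 -> 1

    run_delayed_iputs(fs)      # step 4: nothing is queued yet
    fs.no_delayed_iput = True  # step 4: flag set too early

    warned_step5 = add_delayed_iput(fs, inode_x)  # step 5: worker's late iput
    return warned_step3, warned_step5
```

In this model, `replay_buggy_interleaving()` reports no warning for the call in step 3 and a warning for the call in step 5, which is the situation syzbot hit.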

Fix this by setting BTRFS_FS_STATE_NO_DELAYED_IPUT only after we wait for
the async reclaim workers to finish, that is, after we call
cancel_work_sync() for them at close_ctree(), and by running delayed iputs
after waiting for the reclaim workers to finish and before setting the bit.
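
The corrected ordering can be sketched in the same toy style. Again, this is a hedged user-space illustration rather than the actual patch: `Fs`, `close_ctree_fixed()` and `demo_fixed_ordering()` are hypothetical names, and the call that stands in for cancel_work_sync() simply runs the worker's remaining step to completion, modelling the case where the reclaim work is already executing.

```python
class Fs:
    """Toy stand-in for the fs_info state relevant to the fix."""

    def __init__(self):
        self.no_delayed_iput = False  # models BTRFS_FS_STATE_NO_DELAYED_IPUT
        self.delayed_iputs = []       # inodes waiting for their final iput
        self.warnings = 0             # times the warning would have fired


def add_delayed_iput(fs, inode):
    """Simplified model of btrfs_add_delayed_iput()."""
    if inode["refcount"] > 1:
        inode["refcount"] -= 1
        return
    if fs.no_delayed_iput:
        fs.warnings += 1
    fs.delayed_iputs.append(inode)


def run_delayed_iputs(fs):
    """Drop the final reference of every queued inode."""
    while fs.delayed_iputs:
        fs.delayed_iputs.pop()["refcount"] -= 1


def close_ctree_fixed(fs, reclaim_work):
    """Ordering described by the fix, in three steps."""
    reclaim_work()             # stand-in for cancel_work_sync(): worker is done
    run_delayed_iputs(fs)      # drain anything the worker queued
    fs.no_delayed_iput = True  # only now forbid new delayed iputs


def demo_fixed_ordering():
    fs = Fs()
    inode_x = {"refcount": 1}  # only the reclaim worker's reference remains
    close_ctree_fixed(fs, lambda: add_delayed_iput(fs, inode_x))
    return fs.warnings, inode_x["refcount"], len(fs.delayed_iputs)
```

With this ordering the worker's late iput is queued while delayed iputs are still allowed, it is drained before the flag is set, and the model records no warning.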

This race was recently introduced by commit 19e60b2a95f5 ("btrfs: add
extra warning if delayed iput is added when it's not allowed"). Without
the new validation at btrfs_add_delayed_iput(),
---truncated---
Severity CVSS v4.0: Pending analysis
Last modification:
25/07/2025