CVE-2025-38358

Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
25/07/2025
Last modified:
25/07/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

btrfs: fix race between async reclaim worker and close_ctree()

Syzbot reported an assertion failure due to an attempt to add a delayed
iput after we have set BTRFS_FS_STATE_NO_DELAYED_IPUT in the fs_info
state:

  WARNING: CPU: 0 PID: 65 at fs/btrfs/inode.c:3420 btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420
  Modules linked in:
  CPU: 0 UID: 0 PID: 65 Comm: kworker/u8:4 Not tainted 6.15.0-next-20250530-syzkaller #0 PREEMPT(full)
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
  Workqueue: btrfs-endio-write btrfs_work_helper
  RIP: 0010:btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420
  Code: 4e ad 5d (...)
  RSP: 0018:ffffc9000213f780 EFLAGS: 00010293
  RAX: ffffffff83c635b7 RBX: ffff888058920000 RCX: ffff88801c769e00
  RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000
  RBP: 0000000000000001 R08: ffff888058921b67 R09: 1ffff1100b12436c
  R10: dffffc0000000000 R11: ffffed100b12436d R12: 0000000000000001
  R13: dffffc0000000000 R14: ffff88807d748000 R15: 0000000000000100
  FS: 0000000000000000(0000) GS:ffff888125c53000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00002000000bd038 CR3: 000000006a142000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:

  btrfs_put_ordered_extent+0x19f/0x470 fs/btrfs/ordered-data.c:635
  btrfs_finish_one_ordered+0x11d8/0x1b10 fs/btrfs/inode.c:3312
  btrfs_work_helper+0x399/0xc20 fs/btrfs/async-thread.c:312
  process_one_work kernel/workqueue.c:3238 [inline]
  process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3321
  worker_thread+0x8a0/0xda0 kernel/workqueue.c:3402
  kthread+0x70e/0x8a0 kernel/kthread.c:464
  ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

This can happen due to a race with the async reclaim worker like this:

1) The async metadata reclaim worker enters shrink_delalloc(), which calls
   btrfs_start_delalloc_roots() with an nr_pages argument that has a value
   less than LONG_MAX, and that in turn enters start_delalloc_inodes(),
   which sets the local variable 'full_flush' to false because
   wbc->nr_to_write is less than LONG_MAX;

2) There it finds inode X in a root's delalloc list, grabs a reference for
   inode X (with igrab()), and triggers writeback for it with
   filemap_fdatawrite_wbc(), which creates an ordered extent for inode X;

3) The unmount sequence starts from another task, we enter close_ctree()
   and we flush the workqueue fs_info->endio_write_workers, which waits
   for the ordered extent for inode X to complete and when dropping the
   last reference of the ordered extent, with btrfs_put_ordered_extent(),
   when we call btrfs_add_delayed_iput() we don't add the inode to the
   list of delayed iputs because it has a refcount of 2, so we decrement
   it to 1 and return;

4) Shortly after at close_ctree() we call btrfs_run_delayed_iputs() which
   runs all delayed iputs, and then we set BTRFS_FS_STATE_NO_DELAYED_IPUT
   in the fs_info state;

5) The async reclaim worker, after calling filemap_fdatawrite_wbc(), now
   calls btrfs_add_delayed_iput() for inode X and there we trigger an
   assertion failure since the fs_info state has the flag
   BTRFS_FS_STATE_NO_DELAYED_IPUT set.

Fix this by setting BTRFS_FS_STATE_NO_DELAYED_IPUT only after we wait for
the async reclaim workers to finish, after we call cancel_work_sync() for
them at close_ctree(), and by running delayed iputs after wait for the
reclaim workers to finish and before setting the bit.

This race was recently introduced by commit 19e60b2a95f5 ("btrfs: add
extra warning if delayed iput is added when it's not allowed"). Without
the new validation at btrfs_add_delayed_iput(),
---truncated---
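
The ordering problem described above can be illustrated with a small, single-threaded toy model in C. This is only a sketch and not kernel code: every name prefixed with toy_ is hypothetical, the boolean field merely stands in for BTRFS_FS_STATE_NO_DELAYED_IPUT, and the race is replayed as a fixed sequence of steps rather than with real workers.

```c
#include <stdbool.h>

/* Toy model of the unmount ordering. All names are hypothetical. */
struct toy_fs {
    bool no_delayed_iput;  /* stands in for BTRFS_FS_STATE_NO_DELAYED_IPUT */
    int pending_iputs;     /* stands in for the list of delayed iputs */
    bool reclaim_has_ref;  /* reclaim worker still holds its inode reference */
    int violations;        /* delayed iputs added after the flag was set */
};

/* Models the check that fires the reported warning. */
static void toy_add_delayed_iput(struct toy_fs *fs)
{
    if (fs->no_delayed_iput)
        fs->violations++;
    else
        fs->pending_iputs++;
}

static void toy_run_delayed_iputs(struct toy_fs *fs)
{
    fs->pending_iputs = 0;
}

/* Models waiting for the reclaim worker, which drops its reference
 * through a delayed iput before it finishes. */
static void toy_wait_for_reclaim_worker(struct toy_fs *fs)
{
    if (fs->reclaim_has_ref) {
        toy_add_delayed_iput(fs);
        fs->reclaim_has_ref = false;
    }
}

/* Ordering from steps 4 and 5: the flag is set while the worker
 * still has a delayed iput to add. Returns the violation count. */
static int toy_close_buggy_order(void)
{
    struct toy_fs fs = { .reclaim_has_ref = true };

    toy_run_delayed_iputs(&fs);
    fs.no_delayed_iput = true;
    toy_wait_for_reclaim_worker(&fs);
    return fs.violations;
}

/* Ordering described by the fix: wait for the worker, run delayed
 * iputs, and only then set the flag. Returns the violation count. */
static int toy_close_fixed_order(void)
{
    struct toy_fs fs = { .reclaim_has_ref = true };

    toy_wait_for_reclaim_worker(&fs);
    toy_run_delayed_iputs(&fs);
    fs.no_delayed_iput = true;
    return fs.violations;
}
```

Calling toy_close_buggy_order() from a main() reports one late delayed iput, while toy_close_fixed_order() reports none, which mirrors why the fix moves the flag update to after the reclaim workers have finished.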

Impact