CVE-2025-40303

Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
08/12/2025
Last modified:
08/12/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

btrfs: ensure no dirty metadata is written back for an fs with errors

[BUG]
During development of a minor feature (making sure every btrfs_bio::end_io() is called in task context), I noticed a crash in generic/388, where metadata writes queued new work items after btrfs_stop_all_workers().

It turns out that this can happen even without any code modification: using RAID5 for metadata with the same workload from generic/388 is enough to trigger the use-after-free.

[CAUSE]
If btrfs hits an error, the fs is marked as being in an error state and no new transaction is allowed, so metadata is in a frozen state.

But some metadata modifications were made before that error, and they are still in the btree inode page cache.

Since there will be no real transaction commit, all those dirty folios are kept as is in the page cache, and they cannot be invalidated by the invalidate_inode_pages2() call inside close_ctree(), because they are dirty.

Finally, after btrfs_stop_all_workers(), we call iput() on the btree inode, which triggers writeback of that dirty metadata.

If the fs is using RAID56 metadata, this triggers RMW and queues new work items into rmw_workers, which is already stopped, causing a warning from queue_work() and a use-after-free.

[FIX]
Add special handling to write_one_eb(): if the fs is already in an error state, immediately mark the bbio as failed instead of actually submitting it.

Then during close_ctree(), iput() will just discard all those dirty tree blocks without writing them back, so no new jobs are queued for the already stopped-and-freed workqueues.

The extra discard in write_one_eb() also acts as an extra safety net. For example, if the transaction abort is triggered by extent/free space tree corruption, then since the extent/free space tree is already corrupted, some tree blocks may be allocated where they shouldn't be (overwriting existing tree blocks). In that case, writing them back would further corrupt the fs.
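The control flow described in [CAUSE] and [FIX] can be illustrated with a small user-space model. This is only a sketch and not the kernel patch: `struct fs_model`, `write_one_block()`, `queue_rmw_work()`, and `teardown()` are hypothetical stand-ins invented for illustration, and the real write_one_eb() operates on extent buffers and btrfs_bio structures that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for filesystem state; not a real btrfs type. */
struct fs_model {
    bool has_error;         /* models "the fs is marked as error" */
    bool workers_stopped;   /* models the state after workers are stopped */
    int  queued_after_stop; /* work queued onto stopped workers (the bug) */
    int  submitted;         /* writes that were really submitted */
    int  failed;            /* writes failed early instead of submitted */
};

enum wb_result { WB_SUBMITTED, WB_FAILED };

/* Models a RAID56 RMW submission that needs a live workqueue. */
static void queue_rmw_work(struct fs_model *fs)
{
    if (fs->workers_stopped)
        fs->queued_after_stop++; /* in the kernel: warning, use-after-free */
    fs->submitted++;
}

/*
 * Models the writeback of one dirty tree block. With apply_fix set, a
 * filesystem in an error state fails the write immediately and never
 * reaches the workqueue.
 */
static enum wb_result write_one_block(struct fs_model *fs, bool apply_fix)
{
    assert(fs != NULL);
    if (apply_fix && fs->has_error) {
        fs->failed++;
        return WB_FAILED;
    }
    queue_rmw_work(fs);
    return WB_SUBMITTED;
}

/*
 * Models the teardown order from the description: workers are stopped
 * first, then releasing the btree inode writes back leftover dirty blocks.
 * Returns how many work items were queued onto stopped workers.
 */
static int teardown(struct fs_model *fs, int dirty_blocks, bool apply_fix)
{
    fs->workers_stopped = true;
    for (int i = 0; i < dirty_blocks; i++)
        write_one_block(fs, apply_fix);
    return fs->queued_after_stop;
}
```

In this model, calling `teardown()` on an errored filesystem with three dirty blocks and `apply_fix` set to false queues three work items after the workers have stopped, while the same call with `apply_fix` set to true queues none and counts all three blocks as failed.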

Impact