CVE-2025-40303
Severity (CVSS v4.0): Pending analysis
Type: Unavailable / Other
Publication date: 08/12/2025
Last modified: 08/12/2025
Description
In the Linux kernel, the following vulnerability has been resolved:

btrfs: ensure no dirty metadata is written back for an fs with errors

[BUG]
During development of a minor feature (making sure all btrfs_bio::end_io()
calls happen in task context), I noticed a crash in generic/388, where
metadata writes triggered new work items after btrfs_stop_all_workers().

It turns out this can happen even without any code modification: using
RAID5 for metadata with the same workload as generic/388 is enough to
trigger the use-after-free.

[CAUSE]
If btrfs hits an error, the fs is marked as being in an error state and no
new transaction is allowed, so its metadata is effectively frozen.

However, metadata modifications made before the error are still in the
btree inode page cache.

Since there will be no real transaction commit, those dirty folios are
kept as-is in the page cache, and the invalidate_inode_pages2() call
inside close_ctree() cannot invalidate them because they are dirty.

Finally, after btrfs_stop_all_workers(), we call iput() on the btree
inode, which triggers writeback of that dirty metadata.

If the fs uses RAID56 for metadata, this writeback triggers RMW
(read-modify-write) and queues new work items into rmw_workers, which has
already been stopped, causing a warning from queue_work() and a
use-after-free.
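The close_ctree() ordering described above can be illustrated with a small userspace model in C. It is a sketch under stated assumptions rather than kernel code: struct model_page, struct model_wq and the model_* functions are hypothetical names, invalidation simply skips dirty pages, every writeback is assumed to need RMW work (RAID56 metadata), and the use-after-free is reduced to a counter of work items queued after the workers were stopped.

```c
#include <stdbool.h>

#define MODEL_NPAGES 4

/* Hypothetical stand-in for a folio in the btree inode page cache. */
struct model_page {
    bool present; /* still in the page cache */
    bool dirty;   /* modified before the error, never committed */
};

/* Hypothetical stand-in for rmw_workers. */
struct model_wq {
    bool stopped;          /* set by the btrfs_stop_all_workers() analogue */
    int queued_after_stop; /* in the kernel, each one is a use-after-free */
};

/* Mimics the invalidate_inode_pages2() step: clean pages are dropped,
 * dirty pages stay. Returns how many pages could not be invalidated. */
static int model_invalidate(struct model_page *pages, int n)
{
    int busy = 0;

    for (int i = 0; i < n; i++) {
        if (!pages[i].present)
            continue;
        if (pages[i].dirty)
            busy++;
        else
            pages[i].present = false;
    }
    return busy;
}

/* Only work queued onto a stopped queue is counted; in the kernel that
 * workqueue has already been freed at this point. */
static void model_queue_work(struct model_wq *wq)
{
    if (wq->stopped)
        wq->queued_after_stop++;
}

/* Mimics the final iput() on the btree inode: every dirty page is written
 * back, and (assuming RAID56 metadata) each write queues RMW work. */
static void model_iput_writeback(struct model_page *pages, int n,
                                 struct model_wq *wq)
{
    for (int i = 0; i < n; i++) {
        if (pages[i].present && pages[i].dirty) {
            model_queue_work(wq);
            pages[i].dirty = false;
            pages[i].present = false;
        }
    }
}

/* Runs the close_ctree() ordering without the fix. Stores the number of
 * pages left after invalidation in *busy and returns the number of work
 * items queued after the workers were stopped. */
int model_close_ctree_unfixed(int *busy)
{
    /* Two tree blocks were dirtied before the error, two are clean. */
    struct model_page pages[MODEL_NPAGES] = {
        { true, true }, { true, false }, { true, true }, { true, false },
    };
    struct model_wq rmw = { false, 0 };

    *busy = model_invalidate(pages, MODEL_NPAGES); /* dirty pages survive */
    rmw.stopped = true;                            /* workers stopped */
    model_iput_writeback(pages, MODEL_NPAGES, &rmw); /* iput(btree inode) */
    return rmw.queued_after_stop;
}
```

For the scenario in the model, model_close_ctree_unfixed() returns 2: both dirty tree blocks survive invalidation, and each one queues work after the workers are gone.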

[FIX]
Add special handling to write_one_eb(): if the fs is already in an error
state, immediately mark the bbio as failed instead of actually submitting
it.

Then during close_ctree(), iput() will just discard all those dirty tree
blocks without writing them back, so no new work is queued to the already
stopped-and-freed workqueues.

The extra discard in write_one_eb() also acts as a safety net. For
example, if the transaction abort was triggered by extent or free space
tree corruption, some tree blocks may have been allocated where they
shouldn't be (overwriting existing tree blocks), because the extent/free
space tree is already corrupted. In that case, writing them back would
corrupt the fs further.
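The effect of the write_one_eb() change can be sketched as a userspace model in C. This is a minimal model under stated assumptions, not the actual kernel patch: struct model_fs, struct model_eb, model_write_one_eb() and model_writeback_all() are hypothetical names, the fs error is stored as a negative errno, bbio completion is synchronous, and every submission is assumed to need RMW work (RAID56 metadata).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the filesystem state; not kernel structures. */
struct model_fs {
    int fs_error;          /* 0, or a negative errno once the fs hit an error */
    bool workers_stopped;  /* set by the btrfs_stop_all_workers() analogue */
    int submitted;         /* bios really submitted for writeback */
    int queued_after_stop; /* submissions that would hit stopped workers */
};

/* Hypothetical stand-in for a dirty extent buffer and its bbio. */
struct model_eb {
    bool dirty;
    int bbio_status; /* 0 on success, negative errno on failure */
};

/* Completes the bbio synchronously (a simplification): record the status
 * and drop the dirty state, so the tree block is not written again. */
static void model_bbio_end_io(struct model_eb *eb, int status)
{
    eb->bbio_status = status;
    eb->dirty = false;
}

/* Sketch of the fix: if the fs is already in an error state, fail the
 * bbio immediately instead of submitting it. */
static void model_write_one_eb(struct model_fs *fs, struct model_eb *eb)
{
    if (fs->fs_error) {
        model_bbio_end_io(eb, fs->fs_error);
        return;
    }
    fs->submitted++;
    /* Assumes RAID56 metadata: every submission queues RMW work. */
    if (fs->workers_stopped)
        fs->queued_after_stop++; /* the use-after-free case */
    model_bbio_end_io(eb, 0);
}

/* Writes back every dirty extent buffer, as the final iput() would, and
 * returns how many submissions reached stopped workers. */
int model_writeback_all(struct model_fs *fs, struct model_eb *ebs, int n)
{
    for (int i = 0; i < n; i++) {
        if (ebs[i].dirty)
            model_write_one_eb(fs, &ebs[i]);
    }
    return fs->queued_after_stop;
}
```

With fs_error set to -EIO and the workers already stopped, model_writeback_all() submits nothing, fails every dirty buffer with -EIO, and returns 0. In this model, ending the bbio with an error (rather than silently skipping it) is what drops the dirty state, matching the description that iput() then just discards those blocks.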