CVE-2024-57976
Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
27/02/2025
Last modified:
06/07/2025
Description
In the Linux kernel, the following vulnerability has been resolved:<br />
<br />
btrfs: do proper folio cleanup when cow_file_range() failed<br />
<br />
[BUG]<br />
While testing with the COW fixup path marked as BUG_ON() (this relates to the<br />
new pin_user_pages*() change, which should not result in new out-of-band<br />
dirty pages), I hit a crash triggered by the BUG_ON() in the COW<br />
fixup path.<br />
<br />
This BUG_ON() happens just after a failed btrfs_run_delalloc_range():<br />
<br />
BTRFS error (device dm-2): failed to run delalloc range, root 348 ino 405 folio 65536 submit_bitmap 6-15 start 90112 len 106496: -28<br />
------------[ cut here ]------------<br />
kernel BUG at fs/btrfs/extent_io.c:1444!<br />
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP<br />
CPU: 0 UID: 0 PID: 434621 Comm: kworker/u24:8 Tainted: G OE 6.12.0-rc7-custom+ #86<br />
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022<br />
Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]<br />
pc : extent_writepage_io+0x2d4/0x308 [btrfs]<br />
lr : extent_writepage_io+0x2d4/0x308 [btrfs]<br />
Call trace:<br />
extent_writepage_io+0x2d4/0x308 [btrfs]<br />
extent_writepage+0x218/0x330 [btrfs]<br />
extent_write_cache_pages+0x1d4/0x4b0 [btrfs]<br />
btrfs_writepages+0x94/0x150 [btrfs]<br />
do_writepages+0x74/0x190<br />
filemap_fdatawrite_wbc+0x88/0xc8<br />
start_delalloc_inodes+0x180/0x3b0 [btrfs]<br />
btrfs_start_delalloc_roots+0x174/0x280 [btrfs]<br />
shrink_delalloc+0x114/0x280 [btrfs]<br />
flush_space+0x250/0x2f8 [btrfs]<br />
btrfs_async_reclaim_data_space+0x180/0x228 [btrfs]<br />
process_one_work+0x164/0x408<br />
worker_thread+0x25c/0x388<br />
kthread+0x100/0x118<br />
ret_from_fork+0x10/0x20<br />
Code: aa1403e1 9402f3ef aa1403e0 9402f36f (d4210000)<br />
---[ end trace 0000000000000000 ]---<br />
<br />
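The numbers in the error line above can be cross-checked with a little arithmetic. The sketch below assumes a 4K sector size and a 64K folio (neither is stated in the log; the bitmap reaching bit 15 merely suggests 16 sectors per folio), and assumes that submit_bitmap lists sector indices within the folio. All variable names are illustrative.

```python
import errno

SECTOR = 4096        # assumed sector size (not stated in the log)
FOLIO_SIZE = 65536   # assumed folio size (16 sectors of 4K)

folio_start = 65536  # "folio 65536"
start = 90112        # "start 90112"
length = 106496      # "len 106496"

end = start + length

# First sector of the delalloc range, relative to the folio start.
first_bit = (start - folio_start) // SECTOR

# The range extends past this folio, so clamp to the folio's last byte.
last_byte_in_folio = min(end, folio_start + FOLIO_SIZE) - 1
last_bit = (last_byte_in_folio - folio_start) // SECTOR

print(first_bit, last_bit)   # 6 15, matching "submit_bitmap 6-15"
print(errno.errorcode[28])   # the trailing -28 is -ENOSPC
```

Under these assumptions the failed range starts 24K into the folio and runs past its end, which is consistent with the reported bitmap.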
[CAUSE]<br />
That failure is mostly from cow_file_range(), where we can hit -ENOSPC.<br />
<br />
Although the -ENOSPC is itself a bug in our space reservation<br />
code, let's focus on the error handling here.<br />
<br />
For example, we have the following dirty range [0, 64K) of an inode,<br />
with 4K sector size and 4K page size:<br />
<br />
0        16K        32K       48K       64K<br />
|///////////////////////////////////////|<br />
|#######################################|<br />
<br />
Where |///| means the pages are still dirty, and |###| means the extent io<br />
tree has the EXTENT_DELALLOC flag set.<br />
<br />
- Enter extent_writepage() for page 0<br />
<br />
- Enter btrfs_run_delalloc_range() for range [0, 64K)<br />
<br />
- Enter cow_file_range() for range [0, 64K)<br />
<br />
- Function btrfs_reserve_extent() only reserved one 16K extent,<br />
so we created an extent map and an ordered extent for range [0, 16K).<br />
<br />
0        16K        32K       48K       64K<br />
|////////|//////////////////////////////|<br />
|        |##############################|<br />
<br />
Range [0, 16K) now has its delalloc flag cleared.<br />
But since we have not yet submitted any bio, the 4 pages involved are still<br />
dirty.<br />
<br />
- The next call to btrfs_reserve_extent() returns -ENOSPC.<br />
Now we have to run error cleanup, which will clear all<br />
EXTENT_DELALLOC* flags and clear the dirty flags for the remaining<br />
ranges:<br />
<br />
0        16K        32K       48K       64K<br />
|////////|                              |<br />
|        |                              |<br />
<br />
Note that range [0, 16K) still has its pages dirty.<br />
<br />
- Some time later, writeback is triggered again for the range [0, 16K)<br />
since the page range still has dirty flags.<br />
<br />
- btrfs_run_delalloc_range() will do nothing because there is no<br />
EXTENT_DELALLOC flag.<br />
<br />
- extent_writepage_io() finds that page 0 has no ordered flag,<br />
so it falls into the COW fixup path, triggering the BUG_ON().<br />
<br />
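The sequence above can be replayed with a small toy model. This is only a sketch of the flag bookkeeping described in the walkthrough, not kernel code: the class, the function names, and the `reservations` list are all hypothetical, and the model simply never sets the ordered flag, which matches the end state the walkthrough describes for page 0.

```python
from dataclasses import dataclass

PAGE = 4 * 1024  # 4K page and sector size, as in the example


@dataclass
class PageState:
    dirty: bool = True      # page dirty flag (|///| in the diagrams)
    delalloc: bool = True   # EXTENT_DELALLOC in the extent io tree (|###|)
    ordered: bool = False   # page ordered flag; never set in this toy model


def buggy_cow_file_range(pages, start, end, reservations):
    """Toy model of the pre-fix flow.

    `reservations` is a hypothetical stand-in for successive
    btrfs_reserve_extent() results: a byte count, or None for -ENOSPC.
    """
    cur = start
    for got in reservations:
        if got is None:
            # Error cleanup only covers the remaining range [cur, end).
            for i in range(cur // PAGE, end // PAGE):
                pages[i].delalloc = False
                pages[i].dirty = False
            return -28  # -ENOSPC
        # Extent reserved: the delalloc flag is cleared, but no bio has
        # been submitted yet, so the pages stay dirty.
        for i in range(cur // PAGE, (cur + got) // PAGE):
            pages[i].delalloc = False
        cur += got
    return 0


def takes_cow_fixup_path(page):
    # A dirty page with neither delalloc nor ordered state is what the
    # next writeback attempt trips over.
    return page.dirty and not page.delalloc and not page.ordered


pages = [PageState() for _ in range(64 * 1024 // PAGE)]
ret = buggy_cow_file_range(pages, 0, 64 * 1024, [16 * 1024, None])
stranded = [i for i, p in enumerate(pages) if takes_cow_fixup_path(p)]
print(ret, stranded)  # -28 [0, 1, 2, 3]
```

In this model the four pages of range [0, 16K) end up dirty with no delalloc flag, which is exactly the state that sends the later writeback into the COW fixup path.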
Unfortunately, this error handling bug dates back to the introduction of<br />
btrfs. Thankfully, because of the abuse of COW fixup, at least it won't crash<br />
the kernel.<br />
<br />
[FIX]<br />
Instead of immediately unlocking the extent and folios, we keep the extent<br />
and folios locked until either an error occurs or the whole delalloc range<br />
has finished.<br />
<br />
When the whole delalloc range finishes without error, we just unlock the<br />
whole range with PAGE_SET_ORDERED (and PAGE_UNLOCK for !keep_locked<br />
cases).<br />
---truncated---