CVE-2024-38306
Publication date:
25/06/2024
In the Linux kernel, the following vulnerability has been resolved:

btrfs: protect folio::private when attaching extent buffer folios

[BUG]
Since v6.8 there have been rare kernel crashes reported by various people;
the common factor is bad page state error messages like this:

BUG: Bad page state in process kswapd0 pfn:d6e840
page: refcount:0 mapcount:0 mapping:000000007512f4f2 index:0x2796c2c7c pfn:0xd6e840
aops:btree_aops ino:1
flags: 0x17ffffe0000008(uptodate|node=0|zone=2|lastcpupid=0x3fffff)
page_type: 0xffffffff()
raw: 0017ffffe0000008 dead000000000100 dead000000000122 ffff88826d0be4c0
raw: 00000002796c2c7c 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: non-NULL mapping

[CAUSE]
Commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") changed the sequence when allocating a new
extent buffer.

Previously we always called grab_extent_buffer() under
mapping->i_private_lock, to ensure safe modification of folio::private
(which is a pointer to the extent buffer for the regular sectorsize case).

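As a rough illustration, the pre-refactor pattern looked conceptually like
the sketch below (hypothetical helper name and simplified logic, not the
actual btrfs code); the point is that the whole test/reuse/detach/attach
sequence on folio::private runs while mapping->i_private_lock is held:

  /* Simplified sketch only, not the upstream implementation. */
  static struct extent_buffer *grab_or_attach_eb(struct address_space *mapping,
                                                 struct folio *folio,
                                                 struct extent_buffer *new_eb)
  {
          struct extent_buffer *exists;

          spin_lock(&mapping->i_private_lock);
          if (folio_test_private(folio)) {
                  exists = folio_get_private(folio);
                  /* Reuse the existing eb if it is still alive. */
                  if (atomic_inc_not_zero(&exists->refs)) {
                          spin_unlock(&mapping->i_private_lock);
                          return exists;
                  }
                  /* The old eb is dying; drop the folio's reference to it. */
                  folio_detach_private(folio);
          }
          /* Attaching takes a new folio reference for folio::private. */
          folio_attach_private(folio, new_eb);
          spin_unlock(&mapping->i_private_lock);
          return NULL;
  }
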
After that refactor, grab_extent_buffer() no longer runs with
mapping->i_private_lock held, which can lead to the following race:

Thread A is trying to allocate an extent buffer at bytenr X, with four
4K pages, while thread B is trying to release the page at X + 4K
(the second page of the extent buffer at X).

               Thread A              |               Thread B
-------------------------------------+--------------------------------------
                                     | btree_release_folio()
                                     | | This is for the page at X + 4K,
                                     | | Not page X.
                                     | |
alloc_extent_buffer()                | |- release_extent_buffer()
|- filemap_add_folio() for the       | |  |- atomic_dec_and_test(eb->refs)
|  page at bytenr X (the first       | |  |
|  page).                            | |  |
|  Which returned -EEXIST.           | |  |
|                                    | |  |
|- filemap_lock_folio()              | |  |
|  Returned the first page locked.   | |  |
|                                    | |  |
|- grab_extent_buffer()              | |  |
|  |- atomic_inc_not_zero()          | |  |
|  |  Returned false                 | |  |
|  |- folio_detach_private()         | |  |- folio_detach_private() for X
|     |- folio_test_private()        | |     |- folio_test_private()
|        Returned true               | |        Returned true
|     |- folio_put()                 |       |- folio_put()

Now there are two puts on the same folio X, leading to a refcount
underflow of folio X, and eventually causing the BUG_ON() on
page->mapping.

The condition is not that easy to hit:

- The release must be triggered for a middle page of an eb
  If the release is on the first page of an eb, the page lock would kick
  in and prevent the race.

- folio_detach_private() has a very small race window
  It's only between folio_test_private() and folio_clear_private()
  (see the sketch right after this list).
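
For reference, folio_detach_private() boils down to roughly the following
(a simplified sketch of the generic helper, not a verbatim copy), which
shows where that window sits:

  /* Simplified sketch of the generic helper. */
  static void *folio_detach_private_sketch(struct folio *folio)
  {
          void *data = folio_get_private(folio);

          if (!folio_test_private(folio))
                  return NULL;
          /*
           * Race window: without mapping->i_private_lock, a second thread
           * can still see the private flag set here and fall through too.
           */
          folio_clear_private(folio);
          folio->private = NULL;
          folio_put(folio);       /* drops the reference taken at attach time */

          return data;
  }

With both threads passing the folio_test_private() check, both end up
calling folio_put(), which is the double put shown in the diagram above.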

That is exactly the window mapping->i_private_lock is meant to close,
and commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") screwed that up.

At that time, I thought the page lock would kick in, as
filemap_release_folio() also requires the page to be locked, but forgot
that filemap_release_folio() only locks one page, not all pages of an
extent buffer.

[FIX]
Move all the code requiring i_private_lock into
attach_eb_folio_to_filemap(), so that everything is done with proper
lock protection.

Furthermore, to prevent future problems, add an extra
lockdep_assert_locked() to ensure we're holding the proper lock.
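
Below is a minimal sketch of the resulting shape (hypothetical function
name, not the exact upstream code; the text above names the real helper,
attach_eb_folio_to_filemap()). The idea is that the folio::private
manipulation is consolidated in one place and the locking requirement is
asserted, so a future caller cannot silently drop the lock:

  /* Simplified sketch; assumes the caller holds mapping->i_private_lock. */
  static void attach_eb_folio_locked(struct address_space *mapping,
                                     struct folio *folio,
                                     struct extent_buffer *eb)
  {
          /* Splats under lockdep if a caller reaches here without the lock. */
          lockdep_assert_held(&mapping->i_private_lock);

          /*
           * The rest of the code requiring i_private_lock (the grab/detach
           * sequence from the earlier sketch) would sit here as well; only
           * the final attach step is shown.
           */
          folio_attach_private(folio, eb);
  }

The sketch uses lockdep_assert_held(), the generic lockdep assertion that
a given lock is held at this point, to express the intent of the assertion
mentioned above.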

A reproducer that is able to hit the race (it takes a few minutes with
instrumented code inserting delays into alloc_extent_buffer()):

#!/bin/sh
# Continuously drop caches and trigger memory compaction, forcing btree
# folios to be released while the tar workload keeps re-reading metadata.
drop_caches () {
        while true; do
                echo 3 > /proc/sys/vm/drop_caches
                echo 1 > /proc/sys/vm/compact_memory
        done
}

# Run many tar readers in parallel so extent buffers are constantly
# allocated and looked up.
run_tar () {
        while true; do
                for x in `seq 1 80` ; do
                        tar cf /dev/zero /mnt > /dev/null &
                done
                wait
        done
}

mkfs.btrfs -f -d single -m single
---truncated---
Severity CVSS v4.0: Pending analysis
Last modification:
17/09/2025