CVE-2024-38306

Severity CVSS v4.0:
Pending analysis
Type:
CWE-362 Concurrent Execution using Shared Resource with Improper Synchronization ('Race Condition')
Publication date:
25/06/2024
Last modified:
17/09/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

btrfs: protect folio::private when attaching extent buffer folios

[BUG]
Since v6.8 there have been rare kernel crashes reported by various people.
The common factor is a "bad page state" error message like this:

  BUG: Bad page state in process kswapd0 pfn:d6e840
  page: refcount:0 mapcount:0 mapping:000000007512f4f2 index:0x2796c2c7c
  pfn:0xd6e840
  aops:btree_aops ino:1
  flags: 0x17ffffe0000008(uptodate|node=0|zone=2|lastcpupid=0x3fffff)
  page_type: 0xffffffff()
  raw: 0017ffffe0000008 dead000000000100 dead000000000122 ffff88826d0be4c0
  raw: 00000002796c2c7c 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: non-NULL mapping

[CAUSE]
Commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") changes the sequence when allocating a new
extent buffer.

Previously we always called grab_extent_buffer() under
mapping->i_private_lock, to keep modifications of folio::private safe
(folio::private is a pointer to the extent buffer for regular sectorsize).

After that commit the lock is no longer held around grab_extent_buffer(),
which can lead to the following race:

Thread A is trying to allocate an extent buffer at bytenr X, with 4
4K pages, meanwhile thread B is trying to release the page at X + 4K
(the second page of the extent buffer at X).

               Thread A              |              Thread B
  -----------------------------------+-------------------------------------
                                     | btree_release_folio()
                                     | | This is for the page at X + 4K,
                                     | | Not page X.
                                     | |
  alloc_extent_buffer()              | |- release_extent_buffer()
  |- filemap_add_folio() for the     | |  |- atomic_dec_and_test(eb->refs)
  |  page at bytenr X (the first     | |  |
  |  page).                          | |  |
  |  Which returned -EEXIST.         | |  |
  |                                  | |  |
  |- filemap_lock_folio()            | |  |
  |  Returned the first page locked. | |  |
  |                                  | |  |
  |- grab_extent_buffer()            | |  |
  |  |- atomic_inc_not_zero()        | |  |
  |  |  Returned false               | |  |
  |  |- folio_detach_private()       | |  |- folio_detach_private() for X
  |     |- folio_test_private()      | |     |- folio_test_private()
  |        Returned true             | |     |  Returned true
  |- folio_put()                     | |- folio_put()

Now there are two puts on the same folio at folio X, leading to refcount
underflow of the folio X, and eventually causing the BUG_ON() on the
page->mapping.

The condition is not that easy to hit:

- The release must be triggered for the middle page of an eb (extent buffer)
  If the release is on the same first page of an eb, the page lock would
  kick in and prevent the race.

- folio_detach_private() has a very small race window
  It's only between folio_test_private() and folio_clear_private().

That's exactly the kind of race mapping->i_private_lock is there to prevent,
and commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") screwed that up.

At that time, I thought the page lock would kick in as
filemap_release_folio() also requires the page to be locked, but forgot
that filemap_release_folio() only locks one page, not all pages of an
extent buffer.

[FIX]
Move all the code requiring i_private_lock into
attach_eb_folio_to_filemap(), so that everything is done with proper
lock protection.

Furthermore, to prevent future problems, add an extra
lockdep_assert_locked() to ensure we're holding the proper lock.

The reproducer that is able to hit the race (takes a few minutes with
instrumented code inserting delays into alloc_extent_buffer()):

  #!/bin/sh
  drop_caches () {
          while(true); do
                  echo 3 > /proc/sys/vm/drop_caches
                  echo 1 > /proc/sys/vm/compact_memory
          done
  }

  run_tar () {
          while(true); do
                  for x in `seq 1 80` ; do
                          tar cf /dev/zero /mnt > /dev/null &
                  done
                  wait
          done
  }

  mkfs.btrfs -f -d single -m single
  ---truncated---
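
The double folio_put() described in the [CAUSE] section can be modeled in a few lines of userspace C. The following is only a sketch under stated assumptions, not kernel code: struct toy_folio and every toy_* function are hypothetical names invented for this illustration, the two "threads" are simulated by ordering the steps by hand so the result is deterministic, and serializing test-and-clear as one step stands in for holding mapping->i_private_lock around the whole sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a folio; NOT the kernel's struct folio. */
struct toy_folio {
    int refcount;   /* here: only the reference owned by ->private */
    void *private;  /* stands in for folio::private (extent buffer pointer) */
};

/* Step 1 of a detach: observe whether private data is attached. */
static bool toy_test_private(const struct toy_folio *f)
{
    return f->private != NULL;
}

/* Step 2 of a detach: clear ->private and drop the reference it owned. */
static void toy_clear_private_and_put(struct toy_folio *f)
{
    f->private = NULL;
    f->refcount--;
}

/* Unsynchronized interleaving: both sides test before either one clears. */
static int toy_race_unlocked(void)
{
    int dummy_eb; /* only its address is used, as a non-NULL marker */
    struct toy_folio f = { .refcount = 1, .private = &dummy_eb };

    bool a_sees_private = toy_test_private(&f); /* "Thread A": true */
    bool b_sees_private = toy_test_private(&f); /* "Thread B": also true */

    if (a_sees_private)
        toy_clear_private_and_put(&f);
    if (b_sees_private)
        toy_clear_private_and_put(&f); /* second put: refcount underflows */

    return f.refcount;
}

/* Test and clear as one indivisible step, as a lock would guarantee. */
static void toy_detach_private(struct toy_folio *f)
{
    if (toy_test_private(f))
        toy_clear_private_and_put(f);
}

/* Serialized version: the second caller finds nothing left to detach. */
static int toy_race_serialized(void)
{
    int dummy_eb;
    struct toy_folio f = { .refcount = 1, .private = &dummy_eb };

    toy_detach_private(&f); /* "Thread A" */
    toy_detach_private(&f); /* "Thread B": test fails, no second put */

    return f.refcount;
}

/* Small self-check showing the two outcomes side by side. */
static void toy_check(void)
{
    assert(toy_race_unlocked() == -1);  /* one put too many */
    assert(toy_race_serialized() == 0); /* exactly one put */
}
```

Calling toy_check() from a main() function exercises both paths. The model deliberately ignores the page cache's own reference and the reference thread A takes through filemap_lock_folio(); it only shows why a test-then-clear sequence needs to run under a single lock.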

Vulnerable products and versions

CPE                                                  From              Up to
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*         6.8 (including)   6.9.5 (excluding)
cpe:2.3:o:linux:linux_kernel:6.10:rc1:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.10:rc2:*:*:*:*:*:*
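
As a rough first check, a kernel release string can be compared against the listed range of 6.8 (including) up to 6.9.5 (excluding). The sketch below is an illustration only: struct kver and the kver_* functions are hypothetical helpers, suffixes such as "-rc1" or a distribution tag are ignored, and the separately listed 6.10 rc1 and rc2 entries are not covered. Distribution kernels may also carry backported fixes, so a version comparison alone cannot confirm or rule out exposure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper type: a plain major.minor.patch triple. */
struct kver {
    int major, minor, patch;
};

/* Three-way comparison: negative, zero, or positive like strcmp(). */
static int kver_cmp(struct kver a, struct kver b)
{
    if (a.major != b.major)
        return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor)
        return a.minor < b.minor ? -1 : 1;
    if (a.patch != b.patch)
        return a.patch < b.patch ? -1 : 1;
    return 0;
}

/*
 * Parse "6.9.4" or "6.8"; a missing patch level is treated as 0.
 * Anything after the numeric part (e.g. "-rc1") is ignored.
 */
static bool kver_parse(const char *s, struct kver *out)
{
    out->patch = 0;
    return sscanf(s, "%d.%d.%d", &out->major, &out->minor, &out->patch) >= 2;
}

/* True if the release falls in [6.8, 6.9.5), the range row listed above. */
static bool kver_in_listed_range(const char *release)
{
    const struct kver from = { 6, 8, 0 };
    const struct kver up_to = { 6, 9, 5 };
    struct kver v;

    if (!kver_parse(release, &v))
        return false;
    return kver_cmp(v, from) >= 0 && kver_cmp(v, up_to) < 0;
}

/* Small self-check covering both boundaries of the range. */
static void kver_check(void)
{
    assert(kver_in_listed_range("6.8"));    /* lower bound is inclusive */
    assert(kver_in_listed_range("6.9.4"));
    assert(!kver_in_listed_range("6.9.5")); /* upper bound is exclusive */
    assert(!kver_in_listed_range("6.7.12"));
}
```

The release string would typically come from `uname -r`; that plumbing is left out to keep the sketch short.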