CVE-2025-68358
Publication date: 24/12/2025
In the Linux kernel, the following vulnerability has been resolved:

btrfs: fix racy bitfield write in btrfs_clear_space_info_full()

From the memory-barriers.txt document regarding memory barrier ordering
guarantees:

 (*) These guarantees do not apply to bitfields, because compilers often
     generate code to modify these using non-atomic read-modify-write
     sequences. Do not attempt to use bitfields to synchronize parallel
     algorithms.

 (*) Even in cases where bitfields are protected by locks, all fields
     in a given bitfield must be protected by one lock. If two fields
     in a given bitfield are protected by different locks, the compiler's
     non-atomic read-modify-write sequences can cause an update to one
     field to corrupt the value of an adjacent field.

btrfs_space_info has a bitfield sharing an underlying word consisting of
the fields full, chunk_alloc, and flush:

struct btrfs_space_info {
        struct btrfs_fs_info *     fs_info;          /*     0     8 */
        struct btrfs_space_info *  parent;           /*     8     8 */
        ...
        int                        clamp;            /*   172     4 */
        unsigned int               full:1;           /*   176: 0  4 */
        unsigned int               chunk_alloc:1;    /*   176: 1  4 */
        unsigned int               flush:1;          /*   176: 2  4 */
        ...

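To make the sharing concrete, here is a minimal C sketch. struct flags_bitfield and byte_of_flag() are hypothetical names that mimic only the tail of the struct above, not the kernel definition. Bitfield layout is implementation-defined, so the result is what I would expect on common ABIs such as x86-64 with GCC or Clang, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the tail of struct btrfs_space_info. */
struct flags_bitfield {
	int clamp;
	unsigned int full:1;
	unsigned int chunk_alloc:1;
	unsigned int flush:1;
};

/* Index of the first byte that differs between a and b, or sizeof(*a). */
static size_t changed_byte(const struct flags_bitfield *a,
			   const struct flags_bitfield *b)
{
	const unsigned char *pa = (const unsigned char *)a;
	const unsigned char *pb = (const unsigned char *)b;

	for (size_t i = 0; i < sizeof(*a); i++)
		if (pa[i] != pb[i])
			return i;
	return sizeof(*a);
}

/*
 * Set exactly one flag (0 = full, 1 = chunk_alloc, 2 = flush) in a zeroed
 * struct and report which byte of the object changed.
 */
size_t byte_of_flag(int which)
{
	struct flags_bitfield zero, one;

	assert(which >= 0 && which <= 2);
	memset(&zero, 0, sizeof(zero));
	memset(&one, 0, sizeof(one));

	switch (which) {
	case 0:
		one.full = 1;
		break;
	case 1:
		one.chunk_alloc = 1;
		break;
	default:
		one.flush = 1;
		break;
	}
	return changed_byte(&zero, &one);
}
```

If byte_of_flag(0), byte_of_flag(1) and byte_of_flag(2) all return the same offset, then a store to any one of the three flags has to rewrite the byte that holds the other two.
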
Therefore, to be safe from parallel read-modify-writes losing a write to
one of the bitfield members protected by a lock, all writes to all the
bitfields must use the lock. They almost universally do, except for
btrfs_clear_space_info_full() which iterates over the space_infos and
writes out found->full = 0 without a lock.

Imagine that we have one thread completing a transaction in which we
finished deleting a block_group and are thus calling
btrfs_clear_space_info_full() while simultaneously the data reclaim
ticket infrastructure is running do_async_reclaim_data_space():

          T1                                    T2
btrfs_commit_transaction
  btrfs_clear_space_info_full
    data_sinfo->full = 0
      READ: full:0, chunk_alloc:0, flush:1
                                        do_async_reclaim_data_space(data_sinfo)
                                        spin_lock(&space_info->lock);
                                        if(list_empty(tickets))
                                          space_info->flush = 0;
                                          READ: full: 0, chunk_alloc:0, flush:1
                                          MOD/WRITE: full: 0, chunk_alloc:0, flush:0
                                          spin_unlock(&space_info->lock);
                                          return;
      MOD/WRITE: full:0, chunk_alloc:0, flush:1

and now data_sinfo->flush is 1 but the reclaim worker has exited. This
breaks the invariant that flush is 0 iff there is no work queued or
running. Once this invariant is violated, future allocations that go
into __reserve_bytes() will add tickets to space_info->tickets but will
see space_info->flush is set to 1 and not queue the work. After this,
they will block forever on the resulting ticket, as it is now impossible
to kick the worker again.

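The interleaving above can be replayed deterministically in user space. This is a single-threaded sketch, not a real race: each read-modify-write is split by hand into its load and store steps on one shared byte, and the bit positions are taken from the pahole output. The macro and function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define FULL_BIT        (1u << 0)
#define CHUNK_ALLOC_BIT (1u << 1)
#define FLUSH_BIT       (1u << 2)

/*
 * Replay the T1/T2 timeline on one shared byte and return its final value.
 * T1 is the unlocked "full = 0" write, T2 is the locked "flush = 0" write.
 */
uint8_t replay_lost_update(void)
{
	uint8_t word = FLUSH_BIT;	/* full:0, chunk_alloc:0, flush:1 */

	/* T1 READ: snapshot the shared byte before T2 runs. */
	uint8_t t1 = word;

	/* T2 READ, MOD, WRITE: clear flush (the lock does not stop T1). */
	uint8_t t2 = word;
	t2 &= (uint8_t)~FLUSH_BIT;
	word = t2;			/* full:0, chunk_alloc:0, flush:0 */

	/* T1's snapshot is now stale: it still has flush set. */
	assert(t1 & FLUSH_BIT);

	/* T1 MOD/WRITE: clear full in the stale copy and store it back. */
	t1 &= (uint8_t)~FULL_BIT;
	word = t1;			/* full:0, chunk_alloc:0, flush:1 */

	return word;
}
```

The returned byte has FLUSH_BIT set again even though T2 cleared it, which is the lost update that leaves flush at 1 with no worker running.
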
I also confirmed by looking at the assembly of the affected kernel that
it is doing RMW operations. For example, to set the flush (3rd) bit to 0,
the assembly is:
  andb $0xfb,0x60(%rbx)
and similarly for setting the full (1st) bit to 0:
  andb $0xfe,-0x20(%rax)

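The immediates in those two instructions are the one-byte complements of the bit being cleared. A small sketch of the arithmetic, using a hypothetical helper clear_mask() that is not part of the kernel:

```c
#include <assert.h>
#include <stdint.h>

/* One-byte AND mask that clears the given bit (0..7) and keeps the rest. */
uint8_t clear_mask(unsigned int bit)
{
	assert(bit < 8);
	return (uint8_t)~(1u << bit);
}
```

clear_mask(2) gives 0xfb for flush and clear_mask(0) gives 0xfe for full. Both are applied to the whole byte, so each andb reads and rewrites the neighbouring flags as well.
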
So I think this is really a bug on practical systems. I have observed
a number of systems in this exact state, but am currently unable to
reproduce it.

Rather than leaving this footgun lying around for the future, take
advantage of the fact that there is room in the struct anyway, and that
it is already quite large and simply change the three bitfield members to
bools. This avoids writes to space_info->full having any effect on
---truncated---
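
A minimal sketch of the bool layout described above, using a hypothetical struct flags_bool rather than the real struct btrfs_space_info. Since C11, distinct non-bitfield members are separate memory locations, so a store to one of them does not need to read or rewrite its neighbours. The function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: the three flags as bools instead of bitfields. */
struct flags_bool {
	int clamp;
	bool full;
	bool chunk_alloc;
	bool flush;
};

/* True when each flag lives at its own offset in the struct. */
bool flags_have_distinct_bytes(void)
{
	size_t a = offsetof(struct flags_bool, full);
	size_t b = offsetof(struct flags_bool, chunk_alloc);
	size_t c = offsetof(struct flags_bool, flush);

	return a != b && b != c && a != c;
}

/*
 * Same order of stores as the T1/T2 timeline. T1's late store only
 * touches s.full, so T2's update to s.flush survives.
 */
bool replay_with_bools(void)
{
	struct flags_bool s = { .flush = true };

	s.flush = false;	/* T2: clear flush under the lock */
	s.full = false;		/* T1: late unlocked store to full only */

	assert(!s.full && !s.chunk_alloc);
	return s.flush;
}
```

Here replay_with_bools() returns false: with separate bytes there is no stale copy of flush for the unlocked write to store back.
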
Severity CVSS v4.0: Pending analysis
Last modification: 29/12/2025