CVE-2025-37931

Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
20/05/2025
Last modified:
19/12/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

btrfs: adjust subpage bit start based on sectorsize

When running machines with 64k page size and a 16k nodesize we started seeing tree log corruption in production. This turned out to be because we were sometimes not writing out dirty blocks, so this in fact affects all metadata writes.

When writing out a subpage EB we scan the subpage bitmap for a dirty range. If the range isn't dirty we do

    bit_start++;

to move onto the next bit. The problem is that the bitmap is based on the number of sectors that an EB has. So in this case, we have a 64k pagesize, 16k nodesize, but a 4k sectorsize. This means our bitmap is 4 bits for every node. With a 64k page size we end up with 4 nodes per page.

To make this easier, this is how everything looks:

    [0        16k       32k       48k      ] logical address
    [0        4         8         12       ] radix tree offset
    [               64k page               ] folio
    [ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] extent buffers
    [ | | | | | | | | | | | | | | | | ] bitmap

Now we use all of our addressing based on fs_info->sectorsize_bits, so as you can see above, our 16k eb->start turns into radix entry 4.

When we find a dirty range for our eb, we correctly do bit_start += sectors_per_node, because if we start at bit 0, the next bit for the next eb is 4, to correspond to eb->start 16k.

However, if our range is clean, we will do bit_start++, which will now put us offset from our radix tree entries.

In our case, assume that the first time we check the bitmap the block is not dirty, we increment bit_start so now it == 1, and then we loop around and check again. This time it is dirty, and we go to find that start using the following equation

    start = folio_start + bit_start * fs_info->sectorsize;

so in the case above, eb->start 0 is now dirty, and we calculate start as

    0 + 1 * fs_info->sectorsize = 4096
    4096 >> 12 = 1

Now we're looking up the radix tree for 1, and we won't find an eb. What's worse is now we're using bit_start == 1, so we do bit_start += sectors_per_node, which is now 5. If that eb is dirty we will run into the same thing: we will look at an offset that is not populated in the radix tree, and now we're skipping the writeout of dirty extent buffers.

The best fix for this is to not use sectorsize_bits to address nodes, but that's a larger change. Since this is a fs corruption problem, fix it simply by always using sectors_per_node to increment the start bit.
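
The scan described above can be modelled outside the kernel. The following Python sketch is a simplified simulation, not the btrfs source: the function name `scan_folio`, the `is_dirty(bit, step)` callback and the dictionary standing in for the radix tree are all hypothetical, and the step counter is an assumed way to reproduce the timing in which the first extent buffer becomes dirty just after its first bit was checked. The sizes (64k page, 16k nodesize, 4k sectorsize) are the ones given in the description.

```python
# Simplified model of the subpage bitmap scan; NOT the kernel code.
SECTORSIZE = 4096
SECTORSIZE_BITS = 12
NODESIZE = 16 * 1024
PAGE_SIZE = 64 * 1024

SECTORS_PER_NODE = NODESIZE // SECTORSIZE   # 4 bitmap bits per extent buffer
SECTORS_PER_PAGE = PAGE_SIZE // SECTORSIZE  # 16 bitmap bits per folio

# Stand-in for the radix tree: index (eb start >> sectorsize_bits) -> eb start.
RADIX_TREE = {start >> SECTORSIZE_BITS: start
              for start in range(0, PAGE_SIZE, NODESIZE)}


def scan_folio(is_dirty, fixed, folio_start=0):
    """Return the start addresses of the extent buffers that get written.

    is_dirty(bit, step) is a hypothetical callback reporting whether a
    bitmap bit is dirty at the given loop iteration.
    fixed=False advances by one bit on a clean range (old behaviour);
    fixed=True advances by SECTORS_PER_NODE (behaviour after the fix).
    """
    written = []
    bit_start = 0
    step = 0
    while bit_start < SECTORS_PER_PAGE:
        dirty = is_dirty(bit_start, step)
        step += 1
        if not dirty:
            bit_start += SECTORS_PER_NODE if fixed else 1
            continue
        start = folio_start + bit_start * SECTORSIZE
        bit_start += SECTORS_PER_NODE
        eb = RADIX_TREE.get(start >> SECTORSIZE_BITS)
        if eb is None:
            # Misaligned lookup: no extent buffer here, nothing is written.
            continue
        written.append(eb)
    return written


def racy_dirty(bit, step):
    """Assumed scenario: eb 0 turns dirty after the first check,
    eb 1 (at 16k) is dirty throughout, eb 2 and eb 3 stay clean."""
    eb_index = bit // SECTORS_PER_NODE
    if eb_index == 0:
        return step >= 1
    return eb_index == 1


if __name__ == "__main__":
    print("old stepping writes:", scan_folio(racy_dirty, fixed=False))
    print("new stepping writes:", scan_folio(racy_dirty, fixed=True))
```

In this model the old stepping looks up radix indices 1 and 5, finds nothing, and writes no extent buffer even though the one at 16k was dirty the whole time. With the fixed stepping the scan stays aligned to indices 0, 4, 8 and 12, so the 16k buffer is written; the buffer at 0, which became dirty only after its check, would be left for a later writeback pass.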

Vulnerable products and versions

CPE                                                  From               Up to
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*         5.13 (including)   6.1.151 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*         6.2 (including)    6.6.105 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*         6.7 (including)    6.12.28 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*         6.13 (including)   6.14.6 (excluding)
cpe:2.3:o:linux:linux_kernel:6.15:rc1:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.15:rc2:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.15:rc3:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.15:rc4:*:*:*:*:*:*
cpe:2.3:o:debian:debian_linux:11.0:*:*:*:*:*:*:*
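
As a rough aid for reading the table, the sketch below checks an upstream kernel version string against the listed ranges. It is an illustration only: the helper names `parse_version` and `is_listed_vulnerable` are hypothetical, the ranges and release candidates are copied from the table above, and the Debian entry is not modelled. Distribution kernels often backport fixes without changing the upstream version number, so a match here is a hint to check the vendor advisory, not a verdict.

```python
import re

# Ranges from the table: (first affected version, first fixed version).
VULNERABLE_RANGES = [
    ((5, 13, 0), (6, 1, 151)),
    ((6, 2, 0), (6, 6, 105)),
    ((6, 7, 0), (6, 12, 28)),
    ((6, 13, 0), (6, 14, 6)),
]

# Release candidates listed individually in the table.
VULNERABLE_RCS = {"6.15-rc1", "6.15-rc2", "6.15-rc3", "6.15-rc4"}


def parse_version(text):
    """Parse 'major.minor[.patch]' from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_listed_vulnerable(version):
    """True if an upstream version falls inside a listed range."""
    if version in VULNERABLE_RCS:
        return True
    parsed = parse_version(version)
    # Lower bound is inclusive, upper bound is exclusive, as in the table.
    return any(low <= parsed < high for low, high in VULNERABLE_RANGES)


if __name__ == "__main__":
    for candidate in ("6.1.150", "6.1.151", "6.12.27", "6.15-rc3"):
        print(candidate, is_listed_vulnerable(candidate))
```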