CVE-2026-23157
Publication date:
14/02/2026
In the Linux kernel, the following vulnerability has been resolved:

btrfs: do not strictly require dirty metadata threshold for metadata writepages

[BUG]
There is an internal report of over 1000 processes waiting at the
io_schedule_timeout() call in balance_dirty_pages(), causing a system
hang and triggering a kernel core dump.

The affected kernel is based on v6.4, but the root problem still applies
to any upstream kernel before v6.18.

[CAUSE]
First, Jan Kara's explanation of the dirty page balancing behavior:

This cgroup dirty limit was what was actually playing the role here
because the cgroup had only a small amount of memory and so the dirty
limit for it was something like 16MB.

Dirty throttling is responsible for enforcing that nobody can dirty
(significantly) more memory than the dirty limit allows. Thus when
a task is dirtying pages it periodically enters balance_dirty_pages()
and we let it sleep there to slow down the dirtying.

When the system is already over the dirty limit (either globally or
within the cgroup of the running task), we will not let the task exit
from balance_dirty_pages() until the number of dirty pages drops below
the limit.

So in this particular case, as I already mentioned, there was a cgroup
with a relatively small amount of memory and, as a result, a dirty limit
set at 16MB. A task from that cgroup had dirtied about 28MB worth of
pages in the btrfs btree inode and these were practically the only dirty
pages in that cgroup.

So the only way to reduce the dirty pages of that cgroup is to write
back the dirty pages of the btrfs btree inode, and only after that can
those processes exit balance_dirty_pages().
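
The throttling rule described above can be sketched as a small user-space
model. This is not kernel code: the helper names are hypothetical, and the
16MB and 28MB figures from the report are assumed to be binary megabytes.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/*
 * Hypothetical helper: a task stays asleep in the throttling loop
 * until the dirty byte count has dropped below the limit.
 */
static bool must_keep_sleeping(uint64_t dirty_bytes, uint64_t dirty_limit)
{
    return dirty_bytes >= dirty_limit;
}

/*
 * Hypothetical helper: how many bytes are above the limit, i.e. the
 * amount that has to be written back just to get down to the limit.
 */
static uint64_t bytes_over_limit(uint64_t dirty_bytes, uint64_t dirty_limit)
{
    return dirty_bytes > dirty_limit ? dirty_bytes - dirty_limit : 0;
}
```

With the reported figures, must_keep_sleeping(28 * MIB, 16 * MIB) is true
and bytes_over_limit(28 * MIB, 16 * MIB) is 12 MiB, so more than 12 MiB of
btree inode pages would have to be written back before any task could leave
the throttling loop.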

Now back to the btrfs part: btree_writepages() is responsible for
writing back dirty btree inode pages.

The problem is that btrfs has an internal threshold: if the btree
inode's dirty bytes are below 32MiB, it will not do any writeback.

This behavior is meant to batch as much metadata as possible, so we
won't write back those tree blocks and then re-COW them again later for
another modification.

This internal 32MiB threshold is higher than the existing amount of
dirty pages (28MiB), meaning no writeback will happen, causing a
deadlock between btrfs and cgroup:

- Btrfs doesn't want to write back the btree inode until there are more
  dirty pages

- Cgroup/MM doesn't want more dirty pages for the btrfs btree inode
  Thus any process touching that btree inode is put to sleep until
  the number of dirty pages is reduced.

Many thanks to Jan Kara for the root cause analysis.
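
The deadlock can be reproduced in a toy user-space model. The sketch below
is only an illustration under stated assumptions: struct cgroup_model,
old_btree_writeback(), throttled() and is_stuck() are hypothetical names,
writeback is modeled as flushing everything at once, and only the 32MiB
threshold and the 16MB/28MB figures come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
/* The btrfs internal threshold mentioned in the text. */
#define METADATA_WRITEBACK_THRESH (32 * MIB)

/* Hypothetical model of one cgroup whose only dirty pages are btree pages. */
struct cgroup_model {
    uint64_t dirty_limit;  /* cgroup dirty limit in bytes */
    uint64_t btree_dirty;  /* dirty btree inode bytes charged to it */
};

/* Old behavior as described: skip writeback while below the threshold. */
static uint64_t old_btree_writeback(struct cgroup_model *cg)
{
    if (cg->btree_dirty < METADATA_WRITEBACK_THRESH)
        return 0;                      /* nothing written */
    uint64_t written = cg->btree_dirty;
    cg->btree_dirty = 0;               /* simplification: flush everything */
    return written;
}

/* Tasks stay throttled until dirty bytes drop below the limit. */
static bool throttled(const struct cgroup_model *cg)
{
    return cg->btree_dirty >= cg->dirty_limit;
}

/*
 * Throttled tasks cannot dirty more pages, so writeback is the only
 * thing that can change the state. Returns true if the cgroup is still
 * throttled after the given number of writeback attempts.
 */
static bool is_stuck(struct cgroup_model cg, int rounds)
{
    for (int i = 0; i < rounds; i++) {
        if (!throttled(&cg))
            return false;
        old_btree_writeback(&cg);
    }
    return throttled(&cg);
}
```

In this model a cgroup with a 16 MiB limit and 28 MiB of dirty btree pages
never makes progress, no matter how many writeback attempts are made, while
one with 40 MiB dirty crosses the threshold, gets flushed, and is released.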

[ENHANCEMENT]
Since kernel commit b55102826d7d ("btrfs: set AS_KERNEL_FILE on the
btree_inode"), btrfs btree inode pages are only charged to the root
cgroup, which should have a much larger limit than btrfs' 32MiB
threshold.
So newer kernels should not be affected.

But all current LTS kernels are affected by this problem, and
backporting the whole AS_KERNEL_FILE change may not be a good idea.

Even for newer kernels I still think it's a good idea to get rid of the
internal threshold at btree_writepages(), since in most cases cgroup/MM
has a better view of full system memory usage than btrfs' fixed
threshold.

Internal callers using btrfs_btree_balance_dirty() need no change, since
that function already does its own internal threshold check.

But for external callers of btree_writepages(), just respect their
requests and write back whatever they want, ignoring the internal
btrfs threshold, to avoid such a deadlock on btree inode dirty page
balancing.
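
The split between internal and external callers can be illustrated with a
small user-space sketch. It is not the actual patch: enum wb_caller and
model_btree_writeback() are hypothetical, and the model only encodes what
the description states, namely that the internal path keeps its threshold
check while external writeback requests are honored regardless of it.

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
/* The btrfs internal threshold mentioned in the text. */
#define METADATA_WRITEBACK_THRESH (32 * MIB)

/* Hypothetical tag for who asked for metadata writeback. */
enum wb_caller { WB_CALLER_INTERNAL, WB_CALLER_EXTERNAL };

/*
 * Hypothetical model of the new policy. Internal callers still skip
 * writeback below the threshold; external callers get up to the
 * requested number of bytes written back. Returns the bytes written
 * and updates *btree_dirty accordingly.
 */
static uint64_t model_btree_writeback(uint64_t *btree_dirty,
                                      uint64_t requested,
                                      enum wb_caller caller)
{
    if (caller == WB_CALLER_INTERNAL &&
        *btree_dirty < METADATA_WRITEBACK_THRESH)
        return 0;

    uint64_t written = requested < *btree_dirty ? requested : *btree_dirty;
    *btree_dirty -= written;
    return written;
}
```

With 28 MiB dirty, an internal request in this model still writes nothing,
but an external request for 16 MiB leaves 12 MiB dirty, which is below the
16 MiB cgroup limit from the report, so the throttled tasks could proceed.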
Severity: Pending analysis
Last modified:
14/02/2026