CVE-2026-23157

Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
14/02/2026
Last modified:
18/02/2026

Description

In the Linux kernel, the following vulnerability has been resolved:

btrfs: do not strictly require dirty metadata threshold for metadata writepages

[BUG]
There is an internal report that over 1000 processes are waiting at the io_schedule_timeout() of balance_dirty_pages(), causing a system hang and triggering a kernel coredump.

The kernel involved is based on v6.4, but the root problem still applies to any upstream kernel before v6.18.

[CAUSE]
The following analysis of the dirty page balancing behavior is from Jan Kara.

The cgroup dirty limit was what was actually playing the decisive role here, because the cgroup had only a small amount of memory and so its dirty limit was something like 16MB.

Dirty throttling is responsible for enforcing that nobody can dirty (significantly) more memory than the dirty limit allows. Thus when a task is dirtying pages it periodically enters balance_dirty_pages() and is made to sleep there to slow down the dirtying.

When the system is already over the dirty limit (either globally or within a cgroup of the running task), the task is not allowed to exit balance_dirty_pages() until the number of dirty pages drops below the limit.

So in this particular case, there was a cgroup with a relatively small amount of memory and, as a result, a dirty limit of 16MB. A task from that cgroup had dirtied about 28MB worth of pages in the btrfs btree inode, and these were practically the only dirty pages in that cgroup.

That means the only way to reduce the dirty pages of that cgroup is to write back the dirty pages of the btrfs btree inode, and only after that can those processes exit balance_dirty_pages().

Now back to the btrfs part: btree_writepages() is responsible for writing back dirty btree inode pages.

The problem is that btrfs has an internal threshold: if the btree inode's dirty bytes are below 32MiB, it will not do any writeback at all.

This behavior exists to batch as much metadata as possible, so that tree blocks are not written back and then re-COWed again for another modification.

This internal 32MiB threshold is higher than the amount of dirty pages present (28MiB), meaning no writeback will happen, causing a deadlock between btrfs and the cgroup:

- Btrfs doesn't want to write back the btree inode until there are more dirty pages.
- The cgroup/MM doesn't want more dirty pages for the btrfs btree inode, so any process touching that btree inode is put to sleep until the number of dirty pages is reduced.

Thanks to Jan Kara for the analysis of the root cause.

[ENHANCEMENT]
Since kernel commit b55102826d7d ("btrfs: set AS_KERNEL_FILE on the btree_inode"), btrfs btree inode pages are only charged to the root cgroup, which should have a much larger limit than btrfs' 32MiB threshold, so newer kernels should not be affected.

But all current LTS kernels are affected by this problem, and backporting the whole AS_KERNEL_FILE change may not be a good idea.

Even for newer kernels it is still a good idea to get rid of the internal threshold in btree_writepages(), since in most cases the cgroup/MM has a better view of full system memory usage than btrfs' fixed threshold.

Internal callers going through btrfs_btree_balance_dirty() need no change, since that function already does its own threshold check.

But for external callers of btree_writepages(), just respect their requests and write back whatever they ask for, ignoring the internal btrfs threshold, to avoid such a deadlock on btree inode dirty page balancing.

Impact