CVE-2022-49394
Publication date:
26/02/2025
In the Linux kernel, the following vulnerability has been resolved:

blk-iolatency: Fix inflight count imbalances and IO hangs on offline

iolatency needs to track the number of inflight IOs per cgroup. As this
tracking can be expensive, it is disabled when no cgroup has iolatency
configured for the device. To ensure that the inflight counters stay
balanced, iolatency_set_limit() freezes the request_queue while manipulating
the enabled counter, which ensures that no IO is in flight and thus all
counters are zero.

Unfortunately, iolatency_set_limit() isn't the only place where the enabled
counter is manipulated. iolatency_pd_offline() can also decrement the counter
and trigger disabling. Because this disabling happens without freezing the
queue, it can easily happen while some IOs are in flight, which leaks their
inflight counts.

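The imbalance can be illustrated with a small toy model. This is a sketch, not kernel code: the class, its fields, and the run() helper are hypothetical. It assumes that both the submission and the completion path skip accounting while tracking is disabled, which the description implies but does not spell out.

```python
class IolatencyModel:
    """Toy model of one cgroup's inflight accounting (not kernel code)."""

    def __init__(self):
        self.enabled = False  # stands in for the per-device enabled state
        self.inflight = 0     # stands in for the tracked inflight counter
        self.pending = 0      # IOs actually outstanding on the device

    def submit(self):
        self.pending += 1
        if self.enabled:      # assumption: no accounting while disabled
            self.inflight += 1

    def complete(self):
        self.pending -= 1
        if self.enabled:      # assumption: completion is skipped as well
            self.inflight -= 1

    def drain(self):
        """Model a queue freeze: wait for every outstanding IO to finish."""
        while self.pending:
            self.complete()

    def set_enabled(self, value, freeze):
        if freeze:
            self.drain()      # no IO is in flight when the flag flips
        self.enabled = value


def run(freeze):
    """Return the tracked inflight count after one enable/disable cycle."""
    m = IolatencyModel()
    m.set_enabled(True, freeze=True)
    for _ in range(3):
        m.submit()                       # three IOs start while tracked
    m.set_enabled(False, freeze=freeze)  # freeze=False models the offline path
    m.drain()                            # remaining IOs finish while disabled
    m.set_enabled(True, freeze=True)     # tracking is turned back on
    return m.inflight


print("with freeze:   ", run(freeze=True))
print("without freeze:", run(freeze=False))
```

In this model, flipping the flag behind a freeze leaves the counter at zero, while flipping it with IOs outstanding strands one count per IO that completes after the flip.
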
This can be easily demonstrated by turning on iolatency on one empty
cgroup while IOs are in flight in other cgroups and then removing the
cgroup. Note that iolatency must not be enabled elsewhere in the system,
so that removing the cgroup disables iolatency for the whole device.

The following keeps flipping iolatency on and off on sda:

    echo +io > /sys/fs/cgroup/cgroup.subtree_control
    while true; do
        mkdir -p /sys/fs/cgroup/test
        echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
        sleep 1
        rmdir /sys/fs/cgroup/test
        sleep 1
    done

and a concurrent fio job generates direct random reads:

    fio --name test --filename=/dev/sda --direct=1 --rw=randread \
        --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k

while monitoring with the following drgn script:

    # Run under the drgn CLI, which provides the `prog` global.
    import time
    from drgn import container_of
    from drgn.helpers.linux import *

    while True:
        # Walk every blkcg and each of its per-device blkcg_gq entries.
        for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
            for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
                blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
                pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
                if pd.value_() == 0:
                    continue  # no iolatency policy data for this blkg
                iolat = container_of(pd, 'struct iolatency_grp', 'pd')
                inflight = iolat.rq_wait.inflight.counter.value_()
                if inflight:
                    print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
                          f'{cgroup_path(css.cgroup).decode("utf-8")}')
        time.sleep(1)

The monitoring output looks like the following:

    inflight=1 sda /user.slice
    inflight=1 sda /user.slice
    ...
    inflight=14 sda /user.slice
    inflight=13 sda /user.slice
    inflight=17 sda /user.slice
    inflight=15 sda /user.slice
    inflight=18 sda /user.slice
    inflight=17 sda /user.slice
    inflight=20 sda /user.slice
    inflight=19 sda /user.slice
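
Output in this shape can be summarized per disk and cgroup to make an upward drift easier to spot. The following is a minimal sketch: the summarize() helper and its regular expression are hypothetical, and it assumes disk names and cgroup paths contain no whitespace.

```python
import re
from collections import defaultdict

# Matches lines such as "inflight=14 sda /user.slice".
LINE_RE = re.compile(r"^inflight=(\d+)\s+(\S+)\s+(\S+)$")


def summarize(lines):
    """Return {(disk, cgroup): (first, last, peak)} over the parsed samples."""
    samples = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip elisions such as "..." and unrelated output
        count, disk, cgroup = m.groups()
        samples[(disk, cgroup)].append(int(count))
    return {key: (vals[0], vals[-1], max(vals)) for key, vals in samples.items()}


sample = """\
inflight=1 sda /user.slice
inflight=1 sda /user.slice
...
inflight=14 sda /user.slice
inflight=13 sda /user.slice
inflight=17 sda /user.slice
inflight=15 sda /user.slice
inflight=18 sda /user.slice
inflight=17 sda /user.slice
inflight=20 sda /user.slice
inflight=19 sda /user.slice
"""

summary = summarize(sample.splitlines())
for (disk, cgroup), (first, last, peak) in summary.items():
    print(f"{disk} {cgroup}: first={first} last={last} peak={peak}")
```

A last value well above the first, in a cgroup whose tracking was toggled by another cgroup's removal, is consistent with the leak described above.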
Severity CVSS v4.0: Pending analysis
Last modification:
26/02/2025