CVE-2022-48644
Publication date: 28/04/2024
In the Linux kernel, the following vulnerability has been resolved:

net/sched: taprio: avoid disabling offload when it was never enabled

In an incredibly strange API design decision, qdisc->destroy() gets
called even if qdisc->init() never succeeded, not exclusively since
commit 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation"),
but apparently also earlier (in the case of qdisc_create_dflt()).

The taprio qdisc does not fully acknowledge this when it attempts full
offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
parsed from netlink (in taprio_change(), tail called from taprio_init()).

But in taprio_destroy(), we call taprio_disable_offload(), and this
determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).

But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
(a bitwise check of bit 1 in q->flags), it is invalid to call this macro
on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
an invalid set of flags.

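The check described above can be reproduced outside the kernel. The following is a minimal userspace sketch in C, not the kernel source. It assumes only what the text states: the full-offload test looks at bit 1 of the flags, and the invalid sentinel is U32_MAX. The names TAPRIO_FLAG_FULL_OFFLOAD and full_offload_is_enabled() are illustrative stand-ins for the kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, taken from the description above: the full-offload
 * flag is bit 1, and the "invalid" sentinel has every bit set. */
#define TAPRIO_FLAG_FULL_OFFLOAD  (UINT32_C(1) << 1)
#define TAPRIO_FLAGS_INVALID      UINT32_MAX

/* Userspace stand-in for FULL_OFFLOAD_IS_ENABLED(): a plain bit test
 * that has no notion of "these flags were never validated". */
bool full_offload_is_enabled(uint32_t flags)
{
    return (flags & TAPRIO_FLAG_FULL_OFFLOAD) != 0;
}
```

Because the sentinel has all bits set, full_offload_is_enabled(TAPRIO_FLAGS_INVALID) evaluates to true, which is the misreading that lets the destroy path try to disable an offload that was never enabled.
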
As a result, it is possible to crash the kernel if user space forces an
error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
of taprio_enable_offload(). This is because drivers do not expect the
offload to be disabled when it was never enabled.

The error that we force here is to attach taprio not as the root qdisc,
but as a child of an mqprio root qdisc:

$ tc qdisc add dev swp0 root handle 1: \
    mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
    queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
$ tc qdisc replace dev swp0 parent 1:1 \
    taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
    queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
    sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
    flags 0x0 clockid CLOCK_TAI
Unable to handle kernel paging request at virtual address fffffffffffffff8
[fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Call trace:
 taprio_dump+0x27c/0x310
 vsc9959_port_setup_tc+0x1f4/0x460
 felix_port_setup_tc+0x24/0x3c
 dsa_slave_setup_tc+0x54/0x27c
 taprio_disable_offload.isra.0+0x58/0xe0
 taprio_destroy+0x80/0x104
 qdisc_create+0x240/0x470
 tc_modify_qdisc+0x1fc/0x6b0
 rtnetlink_rcv_msg+0x12c/0x390
 netlink_rcv_skb+0x5c/0x130
 rtnetlink_rcv+0x1c/0x2c

Fix this by keeping track of the operations we made, and undoing the
offload only if we actually did it.

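The idea of the fix can be illustrated with a small userspace model in C. This is a sketch under stated assumptions, not the actual patch: struct sched_model and the model_*() functions are hypothetical, and the driver call is reduced to a counter. The one element taken from the description is a boolean that records whether the offload was actually enabled, and which the disable path consults instead of the flags.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_FULL_OFFLOAD  (UINT32_C(1) << 1)   /* bit 1, as in the text */
#define FLAGS_INVALID      UINT32_MAX           /* sentinel set at init  */

/* Hypothetical, heavily reduced model of the qdisc private data. */
struct sched_model {
    uint32_t flags;
    bool offloaded;            /* true only after a successful enable   */
    int driver_disable_calls;  /* counts calls into the modelled driver */
};

/* Mirrors the start of init: flags begin as the invalid sentinel. */
void model_init(struct sched_model *q)
{
    q->flags = FLAGS_INVALID;
    q->offloaded = false;
    q->driver_disable_calls = 0;
}

/* Records the parsed flags and, for full offload, remembers that the
 * offload was really set up. A real driver call could fail before this. */
int model_enable_offload(struct sched_model *q, uint32_t flags)
{
    q->flags = flags;
    if (!(flags & FLAG_FULL_OFFLOAD))
        return 0;
    q->offloaded = true;
    return 0;
}

/* Undo the offload only if it was actually done; never look at flags. */
void model_disable_offload(struct sched_model *q)
{
    if (!q->offloaded)
        return;
    q->driver_disable_calls++;
    q->offloaded = false;
}

/* Destroy may run even when init failed early, so it must be safe on a
 * structure whose flags still hold the invalid sentinel. */
void model_destroy(struct sched_model *q)
{
    model_disable_offload(q);
}
```

In this model, calling model_destroy() right after model_init() leaves driver_disable_calls at zero, while a destroy that follows a successful model_enable_offload() with the full-offload flag reaches the driver exactly once.
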
I've added "bool offloaded" inside a 4 byte hole between "int clockid"
and "atomic64_t picos_per_byte". Now the first cache line looks like
below:

$ pahole -C taprio_sched net/sched/sch_taprio.o
struct taprio_sched {
        struct Qdisc * *           qdiscs;               /*     0     8 */
        struct Qdisc *             root;                 /*     8     8 */
        u32                        flags;                /*    16     4 */
        enum tk_offsets            tk_offset;            /*    20     4 */
        int                        clockid;              /*    24     4 */
        bool                       offloaded;            /*    28     1 */

        /* XXX 3 bytes hole, try to pack */

        atomic64_t                 picos_per_byte;       /*    32     0 */

        /* XXX 8 bytes hole, try to pack */

        spinlock_t                 current_entry_lock;   /*    40     0 */

        /* XXX 8 bytes hole, try to pack */

        struct sched_entry *       current_entry;        /*    48     8 */
        struct sched_gate_list *   oper_sched;           /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
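
The placement of the new field in the layout above can be checked with offsetof() in a userspace approximation. This is only a sketch: struct layout_model is hypothetical, plain C types stand in for the kernel types (int for the enum, int64_t for atomic64_t), and the offsets assume a typical LP64 ABI with 8-byte pointers, 8-byte alignment for int64_t, and a 1-byte bool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first members of the structure. */
struct layout_model {
    void   **qdiscs;          /* pointer, 8 bytes on LP64            */
    void    *root;            /* pointer, 8 bytes on LP64            */
    uint32_t flags;
    int      tk_offset;       /* stands in for the enum              */
    int      clockid;
    bool     offloaded;       /* placed in the former 4-byte hole    */
    int64_t  picos_per_byte;  /* stands in for atomic64_t            */
};

/* Padding left between the new bool and the next 8-byte-aligned member. */
size_t hole_after_offloaded(void)
{
    return offsetof(struct layout_model, picos_per_byte)
         - (offsetof(struct layout_model, offloaded) + sizeof(bool));
}
```

Under those assumptions the bool lands at offset 28 and the following member stays at offset 32, so adding the field does not shift any other member of the first cache line.
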
Severity CVSS v4.0: Pending analysis
Last modification: 19/09/2025