CVE-2024-26837
Publication date: 17/04/2024
In the Linux kernel, the following vulnerability has been resolved:<br />
<br />
net: bridge: switchdev: Skip MDB replays of deferred events on offload<br />
<br />
Before this change, generation of the list of MDB events to replay<br />
would race against the creation of new group memberships, either from<br />
the IGMP/MLD snooping logic or from user configuration.<br />
<br />
While new memberships are immediately visible to walkers of<br />
br->mdb_list, the notification of their existence to switchdev event<br />
subscribers is deferred until a later point in time. So if a replay<br />
list was generated during a time that overlapped with such a window,<br />
it would also contain a replay of the not-yet-delivered event.<br />
<br />
The driver would thus receive two copies of what the bridge internally<br />
considered to be one single event. On destruction of the bridge, only<br />
a single membership deletion event was therefore sent. As a<br />
consequence, drivers that reference-count memberships (at<br />
least DSA) would be left with orphaned groups in their hardware<br />
database when the bridge was destroyed.<br />
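The bookkeeping described above can be illustrated with a small toy model. This is only a sketch in Python, not kernel or DSA code: the class `RefcountedDriver` and its methods `mdb_add`, `mdb_del` and `hardware_entries` are hypothetical names, and the only assumption carried over from the text is that the driver keeps one reference count per group.

```python
from collections import Counter


class RefcountedDriver:
    """Hypothetical stand-in for a driver that reference-counts memberships."""

    def __init__(self):
        self.refs = Counter()  # group address -> reference count

    def mdb_add(self, group):
        # Every addition event takes one reference on the group.
        self.refs[group] += 1

    def mdb_del(self, group):
        # Every deletion event drops one reference; the hardware entry
        # is removed only when the last reference is gone.
        self.refs[group] -= 1
        if self.refs[group] <= 0:
            del self.refs[group]

    def hardware_entries(self):
        return sorted(self.refs)


driver = RefcountedDriver()
group = "33:33:00:00:00:6a"

driver.mdb_add(group)  # copy delivered through the replay list
driver.mdb_add(group)  # deferred copy of the same addition, delivered later
driver.mdb_del(group)  # the single deletion sent when the bridge is destroyed

orphans = driver.hardware_entries()
print(orphans)  # the group is still installed, with a count of 1
```

With a single addition instead of two, the same deletion would leave `hardware_entries()` empty, which is the behaviour the bridge expects.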
<br />
This is only an issue when replaying additions. While deletion events<br />
may still be pending on the deferred queue, they will already have<br />
been removed from br->mdb_list, so no duplicates can be generated in<br />
that scenario.<br />
<br />
To a user, this meant that old group memberships from a bridge to<br />
which a port was previously attached could be reanimated (in<br />
hardware) when the port joined a new bridge, without the new bridge's<br />
knowledge.<br />
<br />
For example, on an mv88e6xxx system, create a snooping bridge and<br />
immediately add a port to it:<br />
<br />
root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \<br />
> ip link set dev x3 up master br0<br />
<br />
And then destroy the bridge:<br />
<br />
root@infix-06-0b-00:~$ ip link del dev br0<br />
root@infix-06-0b-00:~$ mvls atu<br />
ADDRESS             FID  STATE   Q  F  0  1  2  3  4  5  6  7  8  9  a<br />
DEV:0 Marvell 88E6393X<br />
33:33:00:00:00:6a     1  static  -  -  0  .  .  .  .  .  .  .  .  .  .<br />
33:33:ff:87:e4:3f     1  static  -  -  0  .  .  .  .  .  .  .  .  .  .<br />
ff:ff:ff:ff:ff:ff     1  static  -  -  0  1  2  3  4  5  6  7  8  9  a<br />
root@infix-06-0b-00:~$<br />
<br />
The two IPv6 groups remain in the hardware database because the<br />
port (x3) is notified of the host's membership twice: once via the<br />
original event and once via a replay. Since only a single delete<br />
notification is sent, the reference count remains at 1 when the bridge is<br />
destroyed.<br />
<br />
Then add the same port (or another port belonging to the same hardware<br />
domain) to a new bridge, this time with snooping disabled:<br />
<br />
root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \<br />
> ip link set dev x3 up master br1<br />
<br />
All multicast, including the two IPv6 groups from br0, should now be<br />
flooded, according to the policy of br1. But instead the old<br />
memberships are still active in the hardware database, causing the<br />
switch to forward traffic for those groups only towards the CPU (port<br />
0).<br />
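The effect on forwarding can be sketched with a simplified lookup model. This is an illustration in Python under loud assumptions, not a description of the mv88e6xxx pipeline: `egress_ports`, `ALL_PORTS` and the dictionary-based address table are hypothetical, and the model assumes that a matching entry restricts forwarding to its port vector while an unmatched multicast address is flooded to every other port.

```python
# Ports 0..a from the mvls listing, modelled as integers 0..10 (assumption).
ALL_PORTS = set(range(11))
CPU_PORT = 0


def egress_ports(address_table, dst, ingress):
    """Return the ports a frame addressed to `dst` would leave on."""
    if dst in address_table:
        # A matching entry limits forwarding to its port vector.
        return address_table[dst] - {ingress}
    # No entry: flood to every port except the one the frame arrived on.
    return ALL_PORTS - {ingress}


group = "33:33:00:00:00:6a"

# Stale entry left behind by br0: only the CPU port is a member.
stale_table = {group: {CPU_PORT}}
# What br1 (snooping disabled) expects: no entry for the group at all.
clean_table = {}

with_stale_entry = egress_ports(stale_table, group, ingress=3)
with_clean_table = egress_ports(clean_table, group, ingress=3)

print(sorted(with_stale_entry))  # only the CPU port
print(sorted(with_clean_table))  # every port except the ingress port
```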
<br />
Eliminate the race in two steps:<br />
<br />
1. Grab the write-side lock of the MDB while generating the replay<br />
list.<br />
<br />
This prevents new memberships from showing up while we are generating<br />
the replay list. But it leaves the scenario in which a deferred event<br />
was already generated, but not delivered, before we grabbed the<br />
lock. Therefore:<br />
<br />
2. Make sure that no deferred version of a replay event is already<br />
enqueued to the switchdev deferred queue, before adding it to the<br />
replay list, when replaying additions.
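
The two steps above can be sketched as a toy model. This is a minimal Python illustration, not the kernel implementation: `ToyBridge`, `join`, `replay_additions`, `flush_deferred` and the `skip_deferred` flag are hypothetical names, a `threading.Lock` stands in for the MDB write-side lock, and a `deque` stands in for the switchdev deferred queue.

```python
import threading
from collections import deque


class ToyBridge:
    """Hypothetical model of the MDB list and the deferred event queue."""

    def __init__(self):
        self.mdb_lock = threading.Lock()  # stand-in for the MDB write-side lock
        self.mdb_list = []                # memberships visible to list walkers
        self.deferred = deque()           # events enqueued but not yet delivered

    def join(self, group):
        # The membership becomes visible at once; its notification is deferred.
        with self.mdb_lock:
            self.mdb_list.append(group)
            self.deferred.append(("add", group))

    def replay_additions(self, skip_deferred):
        # Step 1: hold the lock so no membership appears during the walk.
        with self.mdb_lock:
            replay = []
            for group in self.mdb_list:
                # Step 2: skip groups whose addition is still on the queue.
                if skip_deferred and ("add", group) in self.deferred:
                    continue
                replay.append(("add", group))
            return replay

    def flush_deferred(self):
        # Deliver everything that was waiting on the deferred queue.
        events = list(self.deferred)
        self.deferred.clear()
        return events


GROUP = "33:33:00:00:00:6a"


def adds_delivered(skip_deferred):
    bridge = ToyBridge()
    bridge.join(GROUP)  # event enqueued, not yet delivered
    events = bridge.replay_additions(skip_deferred) + bridge.flush_deferred()
    return events.count(("add", GROUP))


before_fix = adds_delivered(skip_deferred=False)
after_fix = adds_delivered(skip_deferred=True)
print(before_fix, after_fix)
```

Without the check in step 2, the driver in this model sees the same addition twice; with it, the addition arrives once, through the deferred queue. The lock alone would not be enough here, because the event was already enqueued before the replay started.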
Severity CVSS v4.0: Pending analysis
Last modification: 02/04/2025