CVE-2024-49998
Publication date: 21/10/2024
In the Linux kernel, the following vulnerability has been resolved:

net: dsa: improve shutdown sequence

Alexander Sverdlin presents 2 problems during shutdown with the
lan9303 driver. One is specific to lan9303 and the other just happens
to reproduce there.

The first problem is that lan9303 is unique among DSA drivers in that it
calls dev_get_drvdata() at "arbitrary runtime" (not probe, not shutdown,
not remove):

phy_state_machine()
-> ...
   -> dsa_user_phy_read()
      -> ds->ops->phy_read()
         -> lan9303_phy_read()
            -> chip->ops->phy_read()
               -> lan9303_mdio_phy_read()
                  -> dev_get_drvdata()

But we never stop the phy_state_machine(), so it may continue to run
after dsa_switch_shutdown(). Our common pattern in all DSA drivers is
to set drvdata to NULL to suppress the remove() method that may come
afterwards. But in this case it will result in a NULL pointer
dereference (NPD).

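The ordering hazard in the first problem can be sketched in plain userspace C. This is only a single-threaded model under stated assumptions, not kernel code: `struct device_model`, `model_phy_read()`, `model_poll_tick()` and both shutdown helpers are hypothetical names invented for illustration, and the `-1` return merely marks the spot where the real driver would dereference a NULL drvdata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct chip_model { int last_reg; };

struct device_model {
	void *drvdata;        /* models dev_get_drvdata()/dev_set_drvdata() */
	bool poller_running;  /* models the PHY state machine being active */
};

/* Returns 0 on success, -1 where the real driver would hit the NPD. */
static int model_phy_read(struct device_model *dev, int reg)
{
	struct chip_model *chip;

	assert(dev);
	chip = dev->drvdata;
	if (!chip)
		return -1; /* the real call chain has no such check */
	chip->last_reg = reg;
	return 0;
}

/* One tick of the periodic poller; does nothing once it is stopped. */
static int model_poll_tick(struct device_model *dev)
{
	if (!dev->poller_running)
		return 0;
	return model_phy_read(dev, 1);
}

/* Problematic ordering: drvdata is cleared while the poller may still fire. */
static void model_shutdown_racy(struct device_model *dev)
{
	dev->drvdata = NULL;
}

/* Ordering the fix aims for: quiesce the poller first, then clear drvdata. */
static void model_shutdown_ordered(struct device_model *dev)
{
	dev->poller_running = false; /* stands in for stopping the state machine */
	dev->drvdata = NULL;
}
```

In this model, a poll tick after `model_shutdown_racy()` returns -1, while a tick after `model_shutdown_ordered()` never reaches `model_phy_read()` at all.
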
The second problem is that the way in which we set
dp->conduit->dsa_ptr = NULL; is concurrent with receive packet
processing. dsa_switch_rcv() checks once whether dev->dsa_ptr is NULL,
but afterwards, rather than continuing to use that non-NULL value,
dev->dsa_ptr is dereferenced again and again without NULL checks:
dsa_conduit_find_user() and many other places. In between dereferences,
there is no locking to ensure that what was valid once continues to be
valid.

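The check-once, read-again pattern described above can be illustrated with a small userspace model. It is a sketch under assumptions, not the kernel receive path: `struct conduit_model`, `model_rcv()` and the `interleave_fn` hook are hypothetical, and the hook simply simulates, in a single thread, another CPU clearing the pointer between the initial check and a later use. The second NULL test exists only so the model can report the hazard instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct cpu_port_model { int id; };

struct conduit_model {
	struct cpu_port_model *dsa_ptr;
};

/* Hook standing in for "another CPU runs shutdown right now". */
typedef void (*interleave_fn)(struct conduit_model *);

static void model_clear_dsa_ptr(struct conduit_model *c) { c->dsa_ptr = NULL; }
static void model_no_op(struct conduit_model *c) { (void)c; }

/*
 * Checks the pointer once, then reads it again later.
 * Returns the port id, 0 if the packet is dropped up front,
 * or -1 where the second read would be a NULL dereference.
 */
static int model_rcv(struct conduit_model *c, interleave_fn between)
{
	assert(c && between);

	if (!c->dsa_ptr)
		return 0;       /* initial check: not a DSA conduit, drop */

	between(c);             /* window between the check and the next use */

	if (!c->dsa_ptr)        /* the real path has no such re-check */
		return -1;      /* models the NULL dereference */
	return c->dsa_ptr->id;
}
```

The point of the model is that the window exists at all; the commit closes it by making sure no receive processing runs while the pointer is cleared, rather than by adding re-checks.
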
Both problems have the common aspect that closing the conduit interface
solves them.

In the first case, dev_close(conduit) triggers the NETDEV_GOING_DOWN
event in dsa_user_netdevice_event() which closes user ports as well.
dsa_port_disable_rt() calls phylink_stop(), which synchronously stops
the phylink state machine, and ds->ops->phy_read() will thus no longer
call into the driver after this point.

In the second case, dev_close(conduit) should do this, as per
Documentation/networking/driver.rst:

| Quiescence
| ----------
|
| After the ndo_stop routine has been called, the hardware must
| not receive or transmit any data. All in flight packets must
| be aborted. If necessary, poll or wait for completion of
| any reset commands.

So it should be sufficient to ensure that later, when we zeroize
conduit->dsa_ptr, there will be no concurrent dsa_switch_rcv() call
on this conduit.

The addition of the netif_device_detach() function is to ensure that
ioctl, rtnetlink and ethtool requests on the user ports no longer
propagate down to the driver - we're no longer prepared to handle them.

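The resulting order of operations (close the conduit, detach the user ports, only then clear the pointer) can be sketched as a userspace model. Everything here is an assumption made for illustration: `struct sw_model`, `struct user_model` and the `model_*` helpers are hypothetical and only mimic, with boolean flags, the effects the text attributes to dev_close() and netif_device_detach(). It is not the code of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_USERS 4

/* Hypothetical stand-ins; none of these are kernel types. */
struct user_model {
	bool up;       /* user interface is open */
	bool present;  /* cleared by the netif_device_detach() stand-in */
};

struct sw_model {
	bool conduit_up;
	void *dsa_ptr;
	struct user_model users[MODEL_MAX_USERS];
	size_t n_users;
};

/* Stand-in for dev_close(conduit): user ports go down with the conduit. */
static void model_close_conduit(struct sw_model *sw)
{
	for (size_t i = 0; i < sw->n_users; i++)
		sw->users[i].up = false; /* effect of NETDEV_GOING_DOWN */
	sw->conduit_up = false;          /* after ndo_stop: no more RX/TX */
}

/* A request on user port i reaches the driver only while it is present. */
static bool model_request_reaches_driver(const struct sw_model *sw, size_t i)
{
	return sw->users[i].present;
}

/* Receive processing can only run while the conduit is up. */
static bool model_rx_possible(const struct sw_model *sw)
{
	return sw->conduit_up;
}

/* Shutdown in the order described: close, detach, then clear the pointer. */
static void model_shutdown(struct sw_model *sw)
{
	model_close_conduit(sw);

	for (size_t i = 0; i < sw->n_users; i++)
		sw->users[i].present = false;

	assert(!model_rx_possible(sw)); /* no receiver left to race with */
	sw->dsa_ptr = NULL;
}
```

The assertion inside `model_shutdown()` encodes the invariant the commit message relies on: by the time the pointer is zeroized, nothing can still be receiving on the conduit.
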
The race condition actually did not exist when commit 0650bf52b31f
("net: dsa: be compatible with masters which unregister on shutdown")
first introduced dsa_switch_shutdown(). It was created later, when we
stopped unregistering the user interfaces from a bad spot, and we just
replaced that sequence with a racy zeroization of conduit->dsa_ptr
(one which doesn't ensure that the interfaces aren't up).
Severity CVSS v4.0: Pending analysis
Last modification: 24/11/2025