CVE-2024-27415
Publication date:
17/05/2024
In the Linux kernel, the following vulnerability has been resolved:

netfilter: bridge: confirm multicast packets before passing them up the stack

conntrack nf_confirm logic cannot handle cloned skbs referencing
the same nf_conn entry, which will happen for multicast (broadcast)
frames on bridges.

Example:
    macvlan0
        |
       br0
      /   \
   ethX   ethY

ethX (or Y) receives an L2 multicast or broadcast packet containing
an IP packet; the flow is not yet in the conntrack table.

1. skb passes through bridge and fake-ip (br_netfilter) Prerouting.
   -> skb->_nfct now references an unconfirmed entry
2. skb is a broad/mcast packet. The bridge now passes clones out on each
   bridge interface.
3. skb gets passed up the stack.
4. In the macvlan case, the macvlan driver retains clone(s) of the mcast
   skb and schedules a work queue to send them out on the lower devices.

   The clone skb->_nfct is not a copy; it is the same entry as the
   original skb. The macvlan rx handler then returns RX_HANDLER_PASS.
5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
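Step 4 hinges on the clone sharing, not copying, the conntrack entry. A minimal userspace sketch of that, assuming hypothetical types model_conn and model_skb as stand-ins for struct nf_conn and struct sk_buff (this is not kernel code, and the real structures carry far more state):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nf_conn: only the fields this
 * illustration needs. Not the kernel's real layout. */
struct model_conn {
    bool confirmed; /* set once the entry is in the conntrack hash table */
    int refcnt;     /* number of packets holding a reference */
};

/* Hypothetical stand-in for struct sk_buff. */
struct model_skb {
    struct model_conn *nfct; /* plays the role of skb->_nfct */
};

/* Mimics what skb_clone() does to the conntrack reference: the clone
 * gets its own buffer header, but nfct is the same pointer, so both
 * packets now refer to one (possibly unconfirmed) entry. */
static struct model_skb *model_skb_clone(const struct model_skb *orig)
{
    struct model_skb *clone = malloc(sizeof(*clone));

    if (!clone)
        return NULL;
    clone->nfct = orig->nfct;
    if (clone->nfct)
        clone->nfct->refcnt++;
    return clone;
}

/* Frees the clone and drops its reference on the shared entry. */
static void model_skb_free(struct model_skb *skb)
{
    if (skb->nfct) {
        assert(skb->nfct->refcnt > 0);
        skb->nfct->refcnt--;
    }
    free(skb);
}
```

Since clone->nfct and orig.nfct point at the same object, whichever path confirms first changes what the other path sees.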

The macvlan broadcast worker and the normal confirm path will race.
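To see why two unlocked confirm paths on one shared entry are a problem, here is a deliberately simplified model, assuming confirmation is a plain check-then-insert with no lock (the real conntrack confirm code does considerably more). Rather than racing real threads, it replays one fixed schedule so the outcome is deterministic; race_conn, confirm_needed(), confirm_commit() and replay_confirms() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an unconfirmed conntrack entry. */
struct race_conn {
    bool confirmed;
    int hash_inserts; /* times the entry was added to the hash table */
};

/* Step 1 of a simplified, unlocked confirm: is there work to do? */
static bool confirm_needed(const struct race_conn *ct)
{
    return !ct->confirmed;
}

/* Step 2: insert into the table and mark the entry confirmed. */
static void confirm_commit(struct race_conn *ct)
{
    ct->hash_inserts++;
    ct->confirmed = true;
}

/* Replays two confirm attempts on one shared entry (the LOCAL_IN path
 * and the macvlan broadcast worker) under a fixed schedule.
 * overlap == false: one attempt finishes before the other starts.
 * overlap == true:  both check before either commits.
 * Returns how often the entry was inserted; above 1 means corruption. */
static int replay_confirms(bool overlap)
{
    struct race_conn ct = { false, 0 };

    if (overlap) {
        bool local_in = confirm_needed(&ct);
        bool worker = confirm_needed(&ct);

        if (local_in)
            confirm_commit(&ct);
        if (worker)
            confirm_commit(&ct);
    } else {
        if (confirm_needed(&ct))
            confirm_commit(&ct);
        if (confirm_needed(&ct))
            confirm_commit(&ct);
    }
    assert(ct.confirmed);
    return ct.hash_inserts;
}
```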

This race will not happen if step 2 already confirmed a clone. In that
case, later steps perform skb_clone() with skb->_nfct already confirmed
(in the hash table). This works fine.

But such confirmation won't happen when eb/ip/nftables rules dropped the
packets before they reached the nf_confirm step in postrouting.
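A toy model of the two cases above, assuming each bridge port gets one clone and that any clone reaching postrouting confirms the shared entry; fwd_conn and forward_clones() are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared conntrack entry. */
struct fwd_conn {
    bool confirmed;
};

/* Models step 2: the bridge sends one clone per port. A clone that
 * survives the ruleset reaches postrouting, where it confirms the
 * shared entry; a dropped clone never gets that far.
 * dropped[i] is true if the rules drop the clone for port i.
 * Returns whether the entry is confirmed by the time the original
 * packet is passed up the stack. */
static bool forward_clones(struct fwd_conn *ct, const bool *dropped, int nports)
{
    assert(nports >= 0);
    for (int i = 0; i < nports; i++) {
        if (dropped[i])
            continue;         /* dropped before nf_confirm */
        ct->confirmed = true; /* postrouting confirm via this clone */
    }
    return ct->confirmed;
}
```

If every clone is dropped, the original goes up the stack with an unconfirmed entry, which is the situation that opens the race.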

Pablo points out that nf_conntrack_bridge doesn't allow use of stateful
NAT, so we can safely discard the nf_conn entry and let inet call
conntrack again.
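A minimal sketch of the "discard and let inet look it up again" approach, assuming hypothetical drop_conn/drop_skb types. The kernel has its own helpers for this, which the sketch does not attempt to reproduce, and it omits freeing the entry when the last reference goes away:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel types. */
struct drop_conn {
    bool confirmed;
    int refcnt;
};

struct drop_skb {
    struct drop_conn *nfct;
};

/* If the packet still carries an unconfirmed entry, release the
 * reference and clear the pointer, so later clones share nothing and
 * the inet conntrack hooks start from scratch.
 * Returns true if the entry was discarded. */
static bool discard_unconfirmed(struct drop_skb *skb)
{
    struct drop_conn *ct = skb->nfct;

    if (ct == NULL || ct->confirmed)
        return false;
    assert(ct->refcnt > 0);
    ct->refcnt--;
    skb->nfct = NULL;
    return true;
}
```

This is only sound when no stateful NAT has been applied to the entry yet, which is why it suits nf_conntrack_bridge and not bridge netfilter.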

This doesn't work for bridge netfilter: the skb could have a NAT
transformation. Also, bridge netfilter prevents re-invocation of inet
prerouting via the 'sabotage_in' hook.

Work around this problem by explicit confirmation of the entry at LOCAL_IN
time, before the upper layer has a chance to clone the unconfirmed entry.
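The workaround can be modelled as confirming at LOCAL_IN before any clone is taken. A sketch under that assumption; fix_conn, fix_skb, local_in_confirm() and deliver_up() are hypothetical and track only the confirmed flag:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct nf_conn and struct sk_buff. */
struct fix_conn {
    bool confirmed;
};

struct fix_skb {
    struct fix_conn *nfct;
};

/* Models the fix: at LOCAL_IN, confirm the entry if it is still
 * unconfirmed, before any upper layer sees the packet. */
static void local_in_confirm(struct fix_skb *skb)
{
    if (skb->nfct && !skb->nfct->confirmed)
        skb->nfct->confirmed = true; /* hash table insert */
}

/* Passes the packet up; an upper layer (e.g. macvlan) then takes
 * nclones clones, each sharing skb->nfct. Returns how many clones
 * were handed an unconfirmed entry, i.e. could later race. */
static int deliver_up(struct fix_skb *skb, bool confirm_at_local_in, int nclones)
{
    int unconfirmed = 0;

    assert(nclones >= 0);
    if (confirm_at_local_in)
        local_in_confirm(skb);

    for (int i = 0; i < nclones; i++) {
        struct fix_skb clone = { skb->nfct }; /* pointer shared, not copied */

        if (clone.nfct && !clone.nfct->confirmed)
            unconfirmed++;
    }
    return unconfirmed;
}
```

With confirm_at_local_in set, every clone starts out referencing an entry that is already in the (modelled) hash table, the case described earlier as working fine.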

The downside is that this disables NAT and conntrack helpers.

An alternative fix would be to add locking to all code parts that deal with
unconfirmed packets, but even if that could be done in a sane way, this
opens up other problems, for example:

-m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4
-m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5

For the multicast case, only one of such conflicting mappings will be
created; conntrack only handles 1:1 NAT mappings.
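A toy model of the physdev SNAT example, assuming the eth0 and eth1 clones share one entry that can hold a single source mapping; nat_conn, apply_snat() and ipv4() are hypothetical helpers, not netfilter APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Builds a host-order IPv4 address from four octets. */
static uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Hypothetical stand-in for the NAT state of one conntrack entry:
 * a single mapping, mirroring conntrack's 1:1 NAT model. */
struct nat_conn {
    bool nat_done;
    uint32_t snat_addr;
};

/* Applies an SNAT rule to a clone whose entry may be shared. Only the
 * first rule to run sets the mapping. Returns true if the entry ends
 * up using this rule's address. */
static bool apply_snat(struct nat_conn *ct, uint32_t addr)
{
    if (!ct->nat_done) {
        ct->snat_addr = addr;
        ct->nat_done = true;
    }
    return ct->snat_addr == addr;
}
```

Whichever clone is processed first fixes the mapping, so the other rule's address is never applied to this entry.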

Users should create a setup that explicitly marks such traffic
NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass
them: the ruleset might already have accept rules for untracked
traffic, so user-visible behaviour would change.
Severity CVSS v4.0: Pending analysis
Last modification:
17/05/2024