CVE-2025-38472
Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
28/07/2025
Last modified:
22/12/2025
Description
In the Linux kernel, the following vulnerability has been resolved:

netfilter: nf_conntrack: fix crash due to removal of uninitialised entry

A crash in conntrack was reported while trying to unlink the conntrack
entry from the hash bucket list:
[exception RIP: __nf_ct_delete_from_lists+172]
[..]
#7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
#8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
#9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
[..]

The nf_conn struct is marked as allocated from slab but appears to be in
a partially initialised state:

- The ct hlist pointer is garbage; it looks like the ct hash value
  (hence the crash).
- ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected.
- ct->timeout is 30000 (=30s), which is unexpected.

Everything else looks like a normal UDP conntrack entry. If we ignore
ct->status and pretend it's 0, the entry matches those that are newly
allocated but not yet inserted into the hash:
- ct hlist pointers are overloaded and store/cache the raw tuple hash
- ct->timeout matches the relative time expected for a new UDP flow
  rather than the absolute 'jiffies' value.

If it were not for the presence of IPS_CONFIRMED,
__nf_conntrack_find_get() would have skipped the entry.
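The two observations above (a relative timeout that reads as expired, and a lookup that only acts on confirmed entries) can be illustrated with a small userspace sketch in C. It is not kernel code: it assumes HZ=1000 (as "30000 (=30s)" implies), models only the status bit and the timeout, and every name (`struct sketch_conn`, `SKETCH_CONFIRMED`, `sketch_expired()`, `sketch_lookup_would_reap()`) is hypothetical. The expiry test uses a wrap-safe signed-difference comparison in the style of the kernel's jiffies checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for IPS_CONFIRMED; the value is illustrative. */
#define SKETCH_CONFIRMED 1u

struct sketch_conn {
    unsigned int status;
    uint32_t timeout;   /* relative before confirmation, absolute after */
};

/* Wrap-safe "deadline reached?" test: a deadline at or before 'now'
 * counts as expired. A relative value such as 30000 compared against a
 * large absolute clock therefore looks long expired. */
static bool sketch_expired(const struct sketch_conn *ct, uint32_t now)
{
    return (int32_t)(ct->timeout - now) <= 0;
}

/* Lookup-side rule sketched from the text: entries without CONFIRMED
 * are skipped; only confirmed, expired entries are reaped. */
static bool sketch_lookup_would_reap(const struct sketch_conn *ct, uint32_t now)
{
    if (!(ct->status & SKETCH_CONFIRMED))
        return false;
    return sketch_expired(ct, now);
}
```

With `now = 1000000`, an entry still holding the relative value 30000 counts as expired; it is skipped while its status is 0 and reaped as soon as CONFIRMED appears, which is the combination the crash dump showed.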

The theory is that we hit the following race:

cpu x: found entry E
cpu y: found entry E
cpu x: E is expired
cpu y: (preempted)
cpu x: nf_ct_delete()
cpu x: return E to rcu slab
cpu z: init_conntrack
cpu z: E is re-inited, ct->status set to 0,
       reply tuplehash hnnode.pprev stores hash value.

cpu y found E right before it was deleted on cpu x.
E is now re-inited on cpu z. cpu y was preempted before
checking for expiry and/or the confirm bit.

cpu z: ->refcnt set to 1
cpu z: E now owned by skb
cpu z: ->timeout set to 30000

If cpu y were to resume now, it would observe E as
expired but would skip E due to the missing CONFIRMED bit.

cpu z: nf_conntrack_confirm gets called
cpu z: sets ct->status |= CONFIRMED
       This is wrong: E is not yet added to the hashtable.

cpu y resumes; it observes E as expired but CONFIRMED:

cpu y: nf_ct_expired()
       -> yes (ct->timeout is 30s)
cpu y: confirmed bit set.

cpu y will try to delete E from the hashtable:

cpu y: nf_ct_delete() -> set DYING bit
cpu y: __nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash:
cpu z still holds the table bucket lock(s), so y blocks:

cpu y: wait for spinlock held by z
cpu z: CONFIRMED is set but there is no guarantee ct will be added
       to the hash: "chaintoolong" or "clash resolution" logic both
       skip the insert step. reply hnnode.pprev still stores the
       hash value.
cpu z: unlocks spinlock
cpu z: return NF_DROP
cpu y: unblocks and unlinks E, whose hnnode.pprev holds the hash
       value rather than a pointer: crash.

In case cpu z does insert the entry into the hashtable, cpu y will unlink
E again right away, but no crash occurs.
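The end state of this race can be replayed sequentially. The sketch below is a single-threaded userspace model, not kernel code and not a real concurrency test: one hash bucket is a plain linked list, `insert_ok = false` stands for the "chaintoolong"/clash-resolution paths that skip the insert, and all names (`sketch_confirm()`, `sketch_cpu_y_would_crash()` and the structs) are hypothetical. It contrasts setting CONFIRMED before the insert (the old order) with setting it only after a successful insert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CONFIRMED 1u   /* hypothetical stand-in for IPS_CONFIRMED */

/* Minimal model: one hash bucket as a singly linked list. */
struct sketch_entry {
    unsigned int status;
    struct sketch_entry *next;
};

struct sketch_bucket {
    struct sketch_entry *head;
};

static bool sketch_in_bucket(const struct sketch_bucket *b,
                             const struct sketch_entry *e)
{
    for (const struct sketch_entry *p = b->head; p != NULL; p = p->next)
        if (p == e)
            return true;
    return false;
}

/* "cpu z" confirming E. 'insert_ok' models whether the insert really
 * happens; 'fixed' selects the post-patch ordering. */
static void sketch_confirm(struct sketch_bucket *b, struct sketch_entry *e,
                           bool insert_ok, bool fixed)
{
    if (!fixed)
        e->status |= SKETCH_CONFIRMED;      /* old: set before insert */
    if (insert_ok) {
        e->next = b->head;
        b->head = e;
        if (fixed)
            e->status |= SKETCH_CONFIRMED;  /* new: set after insert */
    }
}

/* "cpu y" after it unblocks: it already saw E as expired, so it unlinks
 * E if CONFIRMED is set. Returns true when that unlink would operate on
 * an entry that is not on the list, i.e. the crash case. */
static bool sketch_cpu_y_would_crash(const struct sketch_bucket *b,
                                     const struct sketch_entry *e)
{
    if (!(e->status & SKETCH_CONFIRMED))
        return false;                        /* E is skipped, no unlink */
    return !sketch_in_bucket(b, e);
}
```

Only the combination "old ordering, insert skipped" leaves a CONFIRMED entry that is not on the list. With the new ordering, the same skipped insert leaves CONFIRMED clear, so cpu y skips E.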

Without the 'cpu y' race, the 'garbage' hlist is of no consequence:
ct refcnt remains at 1; eventually the skb will be freed and E gets
destroyed via nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy.
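That harmless path relies on ordinary reference counting: whoever drops the last reference destroys the object. A minimal C11 sketch, assuming hypothetical names (`struct sketch_ref_entry`, `sketch_put()`) and modelling destruction with a flag instead of freeing memory; it is not the kernel's nf_conntrack_put().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_ref_entry {
    atomic_uint refcnt;
    bool destroyed;   /* stands in for the real destructor running */
};

/* Drop one reference; the caller that drops the last one "destroys"
 * the entry. acq_rel ordering makes earlier writes by other holders
 * visible to the thread that performs the destruction. */
static void sketch_put(struct sketch_ref_entry *e)
{
    if (atomic_fetch_sub_explicit(&e->refcnt, 1, memory_order_acq_rel) == 1)
        e->destroyed = true;
}
```

Consistent with the description, E was never hashed on this path, so the garbage hlist pointer plays no role when the last reference goes away.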

To resolve this, move the IPS_CONFIRMED assignment after the table
insertion but before the unlock.

Pablo points out that the confirm-bit store could be reordered to happen
before the hlist add or the timeout fixup, so switch to set_bit() and a
before_atomic memory barrier to prevent this.
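The ordering the fix relies on can be approximated in portable C11. In this hypothetical sketch (`struct sketch_ct`, `sketch_confirm_fixed()` and `sketch_is_confirmed()` are illustrative names, and `hashed` merely stands in for the hlist insertion), the confirming CPU converts the relative timeout to an absolute one, inserts, and only then sets CONFIRMED with a release read-modify-write. A reader that observes CONFIRMED through an acquire load is then guaranteed to see the earlier stores. The kernel uses set_bit() preceded by a before_atomic barrier, which is at least as strong for this purpose.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CONFIRMED 1ul   /* hypothetical stand-in for IPS_CONFIRMED */

struct sketch_ct {
    uint32_t timeout;      /* plain field, written before publication */
    bool hashed;           /* stands in for the hlist insertion */
    atomic_ulong status;
};

/* Writer side (in the kernel this runs under the bucket lock):
 * timeout fixup, insert, and only then publish CONFIRMED. The release
 * RMW keeps the two plain stores from being reordered after it. */
static void sketch_confirm_fixed(struct sketch_ct *ct, uint32_t now)
{
    ct->timeout += now;    /* relative -> absolute */
    ct->hashed = true;
    atomic_fetch_or_explicit(&ct->status, SKETCH_CONFIRMED,
                             memory_order_release);
}

/* Reader side: if this acquire load sees CONFIRMED, the timeout and
 * insertion stores made before the release are visible as well. */
static bool sketch_is_confirmed(struct sketch_ct *ct)
{
    return (atomic_load_explicit(&ct->status, memory_order_acquire) &
            SKETCH_CONFIRMED) != 0;
}
```

The design choice is that CONFIRMED becomes the publication flag: readers must test it before trusting the timeout or the hash linkage.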

It doesn't matter if other CPUs can observe a newly inserted entry right
before the CONFIRMED bit is set: such an event cannot be distinguished
from the "E is the old incarnation" case above, so the entry will be
skipped.

Also change nf_ct_should_gc() to check the confirmed bit first.

The gc sequence is:
1. Check whether the entry has expired; if not, skip to the next entry.
2. Obtain a reference to the expired entry.
3. Call nf_ct_should_gc() to double-check step 1.
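Step 3 with the confirmed bit tested first might look like the following userspace sketch. The names (`struct sketch_gc_ct`, `sketch_should_gc()`, the `SKETCH_*` bits) are hypothetical, and the function only mirrors the check order described here: confirmed first, then expired and not dying. Reading the status with acquire ordering, so that the timeout read afterwards is the value written before CONFIRMED was published, is this sketch's assumption; it is not the kernel's nf_ct_should_gc().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CONFIRMED 1ul   /* hypothetical stand-ins for the  */
#define SKETCH_DYING     2ul   /* IPS_CONFIRMED / IPS_DYING bits  */

struct sketch_gc_ct {
    atomic_ulong status;
    uint32_t timeout;
};

/* Re-check for an entry that a gc walk already believed expired. */
static bool sketch_should_gc(struct sketch_gc_ct *ct, uint32_t now)
{
    unsigned long st = atomic_load_explicit(&ct->status, memory_order_acquire);

    if (!(st & SKETCH_CONFIRMED))
        return false;      /* not (yet) in the table: leave it alone */
    if (st & SKETCH_DYING)
        return false;      /* someone else is already removing it */
    return (int32_t)(ct->timeout - now) <= 0;   /* wrap-safe expiry */
}
```

A fresh entry still carrying the relative timeout 30000 is rejected because it is not confirmed, even though its timeout looks expired.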

nf_ct_should_gc() is thus called only for entries that already failed an
expiry check. After this patch, once the confirmed bit check pas<br />
---truncated---
Impact
Base Score 3.x
5.50
Severity 3.x
MEDIUM
Vulnerable products and versions
| CPE | From | Up to |
|---|---|---|
| cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | 5.18.13 (including) | 6.1.147 (excluding) |
| cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | 6.2 (including) | 6.6.100 (excluding) |
| cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | 6.7 (including) | 6.12.40 (excluding) |
| cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | 6.13 (including) | 6.15.8 (excluding) |
| cpe:2.3:o:linux:linux_kernel:6.16:rc1:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:6.16:rc2:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:6.16:rc3:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:6.16:rc4:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:6.16:rc5:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:6.16:rc6:*:*:*:*:*:* | | |
| cpe:2.3:o:debian:debian_linux:11.0:*:*:*:*:*:*:* | | |
References to Advisories, Solutions, and Tools
- https://git.kernel.org/stable/c/2d72afb340657f03f7261e9243b44457a9228ac7
- https://git.kernel.org/stable/c/76179961c423cd698080b5e4d5583cf7f4fcdde9
- https://git.kernel.org/stable/c/938ce0e8422d3793fe30df2ed0e37f6bc0598379
- https://git.kernel.org/stable/c/a47ef874189d47f934d0809ae738886307c0ea22
- https://git.kernel.org/stable/c/fc38c249c622ff5e3011b8845fd49dbfd9289afc
- https://lists.debian.org/debian-lts-announce/2025/10/msg00008.html



