CVE-2025-38472

Severity CVSS v4.0: Pending analysis
Type: Unavailable / Other
Publication date: 28/07/2025
Last modified: 22/12/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

netfilter: nf_conntrack: fix crash due to removal of uninitialised entry

A crash in conntrack was reported while trying to unlink the conntrack
entry from the hash bucket list:
    [exception RIP: __nf_ct_delete_from_lists+172]
    [..]
 #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
 #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
 #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
    [..]

The nf_conn struct is marked as allocated from slab but appears to be in
a partially initialised state:

 ct hlist pointer is garbage; looks like the ct hash value
 (hence crash).
 ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected
 ct->timeout is 30000 (=30s), which is unexpected.

Everything else looks like a normal udp conntrack entry. If we ignore
ct->status and pretend it's 0, the entry matches those that are newly
allocated but not yet inserted into the hash:
  - ct hlist pointers are overloaded and store/cache the raw tuple hash
  - ct->timeout matches the relative time expected for a new udp flow
    rather than the absolute 'jiffies' value.

If it were not for the presence of IPS_CONFIRMED,
__nf_conntrack_find_get() would have skipped the entry.

Theory is that we did hit the following race:

cpu x                   cpu y                   cpu z
found entry E           found entry E
E is expired            <preemption>
nf_ct_delete()
return E to rcu slab
                                                init_conntrack
                                                E is re-inited,
                                                ct->status set to 0
                                                reply tuplehash hnnode.pprev
                                                stores hash value.

cpu y found E right before it was deleted on cpu x.
E is now re-inited on cpu z.
cpu y was preempted before
checking for expiry and/or confirm bit.

                                                ->refcnt set to 1
                                                E now owned by skb
                                                ->timeout set to 30000

If cpu y were to resume now, it would observe E as
expired but would skip E due to missing CONFIRMED bit.

                                                nf_conntrack_confirm gets called
                                                sets: ct->status |= CONFIRMED
                                                This is wrong: E is not yet added
                                                to hashtable.

cpu y resumes, it observes E as expired but CONFIRMED:
                        <resumes>
                        nf_ct_expired()
                         -> yes (ct->timeout is 30s)
                        confirmed bit set.

cpu y will try to delete E from the hashtable:
                        nf_ct_delete() -> set DYING bit
                        __nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash:
cpu z still holds the table bucket lock(s) so y blocks:

                        wait for spinlock held by z

                                                CONFIRMED is set but there is no
                                                guarantee ct will be added to hash:
                                                "chaintoolong" or "clash resolution"
                                                logic both skip the insert step.
                                                reply hnnode.pprev still stores the
                                                hash value.

                                                unlocks spinlock
                                                return NF_DROP
                        <unblocks, then
                         crashes unlinking E>

In case CPU z does insert the entry into the hashtable, cpu y will unlink
E again right away but no crash occurs.

Without the 'cpu y' race, the 'garbage' hlist is of no consequence:
ct refcnt remains at 1, eventually the skb will be freed and E gets
destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy.

To resolve this, move the IPS_CONFIRMED assignment after the table
insertion but before the unlock.

Pablo points out that the confirm-bit-store could be reordered to happen
before the hlist add or the timeout fixup, so switch to set_bit and a
before_atomic memory barrier to prevent this.

It doesn't matter if other CPUs can observe a newly inserted entry right
before the CONFIRMED bit was set:

Such an event cannot be distinguished from the above "E is the old
incarnation" case: the entry will be skipped.

Also change nf_ct_should_gc() to first check the confirmed bit.

The gc sequence is:
 1. Check if entry has expired, if not skip to next entry.
 2. Obtain a reference to the expired entry.
 3. Call nf_ct_should_gc() to double-check step 1.

nf_ct_should_gc() is thus called only for entries that already failed an
expiry check. After this patch, once the confirmed bit check pas
---truncated---
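The ordering change described above can be modelled in user-space C11. This is a minimal, single-threaded sketch under stated assumptions, not the kernel patch: the names toy_conn, toy_confirm_old, toy_confirm_fixed and toy_should_gc are hypothetical, a boolean `insert` flag stands in for the "chaintoolong"/clash-resolution paths that skip the insert, and C11 release/acquire atomics stand in for the kernel's set_bit() plus before_atomic barrier (smp_mb__before_atomic()). It shows why an entry that is CONFIRMED but was never hashed, and still carries a relative timeout, is picked up by gc under the old ordering but not under the new one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for IPS_CONFIRMED / IPS_DYING; not kernel values. */
enum { TOY_CONFIRMED = 1u << 0, TOY_DYING = 1u << 1 };

struct toy_conn {
    atomic_uint status;      /* models ct->status */
    unsigned long timeout;   /* relative until inserted, then absolute */
    bool hashed;             /* models "linked into the hash bucket list" */
    struct toy_conn *next;   /* bucket chain link */
};

struct toy_bucket {
    struct toy_conn *head;   /* real code: hlist_nulls head + spinlock */
};

/* Models init_conntrack(): a fresh (or recycled) entry, not yet hashed. */
static void toy_init(struct toy_conn *ct, unsigned long rel_timeout)
{
    atomic_store_explicit(&ct->status, 0u, memory_order_relaxed);
    ct->timeout = rel_timeout;
    ct->hashed = false;
    ct->next = NULL;
}

/* Wrap-safe expiry test in the style of jiffies comparisons. */
static bool toy_expired(const struct toy_conn *ct, unsigned long now)
{
    return (long)(ct->timeout - now) <= 0;
}

/* Timeout fixup (relative -> absolute) followed by the list insert. */
static void toy_insert(struct toy_conn *ct, struct toy_bucket *b,
                       unsigned long now)
{
    ct->timeout += now;
    ct->next = b->head;
    b->head = ct;
    ct->hashed = true;
}

/* Pre-fix ordering (simplified): CONFIRMED is set before the timeout
 * fixup and before it is known whether the insert happens at all. */
static void toy_confirm_old(struct toy_conn *ct, struct toy_bucket *b,
                            unsigned long now, bool insert)
{
    atomic_fetch_or_explicit(&ct->status, TOY_CONFIRMED, memory_order_relaxed);
    if (insert)
        toy_insert(ct, b, now);
}

/* Fixed ordering: insert first, then publish CONFIRMED with release
 * semantics so the fixup and list link become visible before the bit. */
static void toy_confirm_fixed(struct toy_conn *ct, struct toy_bucket *b,
                              unsigned long now, bool insert)
{
    if (!insert)
        return;              /* entry is dropped and never marked CONFIRMED */
    toy_insert(ct, b, now);
    atomic_fetch_or_explicit(&ct->status, TOY_CONFIRMED, memory_order_release);
}

/* Models a should-gc check that tests CONFIRMED (acquire) before
 * trusting ct->timeout, then requires expired and not already dying. */
static bool toy_should_gc(struct toy_conn *ct, unsigned long now)
{
    unsigned int st = atomic_load_explicit(&ct->status, memory_order_acquire);
    if (!(st & TOY_CONFIRMED))
        return false;
    return toy_expired(ct, now) && !(st & TOY_DYING);
}
```

With the old ordering, the skipped-insert case leaves an entry that is CONFIRMED, unhashed and still holding the relative 30000 timeout, which is the combination the description reports for the crashing entry, so a gc-style check would try to unlink it. With the fixed ordering the same entry is never CONFIRMED and is skipped.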

Vulnerable products and versions

CPE                                                From                 Up to
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*       5.18.13 (including)  6.1.147 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*       6.2 (including)      6.6.100 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*       6.7 (including)      6.12.40 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*       6.13 (including)     6.15.8 (excluding)
cpe:2.3:o:linux:linux_kernel:6.16:rc1:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.16:rc2:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.16:rc3:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.16:rc4:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.16:rc5:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.16:rc6:*:*:*:*:*:*
cpe:2.3:o:debian:debian_linux:11.0:*:*:*:*:*:*:*
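The "(including)" and "(excluding)" markers in the table describe half-open version ranges [From, Up to). As a hedged sketch, the C snippet below checks an upstream kernel version string against the four linux_kernel ranges listed above; kver_in_listed_range and the other names are hypothetical. It compares upstream version numbers only: distribution kernels often backport fixes without changing the upstream version number, the 6.16 release candidates are listed as individual CPE entries rather than a range, and the Debian 11.0 entry is a separate product, so none of those cases are covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Upstream version as major.minor.patch; hypothetical helper type. */
struct kver { int major, minor, patch; };

/* Returns <0, 0 or >0, comparing fields from most to least significant. */
static int kver_cmp(struct kver a, struct kver b)
{
    if (a.major != b.major) return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor) return a.minor < b.minor ? -1 : 1;
    if (a.patch != b.patch) return a.patch < b.patch ? -1 : 1;
    return 0;
}

/* Parses "6.1.147" or "6.2"; a missing patch level counts as 0. */
static bool kver_parse(const char *s, struct kver *out)
{
    out->patch = 0;
    return sscanf(s, "%d.%d.%d", &out->major, &out->minor, &out->patch) >= 2;
}

/* The four linux_kernel ranges from the table, as [from, up_to). */
static const struct { struct kver from, up_to; } affected[] = {
    { {5, 18, 13}, {6, 1, 147} },
    { {6, 2, 0},   {6, 6, 100} },
    { {6, 7, 0},   {6, 12, 40} },
    { {6, 13, 0},  {6, 15, 8}  },
};

static bool kver_in_listed_range(const char *version)
{
    struct kver v;
    if (!kver_parse(version, &v))
        return false;
    for (size_t i = 0; i < sizeof affected / sizeof affected[0]; i++)
        if (kver_cmp(v, affected[i].from) >= 0 &&
            kver_cmp(v, affected[i].up_to) < 0)
            return true;
    return false;
}
```

For example, 5.18.13 falls inside the first range because its lower bound is inclusive, while 6.1.147 falls outside it because its upper bound is exclusive.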