CVE-2024-35970
Publication date: 20/05/2024
In the Linux kernel, the following vulnerability has been resolved:<br />
<br />
af_unix: Clear stale u->oob_skb.<br />
<br />
syzkaller started to report a deadlock on unix_gc_lock after commit<br />
4090fa373f0e ("af_unix: Replace garbage collection algorithm."), but<br />
that commit only uncovered a bug that has existed since commit 314001f0bf92<br />
("af_unix: Add OOB support").<br />
<br />
The reproducer essentially does the following:<br />
<br />
from socket import *<br />
from array import array<br />
<br />
c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)<br />
c1.sendmsg([b&#39;a&#39;], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB)<br />
c2.recv(1) # blocked as no normal data in recv queue<br />
<br />
c2.close() # done async and unblock recv()<br />
c1.close() # done async and trigger GC<br />
<br />
The file descriptor of c2 is sent to c2 itself as OOB data (through c1), and c2 then tries to<br />
receive normal data, but recv() eventually fails because of the asynchronous close().<br />
<br />
The problem is the incorrect handling of the OOB skb in manage_oob(). When<br />
recvmsg() is called without MSG_OOB, manage_oob() is called to check<br />
whether the peeked skb is the OOB skb. If it is, manage_oob() pops it<br />
out of the receive queue but does not clear unix_sk(sk)->oob_skb.<br />
This is wrong in terms of uAPI.<br />
<br />
Let&#39;s say we send "hello" with MSG_OOB, and "world" without MSG_OOB.<br />
The &#39;o&#39; is handled as OOB data. When recv() is called twice without<br />
MSG_OOB, the OOB data should be lost.<br />
<br />
>>> from socket import *<br />
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0)<br />
>>> c1.send(b&#39;hello&#39;, MSG_OOB) # &#39;o&#39; is OOB data<br />
5<br />
>>> c1.send(b&#39;world&#39;)<br />
5<br />
>>> c2.recv(5) # OOB data is not received<br />
b&#39;hell&#39;<br />
>>> c2.recv(5) # OOB data is skipped<br />
b&#39;world&#39;<br />
>>> c2.recv(5, MSG_OOB) # This should return an error<br />
b&#39;o&#39;<br />
<br />
In the same situation, TCP actually returns -EINVAL for the last<br />
recv().<br />
<br />
Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always sets<br />
EPOLLPRI even though the OOB data has already been skipped by a previous recv().<br />
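
The stale flag can be observed from user space. The following is a minimal sketch and is not part of the kernel patch: `stale_oob_reported()` is a hypothetical helper name, and its result depends on the running kernel. It replays a "hello"/"world" exchange, reads twice without MSG_OOB, and then asks poll() whether POLLPRI is still set. It returns None when AF_UNIX MSG_OOB or poll() is not available on the system.

```python
import select
import socket


def stale_oob_reported():
    """Probe whether poll() still reports POLLPRI after the OOB byte was skipped.

    Returns True or False depending on the kernel's behaviour, or None if the
    scenario cannot be run here (no AF_UNIX MSG_OOB support, no poll(), timeout).
    """
    try:
        c1, c2 = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    except (AttributeError, OSError):
        return None
    try:
        c2.settimeout(1.0)                 # assumption: avoid blocking forever
        c1.send(b"hello", socket.MSG_OOB)  # 'o' is queued as OOB data
        c1.send(b"world")
        c2.recv(5)                         # normal data before the OOB byte
        c2.recv(5)                         # moves past the OOB byte
        poller = select.poll()
        poller.register(c2.fileno(), select.POLLPRI)
        events = poller.poll(0)            # non-blocking query
        return any(ev & select.POLLPRI for _, ev in events)
    except (AttributeError, OSError):
        return None
    finally:
        c1.close()
        c2.close()


if __name__ == "__main__":
    print("POLLPRI still reported:", stale_oob_reported())
```

A result of True would be consistent with the stale-pointer behaviour described above. The exchange does not pass any file descriptors, so it does not exercise the garbage collector.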
<br />
To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing<br />
the OOB skb from the receive queue.<br />
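
The effect of clearing the pointer can be pictured with a small user-space model. This is a sketch under simplifying assumptions, not kernel code: `ToyUnixSock`, `replay()` and the `clear_on_dequeue` switch are hypothetical names, skbs are plain dictionaries, and reference counting and locking are left out. It replays the "hello"/"world" session with and without the clearing step.

```python
import errno
from collections import deque


class ToyUnixSock:
    """Simplified stand-in for the receiving side of an AF_UNIX stream socket."""

    def __init__(self, clear_on_dequeue):
        self.recv_queue = deque()  # models sk->sk_receive_queue
        self.oob_skb = None        # models unix_sk(sk)->oob_skb
        self.clear_on_dequeue = clear_on_dequeue

    def queue(self, data, oob=False):
        skb = {"data": data}
        self.recv_queue.append(skb)
        if oob:
            self.oob_skb = skb

    def recv(self):
        """Model of recv() without MSG_OOB."""
        if not self.recv_queue:
            return b""
        if self.recv_queue[0] is self.oob_skb:
            # manage_oob() step: drop the OOB skb from the queue ...
            self.recv_queue.popleft()
            if self.clear_on_dequeue:
                self.oob_skb = None  # ... and, with the fix, forget it
            if not self.recv_queue:
                return b""
        return self.recv_queue.popleft()["data"]

    def recv_oob(self):
        """Model of recv(MSG_OOB): fails with EINVAL when no OOB skb is recorded."""
        if self.oob_skb is None:
            raise OSError(errno.EINVAL, "no OOB data")
        skb, self.oob_skb = self.oob_skb, None
        return skb["data"]


def replay(clear_on_dequeue):
    """Queue "hell", OOB "o" and "world", then read twice and try MSG_OOB."""
    sk = ToyUnixSock(clear_on_dequeue)
    sk.queue(b"hell")
    sk.queue(b"o", oob=True)
    sk.queue(b"world")
    results = [sk.recv(), sk.recv()]
    try:
        results.append(sk.recv_oob())
    except OSError as exc:
        results.append(errno.errorcode[exc.errno])
    return results


if __name__ == "__main__":
    print(replay(clear_on_dequeue=False))  # [b'hell', b'world', b'o']
    print(replay(clear_on_dequeue=True))   # [b'hell', b'world', 'EINVAL']
```

Without the clearing step the model still hands out the skipped byte on the final call. With it, the final call fails with EINVAL, which is the result the text reports for TCP.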
<br />
The old GC did not trigger the deadlock because it<br />
relied on the receive queue to detect the loop.<br />
<br />
When it is triggered, the socket with OOB data is marked as a GC candidate<br />
because its file refcount == inflight count (1). However, after traversing<br />
all inflight sockets, the socket still has a positive inflight count (1),<br />
thus the socket is excluded from the candidates. The old GC then loses the<br />
chance to garbage-collect the socket.<br />
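
The candidate logic described above can be illustrated with a toy model. This is only a sketch of the paragraph's reasoning, not the kernel's algorithm: `Sock`, `old_gc()` and `scenario()` are hypothetical names, and the real collector's locking and its step that restores counts for non-garbage candidates are omitted. The only difference between the two scenarios is whether the skb carrying the socket's own descriptor is still on its receive queue.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Sock:
    name: str
    file_refs: int = 0  # references to the socket's file
    inflight: int = 0   # references that live inside SCM_RIGHTS messages
    recv_queue: list = field(default_factory=list)  # sockets whose fds are queued here


def old_gc(inflight_list):
    """Return the names of sockets this simplified collector would free."""
    # A socket is a candidate when every file reference is an in-flight one.
    candidates = [s for s in inflight_list if s.file_refs == s.inflight]
    remaining = {id(s): s.inflight for s in candidates}
    # Walk the receive queues and subtract the references found there.
    for s in candidates:
        for child in s.recv_queue:
            if id(child) in remaining:
                remaining[id(child)] -= 1
    # A positive count means "referenced from somewhere else": not collected.
    return [s.name for s in candidates if remaining[id(s)] == 0]


def scenario(skb_still_queued):
    """Model c2 after both close() calls: one file ref, held by the in-flight skb."""
    c2 = Sock("c2", file_refs=1, inflight=1)
    if skb_still_queued:
        c2.recv_queue.append(c2)  # the skb with c2's fd sits on c2's own queue
    # Otherwise the skb was dequeued and is reachable only via the stale
    # oob_skb pointer, which the queue walk never sees.
    return old_gc([c2])


if __name__ == "__main__":
    print(scenario(skb_still_queued=True))   # ['c2']
    print(scenario(skb_still_queued=False))  # []
```

In the second scenario the count never reaches zero, so the model drops the socket from the candidates, matching the leak described in the text.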
<br />
With the old GC, the repro keeps creating true garbage that is<br />
neither freed nor detected by kmemleak, as it is linked to the global<br />
inflight list. That&#39;s why we couldn&#39;t even notice the issue.
Severity CVSS v4.0: Pending analysis
Last modification: 04/04/2025