CVE-2025-38166
Publication date:
03/07/2025
In the Linux kernel, the following vulnerability has been resolved:<br />
<br />
bpf: fix ktls panic with sockmap<br />
<br />
[ 2172.936997] ------------[ cut here ]------------<br />
[ 2172.936999] kernel BUG at lib/iov_iter.c:629!<br />
......<br />
[ 2172.944996] PKRU: 55555554<br />
[ 2172.945155] Call Trace:<br />
[ 2172.945299] <br />
[ 2172.945428] ? die+0x36/0x90<br />
[ 2172.945601] ? do_trap+0xdd/0x100<br />
[ 2172.945795] ? iov_iter_revert+0x178/0x180<br />
[ 2172.946031] ? iov_iter_revert+0x178/0x180<br />
[ 2172.946267] ? do_error_trap+0x7d/0x110<br />
[ 2172.946499] ? iov_iter_revert+0x178/0x180<br />
[ 2172.946736] ? exc_invalid_op+0x50/0x70<br />
[ 2172.946961] ? iov_iter_revert+0x178/0x180<br />
[ 2172.947197] ? asm_exc_invalid_op+0x1a/0x20<br />
[ 2172.947446] ? iov_iter_revert+0x178/0x180<br />
[ 2172.947683] ? iov_iter_revert+0x5c/0x180<br />
[ 2172.947913] tls_sw_sendmsg_locked.isra.0+0x794/0x840<br />
[ 2172.948206] tls_sw_sendmsg+0x52/0x80<br />
[ 2172.948420] ? inet_sendmsg+0x1f/0x70<br />
[ 2172.948634] __sys_sendto+0x1cd/0x200<br />
[ 2172.948848] ? find_held_lock+0x2b/0x80<br />
[ 2172.949072] ? syscall_trace_enter+0x140/0x270<br />
[ 2172.949330] ? __lock_release.isra.0+0x5e/0x170<br />
[ 2172.949595] ? find_held_lock+0x2b/0x80<br />
[ 2172.949817] ? syscall_trace_enter+0x140/0x270<br />
[ 2172.950211] ? lockdep_hardirqs_on_prepare+0xda/0x190<br />
[ 2172.950632] ? ktime_get_coarse_real_ts64+0xc2/0xd0<br />
[ 2172.951036] __x64_sys_sendto+0x24/0x30<br />
[ 2172.951382] do_syscall_64+0x90/0x170<br />
......<br />
<br />
After calling bpf_exec_tx_verdict(), the size of msg_pl->sg may increase,<br />
e.g., when the BPF program executes bpf_msg_push_data().<br />
<br />
If the BPF program sets cork_bytes and sg.size is smaller than cork_bytes,<br />
bpf_exec_tx_verdict() returns -ENOSPC and the caller attempts to roll back<br />
to the non-zero-copy logic. During rollback, msg->msg_iter is reverted, but<br />
because msg_pl->sg.size has been increased, the computed revert length<br />
exceeds the actual size of msg_iter:<br />
'''<br />
iov_iter_revert(&msg->msg_iter, msg_pl->sg.size - orig_size);<br />
'''<br />
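
To make the arithmetic concrete, here is a minimal sketch in plain user-space C. It is a model, not kernel code: `struct iter_model`, `rollback_bytes()` and `revert_is_safe()` are hypothetical names introduced for illustration, and the real iov_iter_revert() returns no status. The sketch assumes orig_size is 0 and that the BPF program grew an 8-byte message to 11 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct iov_iter: it only tracks
 * how many bytes user space supplied and how many were consumed. */
struct iter_model {
    size_t total;    /* bytes supplied by user space */
    size_t consumed; /* bytes already copied out of the iterator */
};

/* Revert length as computed by the rollback path quoted above:
 * msg_pl->sg.size - orig_size. */
static size_t rollback_bytes(size_t sg_size, size_t orig_size)
{
    assert(sg_size >= orig_size);
    return sg_size - orig_size;
}

/* A revert is only meaningful if it does not rewind past the bytes
 * that were actually consumed from the iterator. */
static bool revert_is_safe(const struct iter_model *it, size_t unroll)
{
    return unroll <= it->consumed;
}
```

With 8 bytes consumed and no pushes, `rollback_bytes(8, 0)` is 8 and the revert stays in bounds. Once pushes grow sg.size to 11, `rollback_bytes(11, 0)` is 11, which is more than the 8 bytes the iterator ever handed out, so `revert_is_safe()` reports false for that case.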
<br />
The changes in this commit are based on the following considerations:<br />
<br />
1. When cork_bytes is set, rolling back to the non-zero-copy logic is<br />
pointless; the code can proceed directly with the zero-copy logic.<br />
<br />
2. We cannot calculate the correct number of bytes by which to revert msg_iter.<br />
<br />
Assume the original data is "abcdefgh" (8 bytes), and after 3 pushes<br />
by the BPF program, it becomes 11-byte data: "abc?de?fgh?".<br />
Then, we set cork_bytes to 6, which means the first 6 bytes have been<br />
processed, and the remaining 5 bytes "?fgh?" will be cached until the<br />
length meets the cork_bytes requirement.<br />
<br />
However, some data in "?fgh?" is not within 'msg->msg_iter'<br />
(but in msg_pl instead), especially the "?" bytes we pushed.<br />
<br />
So the rollback is not as simple as reverting msg_iter by a fixed<br />
offset.<br />
<br />
3. For non-TLS sockets in tcp_bpf_sendmsg, when a "cork" situation occurs,<br />
the user-space send() doesn't return an error, and the returned length is<br />
the same as the input length parameter, even if some data is cached.<br />
<br />
Additionally, I saw that the current non-zero-copy logic for handling<br />
corking is written as:<br />
'''<br />
line 1177<br />
else if (ret != -EAGAIN) {<br />
        if (ret == -ENOSPC)<br />
                ret = 0;<br />
        goto send_end;<br />
}<br />
'''<br />
<br />
So it's OK to just return 'copied' without error when a "cork" situation<br />
occurs.
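
The worked example in item 2 and the return convention in item 3 can be sketched together in plain user-space C. This is a toy model under stated assumptions, not the kernel implementation: `push_byte()`, `user_bytes_in_tail()` and `sendmsg_result()` are hypothetical helpers, and recognising pushed bytes by their '?' value only works in this toy, since the kernel has no such marker.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: insert one byte at `pos`, loosely mimicking the
 * effect of a BPF program growing the message. Returns the new length.
 * The caller must provide room for one more byte. */
static size_t push_byte(char *buf, size_t len, size_t pos, char c)
{
    assert(pos <= len);
    memmove(buf + pos + 1, buf + pos, len - pos);
    buf[pos] = c;
    return len + 1;
}

/* Count bytes in buf[from..len) that came from user space, i.e. that
 * are not the pushed marker. Only these ever lived in the iterator. */
static size_t user_bytes_in_tail(const char *buf, size_t len,
                                 size_t from, char marker)
{
    size_t n = 0;
    for (size_t i = from; i < len; i++)
        if (buf[i] != marker)
            n++;
    return n;
}

/* Model of the return convention described above: a cork condition
 * (-ENOSPC) is not an error, and any copied bytes are reported. */
static long sendmsg_result(int verdict_ret, size_t copied)
{
    int ret = verdict_ret;

    assert(verdict_ret <= 0); /* zero or a negative errno value */
    if (ret == -ENOSPC)
        ret = 0;
    return copied > 0 ? (long)copied : (long)ret;
}
```

Starting from "abcdefgh" in a 16-byte buffer, pushing '?' at offsets 3, 6 and 10 yields "abc?de?fgh?". With cork_bytes at 6, the cached tail is 5 bytes long but only 3 of them ("fgh") originated in the iterator, so neither 5 nor 11 is a valid revert length. Returning the copied count through `sendmsg_result()` avoids needing that number at all.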
Severity CVSS v4.0: Pending analysis
Last modification:
03/07/2025