CVE-2024-41009
Publication date:
17/07/2024
In the Linux kernel, the following vulnerability has been resolved:<br />
<br />
bpf: Fix overrunning reservations in ringbuf<br />
<br />
The BPF ring buffer is implemented internally as a power-of-2 sized circular<br />
buffer with two logical, ever-increasing counters: consumer_pos, the consumer<br />
counter showing the logical position up to which the consumer has consumed<br />
data, and producer_pos, the producer counter denoting the amount of data<br />
reserved by all producers.<br />
<br />
Each time a record is reserved, the producer that "owns" the record advances<br />
the producer counter. In user space, each time a record is read, the consumer<br />
of the data advances the consumer counter once it has finished processing.<br />
Both counters are stored in separate pages so that, from user space, the<br />
producer counter is read-only and the consumer counter is read-write.<br />
<br />
One aspect that simplifies, and thus speeds up, the implementation of both<br />
producers and consumers is that the data area is mapped twice contiguously<br />
back-to-back in virtual memory. No special measures are needed for samples<br />
that wrap around at the end of the circular buffer data area, because the<br />
next page after the last data page is the first data page again, so the<br />
sample still appears completely contiguous in virtual memory.<br />
<br />
Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for<br />
book-keeping the length and offset, and is inaccessible to the BPF program.<br />
Helpers like bpf_ringbuf_reserve() return `(void *)hdr + BPF_RINGBUF_HDR_SZ`<br />
for the BPF program to use. Bing-Jhong and Muhammad reported that it is,<br />
however, possible to make a second allocated memory chunk overlap with the<br />
first chunk and, as a result, the BPF program is able to edit the first<br />
chunk&#39;s header.<br />
<br />
For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size<br />
of 0x4000. Next, the consumer_pos is modified to 0x3000 /before/ a call to<br />
bpf_ringbuf_reserve() is made. This will allocate a chunk A, which is in<br />
[0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, let&#39;s<br />
allocate a chunk B with size 0x3000. This will succeed because consumer_pos<br />
was edited ahead of time to pass the `new_prod_pos - cons_pos > rb->mask`<br />
check. Chunk B will be in range [0x3008,0x6010], and the BPF program is able<br />
to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned<br />
earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data<br />
pages. This means that chunk B&#39;s bytes at [0x4000,0x4008] overlap chunk A&#39;s<br />
header. bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header&#39;s pg_off<br />
to locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once<br />
chunk B has modified chunk A&#39;s header, bpf_ringbuf_commit() refers to the<br />
wrong page and can cause a crash.<br />
<br />
Fix it by calculating the oldest pending_pos and checking whether the range<br />
from the oldest outstanding record to the newest would span beyond the ring<br />
buffer size. If that is the case, reject the request. We&#39;ve tested with the<br />
ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh)<br />
before and after the fix; while it seems a bit slower on some benchmarks,<br />
the difference is not significant enough to matter.
Severity CVSS v4.0: Pending analysis
Last modification:
03/11/2025